Cloud Run
Cloud Run is the recommended deployment target when you need full control over the HTTP server, long-running requests, custom middleware, static file serving, or SSE streaming. Unlike Cloud Functions, you provide a complete HTTP server (using Node.js http, Express, Fastify, or any framework) and package it as a Docker container.
Architecture
With Cloud Run, you do not use the gcpHttp() adapter. Instead, you wire the framework's handler factories directly into your HTTP server:
graph LR
subgraph "Cloud Run Container"
Server["HTTP Server<br/>(node:http / Express)"]
Execute["createAgentExecuteHandler"]
Stream["createAgentExecuteStreamHandler"]
Static["Static File Serving<br/>(React UI)"]
end
Client["Browser / API Client"] -- "POST /api/agents/:id/execute" --> Server
Client -- "POST /api/agents/:id/execute/stream" --> Server
Client -- "GET /*" --> Server
Server --> Execute
Server --> Stream
Server --> Static
HTTP Server Example
The following example uses Node.js http.createServer directly (no Express dependency). This pattern is used in production by the framework's demo application.
import http from "node:http";
import path from "node:path";
import fs from "node:fs";
import {
GeminiClient,
ToolRegistry,
PromptTemplate,
createAgentExecuteHandler,
createAgentExecuteStreamHandler,
} from "@modernpath/agent-framework";
import { MyAgent } from "./agents/MyAgent";
// ── Build agent (lazy singleton) ──
let built: { agent: MyAgent } | null = null;
async function getBuilt() {
if (built) return built;
const gemini = new GeminiClient({
apiKey: process.env.GOOGLE_AI_STUDIO_KEY!,
model: process.env.GEMINI_MODEL || "gemini-3-flash-preview",
});
const tools = new ToolRegistry();
const agent = new MyAgent(tools, { gemini });
built = { agent };
return built;
}
// ── Handlers ──
function resolveAgent(agentType: string) {
// Map agent type strings to agent instances
if (agentType === "my-agent") return getBuilt().then((b) => b.agent);
throw new Error(`Unknown agentType: ${agentType}`);
}
// ── HTTP Server ──
const port = parseInt(process.env.PORT || "8080", 10);
const server = http.createServer(async (req, res) => {
const url = new URL(req.url || "/", `http://${req.headers.host || "localhost"}`);
const method = (req.method || "GET").toUpperCase();
// CORS
const cors = {
"access-control-allow-origin": "*",
"access-control-allow-headers": "content-type, authorization",
"access-control-allow-methods": "GET, POST, OPTIONS",
"access-control-max-age": "600",
};
if (method === "OPTIONS") {
for (const [k, v] of Object.entries(cors)) res.setHeader(k, v);
res.statusCode = 204;
res.end();
return;
}
// Health check
if (url.pathname === "/health") {
res.statusCode = 200;
res.setHeader("content-type", "application/json");
res.end(JSON.stringify({ ok: true }));
return;
}
// Agent execute (JSON)
const execMatch = url.pathname.match(
/^\/api\/agents\/([^/]+)\/execute$/,
);
if (execMatch && method === "POST") {
const { agent } = await getBuilt();
for (const [k, v] of Object.entries(cors)) res.setHeader(k, v);
const body = await readBody(req);
const handler = createAgentExecuteHandler({
resolveAgent: () => agent,
getUserId: () => 1,
cors: { origin: "*" },
});
const event = {
method: req.method,
headers: req.headers as Record<string, string>,
body,
pathParameters: { auditingId: execMatch[1] },
};
const out = await handler(event);
res.statusCode = out.statusCode;
for (const [k, v] of Object.entries(out.headers || {})) res.setHeader(k, v);
res.end(out.body);
return;
}
// Agent execute (SSE stream)
const streamMatch = url.pathname.match(
/^\/api\/agents\/([^/]+)\/execute\/stream$/,
);
if (streamMatch && method === "POST") {
const { agent } = await getBuilt();
for (const [k, v] of Object.entries(cors)) res.setHeader(k, v);
const body = await readBody(req);
const handler = createAgentExecuteStreamHandler({
resolveAgent: () => agent,
getUserId: () => 1,
cors: { origin: "*" },
});
const event = {
method: req.method,
headers: req.headers as Record<string, string>,
body,
pathParameters: { auditingId: streamMatch[1] },
};
const out = await handler(event);
res.statusCode = out.statusCode;
for (const [k, v] of Object.entries(out.headers || {})) res.setHeader(k, v);
res.flushHeaders?.();
for await (const chunk of out.stream) {
res.write(chunk);
}
res.end();
return;
}
// 404
res.statusCode = 404;
res.setHeader("content-type", "application/json");
res.end(JSON.stringify({ error: "Not found" }));
});
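function readBody(req: http.IncomingMessage): Promise<string> {
  return new Promise((resolve, reject) => {
    // Collect raw Buffers and decode once at the end, so multi-byte UTF-8
    // characters that straddle a chunk boundary are not corrupted.
    const chunks: Buffer[] = [];
    req.on("data", (chunk: Buffer) => chunks.push(chunk));
    req.on("end", () => resolve(Buffer.concat(chunks).toString("utf8")));
    req.on("error", reject);
  });
}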
server.listen(port, "0.0.0.0", () => {
console.log(`Listening on http://0.0.0.0:${port}`);
});
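The request handler in the example above is an async function passed straight to http.createServer, so an error thrown by readBody, getBuilt, or a framework handler would surface as an unhandled promise rejection rather than an HTTP response. The following is a minimal sketch of one way to guard against that. withErrorBoundary is a hypothetical helper name, not part of the framework, and the 500 response body is an assumption.

```typescript
import http from "node:http";

type AsyncHandler = (
  req: http.IncomingMessage,
  res: http.ServerResponse,
) => Promise<void>;

// Hypothetical helper: turns a rejected handler promise into a 500 response
// instead of an unhandled rejection.
export function withErrorBoundary(handler: AsyncHandler): AsyncHandler {
  return async (req, res) => {
    try {
      await handler(req, res);
    } catch (err) {
      console.error("Request failed:", err);
      if (res.headersSent) {
        // Headers are already on the wire (for example mid-stream), so the
        // status can no longer change; close the connection instead.
        res.destroy();
        return;
      }
      res.statusCode = 500;
      res.setHeader("content-type", "application/json");
      res.end(JSON.stringify({ error: "Internal server error" }));
    }
  };
}
```

To use it, wrap the existing callback: http.createServer(withErrorBoundary(async (req, res) => { ... })). The headersSent check matters for the streaming route, where a failure can occur after the SSE headers have been flushed.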
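Health Check Endpoint
Cloud Run can probe each instance over HTTP to decide when it is ready to receive traffic (startup probe) and whether it should be restarted (liveness probe). By default it only checks that the container's port accepts TCP connections, so HTTP probes must be configured explicitly. Expose a lightweight /health endpoint for them: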
if (url.pathname === "/health") {
res.statusCode = 200;
res.setHeader("content-type", "application/json");
res.end(JSON.stringify({ ok: true }));
return;
}
Configure the health check in your Cloud Run deployment:
gcloud run deploy my-agent \
--startup-probe httpGet.path=/health,periodSeconds=10 \
--liveness-probe httpGet.path=/health,periodSeconds=30
Bundling with esbuild
Bundle the server into a single CJS file for the Docker image:
{
"scripts": {
"build:cloudrun": "rm -rf dist-cloudrun && mkdir -p dist-cloudrun && esbuild src/server.ts --bundle --platform=node --target=node20 --format=cjs --outfile=dist-cloudrun/server.cjs --external:@google-cloud/firestore && cp -R prompts dist-cloudrun/prompts && cp package.cloudrun.json dist-cloudrun/package.json"
},
"devDependencies": {
"esbuild": "^0.25.0"
}
}
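Create a minimal package.cloudrun.json for the container. The build script copies it into dist-cloudrun as package.json, so it only needs the dependencies that esbuild leaves external: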
{
"name": "my-agent-cloudrun",
"version": "1.0.0",
"dependencies": {
"@google-cloud/firestore": "^7.0.0"
}
}
Dockerfile
FROM node:20-slim
WORKDIR /app
ENV NODE_ENV=production
COPY . .
RUN npm install --omit=dev
ENV PORT=8080
EXPOSE 8080
CMD ["node", "server.cjs"]
Multi-stage builds
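For production, consider a multi-stage build to reduce image size. Because the bundle is produced inside the builder stage, this variant must be built with the repository root as the Docker build context, not the dist-cloudrun directory: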
# Build stage
FROM node:20 AS builder
WORKDIR /build
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build:cloudrun
# Runtime stage
FROM node:20-slim
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /build/dist-cloudrun .
RUN npm install --omit=dev
ENV PORT=8080
EXPOSE 8080
CMD ["node", "server.cjs"]
Deploy Script
The following script builds, pushes, and deploys to Cloud Run. It mirrors the pattern used in production deployments.
#!/usr/bin/env bash
set -euo pipefail
# ── Configuration ──
PROJECT_ID="${PROJECT_ID:?PROJECT_ID is required}"
REGION="${REGION:-europe-north1}"
SERVICE_NAME="${SERVICE_NAME:-my-agent}"
SERVICE_ACCOUNT="${SERVICE_ACCOUNT:?SERVICE_ACCOUNT is required}"
DOCKER_REPO="${DOCKER_REPO:-${REGION}-docker.pkg.dev/${PROJECT_ID}/cloud-run-source-deploy}"
# Secret IDs (from Secret Manager)
GA_SECRET="${GA_SECRET:-my-google-ai-studio-key}"
# Environment variables
GEMINI_MODEL="${GEMINI_MODEL:-gemini-3-flash-preview}"
echo "=== Cloud Run Deploy ==="
echo "Project: ${PROJECT_ID}"
echo "Region: ${REGION}"
echo "Service: ${SERVICE_NAME}"
# ── Step 1: Build ──
npm run build:cloudrun
# ── Step 2: Build Docker image ──
DIST_DIR="dist-cloudrun"
IMAGE_TAG="${DOCKER_REPO}/${SERVICE_NAME}:$(git rev-parse --short HEAD)"
docker buildx build \
--platform linux/amd64 \
-f Dockerfile.cloudrun \
-t "${IMAGE_TAG}" \
--load \
"${DIST_DIR}"
# ── Step 3: Push to Artifact Registry ──
docker push "${IMAGE_TAG}"
# ── Step 4: Deploy to Cloud Run ──
gcloud run deploy "${SERVICE_NAME}" \
--project "${PROJECT_ID}" \
--region "${REGION}" \
--image "${IMAGE_TAG}" \
--service-account "${SERVICE_ACCOUNT}" \
--set-env-vars "GCP_PROJECT_ID=${PROJECT_ID},GEMINI_MODEL=${GEMINI_MODEL}" \
--set-secrets "GOOGLE_AI_STUDIO_KEY=${GA_SECRET}:latest" \
--allow-unauthenticated \
--memory 512Mi \
--cpu 1 \
--timeout 300 \
--max-instances 3 \
--min-instances 0 \
--port 8080
echo "=== Deployment complete ==="
gcloud run services describe "${SERVICE_NAME}" \
--project "${PROJECT_ID}" \
--region "${REGION}" \
--format "value(status.url)"
Resource Configuration
| Parameter | Recommended | Notes |
|---|---|---|
| --memory | 512Mi to 1Gi | Increase if your agent loads large prompt templates or processes documents. |
| --cpu | 1 | Sufficient for most agent workloads. Increase for parallel tool execution. |
| --timeout | 300 | Maximum seconds per request. Agent executions with RAG + tools typically take 10 to 60 seconds. |
| --min-instances | 0 | Set to 1 to eliminate cold starts (incurs idle costs). |
| --max-instances | 3 to 10 | Adjust based on expected concurrency. |
| --port | 8080 | Must match the PORT environment variable and EXPOSE in the Dockerfile. |
| --concurrency | 80 (default) | Number of concurrent requests per instance. Reduce if your agent is CPU-intensive. |
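The --max-instances and --concurrency settings together bound how many requests the service can handle at once. The sketch below works through that arithmetic and adds a rough sizing estimate based on Little's law (requests in flight = arrival rate × average latency). The function names are hypothetical, and the estimate assumes steady traffic and ignores cold starts.

```typescript
// Upper bound on requests the service can have in flight at the same time.
export function maxInFlightRequests(maxInstances: number, concurrency: number): number {
  return maxInstances * concurrency;
}

// Rough number of instances needed for a steady load, using Little's law.
export function instancesNeeded(
  requestsPerSecond: number,
  avgLatencySeconds: number,
  concurrency: number,
): number {
  const inFlight = requestsPerSecond * avgLatencySeconds;
  // At least one instance is needed to serve any traffic at all.
  return Math.max(1, Math.ceil(inFlight / concurrency));
}

// With the deploy script's --max-instances 3 and the default concurrency of 80:
console.log(maxInFlightRequests(3, 80)); // 240

// 5 requests/second at 30 seconds each, with concurrency lowered to 10:
console.log(instancesNeeded(5, 30, 10)); // 15
```

In the second case the estimate exceeds a --max-instances of 3 to 10, which suggests raising either the instance limit or the per-instance concurrency for that workload.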
Serving Static UI Assets
A key advantage of Cloud Run is the ability to serve both the API and a React UI from the same container. Build the UI, copy it into the container, and serve it as static files with SPA fallback:
// After all API routes, serve static files
if (!url.pathname.startsWith("/api/") && url.pathname !== "/health") {
const filePath = path.join(uiDistDir, url.pathname);
if (fs.existsSync(filePath) && fs.statSync(filePath).isFile()) {
// Serve the file
res.statusCode = 200;
res.setHeader("content-type", getContentType(filePath));
fs.createReadStream(filePath).pipe(res);
return;
}
// SPA fallback -- serve index.html for all unmatched routes
const indexPath = path.join(uiDistDir, "index.html");
if (fs.existsSync(indexPath)) {
res.statusCode = 200;
res.setHeader("content-type", "text/html; charset=utf-8");
fs.createReadStream(indexPath).pipe(res);
return;
}
}
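The snippet above references uiDistDir and getContentType without defining them. The following is a minimal sketch of what they might look like, not the framework's implementation. The extension table is an assumption covering common UI build output, and resolveStaticPath is a hypothetical addition that decodes the URL path and rejects anything that resolves outside the UI directory.

```typescript
import path from "node:path";

// Assumed mapping for typical UI build output; extend as needed.
const CONTENT_TYPES: Record<string, string> = {
  ".html": "text/html; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".css": "text/css; charset=utf-8",
  ".json": "application/json",
  ".svg": "image/svg+xml",
  ".png": "image/png",
  ".ico": "image/x-icon",
  ".woff2": "font/woff2",
};

export function getContentType(filePath: string): string {
  const ext = path.extname(filePath).toLowerCase();
  return CONTENT_TYPES[ext] ?? "application/octet-stream";
}

// Hypothetical helper: maps a URL pathname to a file path inside rootDir,
// or returns null if the path is malformed or would escape rootDir.
export function resolveStaticPath(rootDir: string, pathname: string): string | null {
  let decoded: string;
  try {
    decoded = decodeURIComponent(pathname); // e.g. "%20" becomes a space
  } catch {
    return null; // malformed percent-encoding
  }
  if (decoded.includes("\0")) return null;

  const root = path.resolve(rootDir);
  // Prefix with "./" so a leading slash is treated as relative to root.
  const candidate = path.resolve(root, "./" + decoded);
  if (candidate !== root && !candidate.startsWith(root + path.sep)) {
    return null; // resolved outside the UI directory
  }
  return candidate;
}
```

In the handler, uiDistDir could then be something like path.resolve("public") (matching the public directory the deploy script creates), with const filePath = resolveStaticPath(uiDistDir, url.pathname) replacing the path.join call and a null result falling through to the SPA fallback.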
Include the UI build in your deploy script:
# Build UI
npm --workspace apps/ui run build
# Copy into backend bundle
mkdir -p dist-cloudrun/public
cp -R apps/ui/dist/. dist-cloudrun/public/
Troubleshooting
Container fails to start
Ensure the server listens on the port Cloud Run passes in the PORT environment variable (8080 by default, or the value of your --port flag) and binds to 0.0.0.0, not localhost. Cloud Run routes traffic through its internal proxy and requires the service to listen on all interfaces.
Requests time out at 60 seconds
The default Cloud Run timeout is 300 seconds, but load balancers or CDNs in front of Cloud Run may have their own timeouts. Check your --timeout setting and any intermediary configuration.
Image size is too large
Use node:20-slim as the base image (not node:20). Ensure you run npm install --omit=dev in the container to exclude dev dependencies. A typical bundled deployment is 30--60 MB.
SSE connections drop after 30 seconds
Ensure your server sets the connection: keep-alive and cache-control: no-cache, no-transform headers on SSE responses. The createAgentExecuteStreamHandler sets these automatically in its response headers. Also verify that no reverse proxy is buffering responses.
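If the drops come from an intermediary closing idle connections while the agent is still working, a periodic heartbeat is one common mitigation. The sketch below is an assumption layered on top of the server example, not something the framework is documented to provide. formatSseEvent and startHeartbeat are hypothetical names, and interleaving heartbeats is only safe if each chunk written from out.stream is a complete SSE event.

```typescript
// Formats one Server-Sent Events message; multi-line data becomes
// one "data:" line per line, terminated by a blank line.
export function formatSseEvent(data: string, event?: string): string {
  const lines = data.split(/\r?\n/).map((line) => `data: ${line}`);
  if (event) lines.unshift(`event: ${event}`);
  return lines.join("\n") + "\n\n";
}

// A line starting with ":" is an SSE comment: clients ignore it,
// but it keeps bytes flowing on an otherwise idle connection.
export const SSE_HEARTBEAT = ": ping\n\n";

interface ChunkWriter {
  write(chunk: string): unknown;
}

// Writes a heartbeat comment every intervalMs until the returned
// stop function is called.
export function startHeartbeat(res: ChunkWriter, intervalMs = 15_000): () => void {
  const timer = setInterval(() => res.write(SSE_HEARTBEAT), intervalMs);
  return () => clearInterval(timer);
}
```

In the streaming route, this would mean calling const stop = startHeartbeat(res) after res.flushHeaders?.() and calling stop() in a finally block around the for await loop, so the timer never outlives the response.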