
GeminiClient

GeminiClient is the primary interface for interacting with Google's Gemini models. It wraps the @google/genai SDK with production-ready features: automatic retries with exponential backoff, document and MIME validation, streaming, structured JSON output, and injectable dependencies for testing.

Import

import { GeminiClient } from "@modernpath/agent-framework";

Constructor

const client = new GeminiClient(config: GeminiClientConfig, deps?: {
  ai?: GenAIClientLike;
  sleep?: (ms: number) => Promise<void>;
});

GeminiClientConfig

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| apiKey | string | required | Google AI API key. |
| model | string | "gemini-3-flash-preview" | Gemini model identifier. |
| maxOutputTokens | number | 8192 | Default max output tokens per request. |
| temperature | number | 0.7 | Default sampling temperature (0.0 to 2.0). |
| maxInlineDocumentSizeBytes | number | 20971520 (20 MB) | Max size for inline (base64) document attachments. |

Dependencies (optional)

| Property | Type | Description |
| --- | --- | --- |
| ai | GenAIClientLike | Injectable Google GenAI client interface for testing. |
| sleep | (ms: number) => Promise<void> | Injectable sleep function for deterministic retry testing. |

Properties

modelName

get modelName(): string

Returns the configured model identifier (e.g. "gemini-3-flash-preview").

getModelInfo()

getModelInfo(): { maxInlineDocumentSizeBytes: number }

Returns model configuration metadata. As the signature shows, this is the configured inline document size limit in bytes.

Methods

generateContent()

Generates text from a prompt with optional documents, system instructions, and structured output.

async generateContent(
  prompt: string,
  options?: GeminiOptions,
): Promise<{
  text: string;
  usage?: GeminiUsage;
  finishReason?: string;
  groundingMetadata?: any;
}>

Retry behavior: Automatically retries up to 3 times with exponential backoff on rate-limit errors (HTTP 429, "resource exhausted", "quota exceeded") and empty model responses.
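The retry policy described above can be approximated with a small helper. This is a minimal sketch, not the library's implementation: the names retryWithBackoff and isRateLimitError are hypothetical, the 1000 ms base delay and doubling factor are assumptions (the actual schedule is not documented here), and the limit of 3 is read as total attempts, following the error message listed under Error Handling. It covers only rate-limit errors, not the empty-response case.

```typescript
// Hypothetical sketch of retry with exponential backoff; not the library's code.
type Sleep = (ms: number) => Promise<void>;

// Assumed classification, based on the rate-limit patterns named above.
function isRateLimitError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return /429|resource exhausted|quota exceeded/i.test(message);
}

async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  sleep: Sleep,
  maxAttempts = 3,
  baseDelayMs = 1000, // assumed value; the real delay schedule is not documented
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Non-retryable errors surface immediately.
      if (!isRateLimitError(err)) throw err;
      lastError = err;
      // Wait 1000 ms, then 2000 ms, ... but not after the final attempt.
      if (attempt < maxAttempts) {
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  const reason = lastError instanceof Error ? lastError.message : String(lastError);
  throw new Error(`Failed to generate content after ${maxAttempts} attempts: ${reason}`);
}
```

Taking sleep as a parameter mirrors the optional sleep dependency of the constructor: a test can pass a function that records the requested delays and resolves immediately, so retry paths run without real waiting.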

JSON recovery: When responseMimeType is "application/json" and the non-streaming response is not valid JSON, the method automatically retries via streaming and concatenates chunks to recover a valid response.
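The fallback described above might look roughly like the following. It is a sketch under stated assumptions: recoverJson and isValidJson are hypothetical names, the streaming call is represented by any function returning an AsyncIterable<string>, and the final error message is invented for illustration.

```typescript
// Hypothetical sketch of the JSON recovery fallback; names are illustrative only.
function isValidJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

async function recoverJson(
  firstAttemptText: string,
  streamAgain: () => AsyncIterable<string>,
): Promise<string> {
  // Happy path: the non-streaming response already parses.
  if (isValidJson(firstAttemptText)) return firstAttemptText;

  // Fallback: request the content again as a stream and join the chunks.
  let combined = "";
  for await (const chunk of streamAgain()) {
    combined += chunk;
  }
  if (!isValidJson(combined)) {
    // Assumed error text; the library's actual message may differ.
    throw new Error("Streaming fallback did not produce valid JSON");
  }
  return combined;
}
```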

Basic usage

const result = await client.generateContent(
  "What are the top 3 causes of heating failures?",
);
console.log(result.text);
console.log(result.usage); // { input: 42, output: 150, total: 192 }

With system prompt and documents

import { readFileSync } from "fs";

const result = await client.generateContent(
  "Summarize the attached document.",
  {
    systemPrompt: "You are a technical document analyst.",
    documents: [{
      name: "report.pdf",
      mimeType: "application/pdf",
      content: readFileSync("./report.pdf"),
    }],
    temperature: 0.3,
    maxOutputTokens: 4096,
  },
);

Structured JSON output

const result = await client.generateContent(
  "Extract all person names and their roles.",
  {
    responseMimeType: "application/json",
    responseSchema: {
      type: "object",
      properties: {
        people: {
          type: "array",
          items: {
            type: "object",
            properties: {
              name: { type: "string" },
              role: { type: "string" },
            },
            required: ["name", "role"],
          },
        },
      },
      required: ["people"],
    },
  },
);

const data = JSON.parse(result.text);
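Because JSON.parse returns an untyped value, it can be worth checking the parsed result against the expected shape before using it. The sketch below does this for the schema in the example above; Person, PeopleResult, isPeopleResult, and parsePeople are hypothetical names that are not part of the library.

```typescript
// Hypothetical types mirroring the responseSchema in the example above.
interface Person {
  name: string;
  role: string;
}

interface PeopleResult {
  people: Person[];
}

// Type guard: true only when the value has the shape { people: [{ name, role }, ...] }.
function isPeopleResult(value: unknown): value is PeopleResult {
  if (typeof value !== "object" || value === null) return false;
  const people = (value as { people?: unknown }).people;
  return (
    Array.isArray(people) &&
    people.every(
      (p) =>
        typeof p === "object" &&
        p !== null &&
        typeof (p as Person).name === "string" &&
        typeof (p as Person).role === "string",
    )
  );
}

// Parses the model's text output and rejects anything that does not match.
function parsePeople(text: string): PeopleResult {
  const parsed: unknown = JSON.parse(text);
  if (!isPeopleResult(parsed)) {
    throw new Error("Response does not match the expected people schema");
  }
  return parsed;
}
```

With this in place, `parsePeople(result.text)` yields a typed value and fails loudly if the model returns a different shape.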

generateContentStream()

Streams text chunks as an async generator. Same retry and backoff behavior as generateContent().

async *generateContentStream(
  prompt: string,
  options?: GeminiOptions,
): AsyncGenerator<string>

Streaming to stdout

for await (const chunk of client.generateContentStream("Explain step by step.")) {
  process.stdout.write(chunk);
}

Collecting streamed output

// prompt, options, and onChunk are assumed to be defined by the caller.
let fullText = "";
for await (const chunk of client.generateContentStream(prompt, options)) {
  fullText += chunk;
  onChunk(chunk); // e.g. send to SSE client
}

createChat()

Creates a stateful multi-turn chat session.

createChat(options?: GeminiChatCreateOptions): GeminiChatSession

See Chat Sessions for details.


files()

Returns a GeminiFiles instance for uploading and managing files via the Files API.

files(): GeminiFiles

fileSearchStores()

Returns a GeminiFileSearchStores instance for managing File Search Stores and executing semantic search.

fileSearchStores(): GeminiFileSearchStores

GeminiOptions

Options passed to generateContent() and generateContentStream().

| Property | Type | Description |
| --- | --- | --- |
| systemPrompt | string | System-level instruction prepended to the prompt. |
| documents | GeminiDocument[] | Inline documents (base64-encoded in the request). Max size governed by maxInlineDocumentSizeBytes. |
| fileReferences | GeminiFileReference[] | References to files uploaded via the Files API. Use for files > 20 MB or for reuse across requests. |
| maxOutputTokens | number | Override the default max output tokens for this request. |
| temperature | number | Override the default temperature for this request. |
| responseMimeType | "text/plain" \| "application/json" | Request plain text or structured JSON output. |
| responseSchema | Record<string, any> | JSON Schema for structured output (requires responseMimeType: "application/json"). |
| responseJsonSchema | Record<string, any> | Alias for responseSchema used by some SDK tooling. |
| tools | Array<Record<string, any>> | Gemini tools (e.g. google_search, url_context, fileSearch). Passed through to the API as-is. |
| mediaResolution | "low" \| "medium" \| "high" | Controls image/PDF processing resolution. |

Types

GeminiDocument

Inline document attachment (base64-encoded in the request body).

interface GeminiDocument {
  content: Buffer;   // Raw file bytes
  mimeType: string;  // e.g. "application/pdf"
  name: string;      // Display name
}

GeminiFileReference

Reference to a file previously uploaded via the Files API.

interface GeminiFileReference {
  fileUri: string;       // Files API URI
  mimeType: string;      // MIME type of the file
  displayName?: string;  // Optional display name
}
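
Choosing between an inline GeminiDocument and a GeminiFileReference comes down to file size and reuse. The helper below is a sketch of that decision, not part of the library: chooseAttachmentStrategy, AttachmentStrategy, and the reuseAcrossRequests flag are hypothetical. The boundary follows the "Document too large" error pattern, which rejects only sizes strictly greater than the limit.

```typescript
// Documented default for maxInlineDocumentSizeBytes (20 * 1024 * 1024 = 20971520).
const DEFAULT_MAX_INLINE_BYTES = 20 * 1024 * 1024;

// Hypothetical labels for the two attachment routes.
type AttachmentStrategy = "inline" | "files-api";

function chooseAttachmentStrategy(
  sizeBytes: number,
  maxInlineBytes: number = DEFAULT_MAX_INLINE_BYTES,
  reuseAcrossRequests = false,
): AttachmentStrategy {
  // Upload once via the Files API when the file is reused or exceeds the inline limit.
  if (reuseAcrossRequests || sizeBytes > maxInlineBytes) return "files-api";
  return "inline";
}
```

In practice the limit could be read from `client.getModelInfo().maxInlineDocumentSizeBytes` instead of the constant, so the check tracks the configured value.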

GeminiUsage

Token usage metadata returned from generation calls.

interface GeminiUsage {
  input?: number;    // Prompt token count
  output?: number;   // Output (candidates) token count
  total?: number;    // Total token count
  thoughts?: number; // Internal chain-of-thought tokens (Gemini thinking models)
}

Thinking tokens and output limits

Gemini thinking models count thoughts tokens against maxOutputTokens. If thoughts + output >= maxOutputTokens, the visible response may be truncated. Monitor usage.thoughts in production to tune your token budget.
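
One way to monitor this is to compare each response's usage against the configured limit. The following is a minimal sketch: describeTokenBudget and TokenBudget are hypothetical names, the GeminiUsage interface is repeated from above so the block is self-contained, and missing counts are treated as zero, which is an assumption.

```typescript
// Repeated from the GeminiUsage definition above so this block stands alone.
interface GeminiUsage {
  input?: number;
  output?: number;
  total?: number;
  thoughts?: number;
}

// Hypothetical summary shape for logging or alerting.
interface TokenBudget {
  consumed: number;         // thoughts + output
  remaining: number;        // tokens left before the limit, never negative
  likelyTruncated: boolean; // true when thoughts + output >= maxOutputTokens
}

function describeTokenBudget(
  usage: GeminiUsage | undefined,
  maxOutputTokens: number,
): TokenBudget {
  // Assumption: absent counts are treated as zero.
  const thoughts = usage?.thoughts ?? 0;
  const output = usage?.output ?? 0;
  const consumed = thoughts + output;
  return {
    consumed,
    remaining: Math.max(0, maxOutputTokens - consumed),
    likelyTruncated: consumed >= maxOutputTokens,
  };
}
```

Logging the result of `describeTokenBudget(result.usage, 8192)` over time shows how much of the budget thinking consumes and whether maxOutputTokens should be raised.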

Error Handling

The client throws descriptive errors in these scenarios:

| Scenario | Error Message Pattern |
| --- | --- |
| Missing API key | "GeminiClient: apiKey is required" |
| Empty prompt | "GeminiClient: prompt must be a non-empty string" |
| Unsupported MIME type | "Unsupported MIME type: {type}" |
| Document too large | "Document too large for inline upload: {name} ({size} bytes > {max} bytes)" |
| All retries exhausted | "Failed to generate content after 3 attempts: {reason}" |
| Empty response with finish reason | "Empty response from model. Finish reason: {reason}" |
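
Callers that need to react differently per scenario can match on these message patterns. The sketch below assumes the errors are plain Error instances distinguished only by message text, which the source does not confirm; classifyGeminiError and GeminiErrorKind are hypothetical names. Message matching is brittle, so treat this as a stopgap if the library does not expose typed errors.

```typescript
// Hypothetical labels for the documented error scenarios.
type GeminiErrorKind =
  | "missing-api-key"
  | "empty-prompt"
  | "unsupported-mime-type"
  | "document-too-large"
  | "retries-exhausted"
  | "empty-response"
  | "unknown";

// "retries-exhausted" is checked first because its {reason} may embed another message.
const ERROR_PATTERNS: Array<[RegExp, GeminiErrorKind]> = [
  [/Failed to generate content after \d+ attempts/, "retries-exhausted"],
  [/apiKey is required/, "missing-api-key"],
  [/prompt must be a non-empty string/, "empty-prompt"],
  [/Unsupported MIME type:/, "unsupported-mime-type"],
  [/Document too large for inline upload:/, "document-too-large"],
  [/Empty response from model\./, "empty-response"],
];

function classifyGeminiError(err: unknown): GeminiErrorKind {
  const message = err instanceof Error ? err.message : String(err);
  for (const [pattern, kind] of ERROR_PATTERNS) {
    if (pattern.test(message)) return kind;
  }
  return "unknown";
}
```

A typical use is inside a catch block, for example falling back to the Files API when the result is "document-too-large".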

Debug Logging

Enable with DEBUG=gemini-client or DEBUG=gemini-api:

DEBUG=gemini-client node app.js

Logs include request IDs, timing, token usage, retry attempts, and response previews.