Document Composition¶
The composition system enables modular, multi-document knowledge bases by allowing markdown documents to declare dependencies via YAML frontmatter includes[]. When a canonical document is fetched for grounding, the CompositionLoader performs a bounded BFS traversal to resolve all included documents into a single CompositionBundle.
This enables patterns like a root SOP that includes shared glossaries, escalation policies, and reference data -- all automatically resolved and attached to the LLM context.
How It Works¶
Frontmatter Syntax¶
Markdown documents declare their includes using standard YAML frontmatter:
---
title: "Cold Complaint Procedure"
canonical_path: sops/cold-complaint.md
includes:
- ./shared/glossary.md
- ./shared/escalation-policy.md
- ../reference/temperature-thresholds.md
---
# Cold Complaint Procedure
When a tenant reports cold temperatures...
Rules:
- includes must be a YAML array of strings.
- Each entry is a relative path resolved against the parent document's location in storage.
- Absolute paths and URLs are rejected (security measure).
- Path traversal attempts (e.g. ../../.env) that escape the storage root are blocked.
- Non-string entries and duplicates are silently skipped.
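The rules above can be sketched as two small helpers. This is a minimal illustration, not the library's implementation: the names resolveIncludePath and cleanIncludes, the result shape, and the assumption that storage keys are slash-separated paths relative to the storage root are all hypothetical.

```typescript
// Hypothetical helpers; names and return shapes are illustrative only.
type ResolveResult =
  | { ok: true; key: string }
  | { ok: false; reason: string };

// Resolve an include path against the parent's storage key (assumed to be
// a slash-separated path relative to the storage root).
function resolveIncludePath(parentKey: string, include: string): ResolveResult {
  // Reject URLs (anything starting with "scheme:") and absolute paths.
  if (/^[a-zA-Z][a-zA-Z0-9+.-]*:/.test(include) || include.startsWith("/")) {
    return { ok: false, reason: "absolute path or URL" };
  }
  // Start from the parent's directory: "sops/cold-complaint.md" -> ["sops"].
  const segments = parentKey.split("/").slice(0, -1);
  for (const part of include.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      // Popping past the root means the path escapes the storage root.
      if (segments.length === 0) {
        return { ok: false, reason: "escapes storage root" };
      }
      segments.pop();
    } else {
      segments.push(part);
    }
  }
  return { ok: true, key: segments.join("/") };
}

// Keep only string entries and drop duplicates, preserving first-seen order.
function cleanIncludes(raw: unknown): string[] {
  if (!Array.isArray(raw)) return [];
  const seen = new Set<string>();
  const out: string[] = [];
  for (const entry of raw) {
    if (typeof entry !== "string" || seen.has(entry)) continue;
    seen.add(entry);
    out.push(entry);
  }
  return out;
}
```

With the frontmatter example above, ./shared/glossary.md resolves to sops/shared/glossary.md, while ../../.env is rejected because it climbs above the storage root.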
Traversal¶
The CompositionLoader performs BFS (breadth-first search) expansion starting from the root document:
graph TD
A["cold-complaint.md\n(root, depth 0)"] --> B["glossary.md\n(depth 1)"]
A --> C["escalation-policy.md\n(depth 1)"]
A --> D["temperature-thresholds.md\n(depth 1)"]
C --> E["on-call-contacts.md\n(depth 2)"]
Safety guarantees:
| Guard | Description | Default |
|---|---|---|
| Cycle detection | Documents already visited are skipped (by canonical URI). | Always active |
| Diamond dedup | Same document referenced from multiple parents is fetched only once. | Always active |
| Depth limit | Maximum BFS depth from the root document. | 2 |
| Document budget | Maximum total documents in the bundle (including root). | 8 |
| Byte budget | Maximum total bytes for included documents (excludes root). | 5 MB |
| Timeout | Wall-clock limit for the entire traversal phase. | 10 seconds |
| Path safety | Relative paths that escape the storage root are rejected. | Always active |
Only documents with traversable MIME types (default: text/markdown) have their frontmatter parsed for further includes. Non-markdown includes are fetched and added to the bundle but not recursed into.
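To make the interaction of these guards concrete, here is a simplified, synchronous sketch of a bounded BFS over an in-memory store. It is not the CompositionLoader source: the Doc and Skipped shapes, the expand function, and the order in which the guards are checked are assumptions, and the timeout and MIME-type checks are omitted for brevity.

```typescript
// Illustrative in-memory model; the real loader fetches documents asynchronously.
interface Doc { uri: string; includes: string[]; bytes: number }
interface Skipped { parentUri: string; uri: string; reason: string }

function expand(
  store: Map<string, Doc>,
  rootUri: string,
  policy = { maxDepth: 2, maxDocs: 8, maxIncludeBytes: 5 * 1024 * 1024 },
): { order: string[]; skipped: Skipped[] } {
  const visited = new Set<string>([rootUri]);
  const order: string[] = [rootUri]; // root counts toward maxDocs
  const skipped: Skipped[] = [];
  let includeBytes = 0; // root bytes are not counted
  // A FIFO queue of (document, depth) pairs yields breadth-first order.
  const queue: Array<{ uri: string; depth: number }> = [{ uri: rootUri, depth: 0 }];

  while (queue.length > 0) {
    const { uri: parentUri, depth } = queue.shift()!;
    const parent = store.get(parentUri);
    if (!parent) continue;
    for (const uri of parent.includes) {
      const skip = (reason: string) => skipped.push({ parentUri, uri, reason });
      if (visited.has(uri)) { skip("cycle"); continue; } // cycle or diamond
      if (depth + 1 > policy.maxDepth) { skip("depth_exceeded"); continue; }
      if (order.length >= policy.maxDocs) { skip("doc_budget_exceeded"); continue; }
      const doc = store.get(uri);
      if (!doc) { skip("fetch_failed"); continue; }
      if (includeBytes + doc.bytes > policy.maxIncludeBytes) {
        skip("byte_budget_exceeded");
        continue;
      }
      visited.add(uri);
      includeBytes += doc.bytes;
      order.push(uri);
      queue.push({ uri, depth: depth + 1 });
    }
  }
  return { order, skipped };
}
```

Because every guard records a skip rather than throwing, a caller can always see which includes were dropped and why, which is the role the skipped list plays in the bundle.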
Import¶
Note
In most cases, you do not use CompositionLoader directly. Instead, configure a compositionPolicy on CanonicalGroundingService and the expansion happens automatically during prepareFromSources().
CompositionPolicy¶
Configuration controlling the bounded traversal.
interface CompositionPolicy {
  traversableMimeTypes?: string[];  // default: ["text/markdown"]
  maxDepth?: number;                // default: 2
  maxDocs?: number;                 // default: 8
  maxIncludeBytes?: number;         // default: 5242880 (5 MB)
  timeoutMs?: number;               // default: 10000 (10s)
  onFetchError?: "skip" | "abort";  // default: "skip"
}
| Property | Type | Default | Description |
|---|---|---|---|
| traversableMimeTypes | string[] | ["text/markdown"] | Only documents with these MIME types have their frontmatter parsed for includes. |
| maxDepth | number | 2 | Maximum depth of include traversal. Depth 0 = root, depth 1 = root's direct includes, depth 2 = includes of includes. |
| maxDocs | number | 8 | Maximum total documents in the bundle (including root). |
| maxIncludeBytes | number | 5242880 | Maximum total bytes for all fetched included documents (excludes root, which is always fetched). |
| timeoutMs | number | 10000 | Wall-clock timeout for the entire composition traversal phase. |
| onFetchError | "skip" \| "abort" | "skip" | What to do when a single include fails to fetch. "skip" logs and continues; "abort" stops traversal and returns what was fetched so far. |
Composition policy examples
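The two policies below are illustrative sketches built only from the fields documented above. The constant names and the specific values are assumptions chosen for the example, not recommended settings, and the interface is repeated here only so the snippet stands alone.

```typescript
// Copied from the CompositionPolicy definition above so the snippet is self-contained.
interface CompositionPolicy {
  traversableMimeTypes?: string[];
  maxDepth?: number;
  maxDocs?: number;
  maxIncludeBytes?: number;
  timeoutMs?: number;
  onFetchError?: "skip" | "abort";
}

// Conservative: direct includes only, a small budget, and stop on the first
// failed fetch instead of continuing with a partial include set.
const strictPolicy: CompositionPolicy = {
  maxDepth: 1,
  maxDocs: 4,
  maxIncludeBytes: 1024 * 1024, // 1 MB
  onFetchError: "abort",
};

// Permissive: one extra level of nesting and a larger document budget.
// Omitted fields (maxIncludeBytes, onFetchError, ...) fall back to defaults.
const broadPolicy: CompositionPolicy = {
  maxDepth: 3,
  maxDocs: 20,
  timeoutMs: 30000,
};
```

An empty object is also a valid policy, in which case every field takes its documented default.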
CompositionLoader¶
The loader that performs the BFS traversal.
Constructor¶
constructor(
  fetcher: CanonicalDocumentFetcher,
  defaultPolicy?: CompositionPolicy,
)
loadBundle()¶
Expand a root document into a composition bundle.
async loadBundle(
  rootDoc: PreparedDocument,
  rootPointer: CanonicalDocumentPointer,
  ctx: { userId: number; auditingId: number },
  policyOverride?: CompositionPolicy,
  alreadyVisited?: Set<string>,
  prepareOptions?: PrepareDocumentOptions,
): Promise<CompositionBundle>
| Parameter | Type | Description |
|---|---|---|
| rootDoc | PreparedDocument | Already-fetched root document. |
| rootPointer | CanonicalDocumentPointer | Canonical pointer of the root document. |
| ctx | { userId, auditingId } | Auth context for fetching included documents. |
| policyOverride | CompositionPolicy | Per-call policy override (merged with constructor default). |
| alreadyVisited | Set<string> | Canonical URIs to treat as already fetched (for cross-root deduplication). |
| prepareOptions | PrepareDocumentOptions | Options forwarded to CanonicalDocumentFetcher. |
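The policyOverride row says the per-call policy is merged with the constructor default. One plausible reading, sketched below, is a field-by-field merge where the override wins, then the constructor default, then the documented built-in default. The resolvePolicy function, the BUILT_IN_DEFAULTS constant, and the assumption that ResolvedCompositionPolicy is the policy with every field required are hypothetical; the actual merge logic is not shown in this page.

```typescript
interface CompositionPolicy {
  traversableMimeTypes?: string[];
  maxDepth?: number;
  maxDocs?: number;
  maxIncludeBytes?: number;
  timeoutMs?: number;
  onFetchError?: "skip" | "abort";
}

// Assumption: the resolved policy is the same shape with no optional fields.
type ResolvedCompositionPolicy = Required<CompositionPolicy>;

// Defaults as documented in the CompositionPolicy table.
const BUILT_IN_DEFAULTS: ResolvedCompositionPolicy = {
  traversableMimeTypes: ["text/markdown"],
  maxDepth: 2,
  maxDocs: 8,
  maxIncludeBytes: 5242880,
  timeoutMs: 10000,
  onFetchError: "skip",
};

// Precedence per field: per-call override, then constructor default,
// then built-in default. Undefined fields never clobber a lower layer.
function resolvePolicy(
  ctorDefault: CompositionPolicy = {},
  override: CompositionPolicy = {},
): ResolvedCompositionPolicy {
  const d = BUILT_IN_DEFAULTS;
  return {
    traversableMimeTypes:
      override.traversableMimeTypes ?? ctorDefault.traversableMimeTypes ?? d.traversableMimeTypes,
    maxDepth: override.maxDepth ?? ctorDefault.maxDepth ?? d.maxDepth,
    maxDocs: override.maxDocs ?? ctorDefault.maxDocs ?? d.maxDocs,
    maxIncludeBytes: override.maxIncludeBytes ?? ctorDefault.maxIncludeBytes ?? d.maxIncludeBytes,
    timeoutMs: override.timeoutMs ?? ctorDefault.timeoutMs ?? d.timeoutMs,
    onFetchError: override.onFetchError ?? ctorDefault.onFetchError ?? d.onFetchError,
  };
}
```

Under this reading, a loader constructed with { maxDepth: 1 } and called with { maxDocs: 3 } would traverse with maxDepth 1, maxDocs 3, and defaults for everything else.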
Types¶
CompositionBundle¶
The result of expanding a root document's include graph.
interface CompositionBundle {
  root: PreparedDocument;                  // The root document (always present)
  rootPointer: CanonicalDocumentPointer;   // Canonical pointer of the root
  included: PreparedDocument[];            // Included documents in BFS order
  includedPointers: CanonicalDocumentPointer[];
  allDocuments: PreparedDocument[];        // [root, ...included] convenience accessor
  edges: CompositionEdge[];                // Include graph edges (for debug/visualization)
  skipped: SkippedInclude[];               // Includes that were not fetched, with reasons
  policy: ResolvedCompositionPolicy;       // The resolved policy that was used
}
CompositionEdge¶
A single edge in the include graph.
interface CompositionEdge {
  from: string;          // Canonical URI of the parent document
  to: string;            // Canonical URI of the included document
  relativePath: string;  // The raw include path from frontmatter
  depth: number;         // BFS depth of this edge
}
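Since edges exist for debugging and visualization, one way to use them is to render a Mermaid graph like the one in the Traversal section. The edgesToMermaid helper below is a hypothetical sketch, not part of the library, and it assumes URIs contain no double quotes.

```typescript
interface CompositionEdge {
  from: string;
  to: string;
  relativePath: string;
  depth: number;
}

// Render include edges as a Mermaid "graph TD" definition.
function edgesToMermaid(edges: CompositionEdge[]): string {
  // Mermaid node ids must be simple tokens, so map each URI to n0, n1, ...
  const ids = new Map<string, string>();
  const idFor = (uri: string): string => {
    let id = ids.get(uri);
    if (id === undefined) {
      id = `n${ids.size}`;
      ids.set(uri, id);
    }
    return id;
  };
  const lines = ["graph TD"];
  for (const edge of edges) {
    const from = `${idFor(edge.from)}["${edge.from}"]`;
    const to = `${idFor(edge.to)}["${edge.to}"]`;
    // Label each arrow with the BFS depth of the included document.
    lines.push(`  ${from} -->|depth ${edge.depth}| ${to}`);
  }
  return lines.join("\n");
}
```

Passing bundle.edges to this function yields text that can be pasted into any Mermaid renderer to inspect what a traversal actually pulled in.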
SkippedInclude¶
Record of an include that was discovered but not fetched.
interface SkippedInclude {
  parentUri: string;     // Canonical URI of the parent document
  relativePath: string;  // The raw include path from frontmatter
  reason: SkipReason;    // Why it was skipped
  detail?: string;       // Additional detail (e.g. error message)
}
type SkipReason =
  | "cycle"                 // Already visited (cycle or diamond)
  | "depth_exceeded"        // Beyond maxDepth
  | "doc_budget_exceeded"   // Beyond maxDocs
  | "byte_budget_exceeded"  // Beyond maxIncludeBytes
  | "timeout"               // Wall-clock timeout reached
  | "non_traversable_mime"  // MIME type not in traversableMimeTypes
  | "resolve_failed"        // Path resolution failed (e.g. path traversal)
  | "fetch_failed";         // Document fetch threw an error
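A caller inspecting bundle.skipped usually wants to separate benign skips (such as "cycle") from skips that mean the bundle was truncated by a budget. The helpers below are a hypothetical sketch of that triage; which reasons count as truncation is an assumption made for this example.

```typescript
type SkipReason =
  | "cycle"
  | "depth_exceeded"
  | "doc_budget_exceeded"
  | "byte_budget_exceeded"
  | "timeout"
  | "non_traversable_mime"
  | "resolve_failed"
  | "fetch_failed";

interface SkippedInclude {
  parentUri: string;
  relativePath: string;
  reason: SkipReason;
  detail?: string;
}

// Count skipped includes per reason, e.g. for a log line or a metric.
function countSkipsByReason(
  skipped: SkippedInclude[],
): Partial<Record<SkipReason, number>> {
  const counts: Partial<Record<SkipReason, number>> = {};
  for (const s of skipped) {
    counts[s.reason] = (counts[s.reason] ?? 0) + 1;
  }
  return counts;
}

// Assumption: these reasons mean a policy limit cut the traversal short.
const TRUNCATING_REASONS: SkipReason[] = [
  "depth_exceeded",
  "doc_budget_exceeded",
  "byte_budget_exceeded",
  "timeout",
];

function wasTruncated(skipped: SkippedInclude[]): boolean {
  return skipped.some((s) => TRUNCATING_REASONS.indexOf(s.reason) !== -1);
}
```

A true result from wasTruncated is a reasonable signal to review the policy limits or restructure the include graph.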
canonical_path Frontmatter¶
Documents can declare their canonical storage path via canonical_path frontmatter. This ensures correct key assignment during upload and proper relative path resolution during composition.
When canonical_path is present:
- Upload tools place the file at the correct storage key.
- Relative paths in includes[] are resolved correctly regardless of the upload source.
- Re-uploads always overwrite the same key (idempotent).
Integration with CanonicalGroundingService¶
The most common way to use composition is through CanonicalGroundingService, which automatically handles composition when a compositionPolicy is configured:
const grounding = new CanonicalGroundingService(stores, fetcher, {
  compositionPolicy: {
    maxDepth: 2,
    maxDocs: 8,
    maxIncludeBytes: 5 * 1024 * 1024,
  },
});
const result = await grounding.prepareFromSources(sources, ctx);
// result.compositionBundles -> per-root composition details
// result.allDocumentsForAttachment -> deduplicated, ordered document list
The service shares one visited set across all root documents, so if root document A has already pulled in glossary.md, root document B does not re-fetch it during its own expansion.
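The effect of a shared visited set can be shown with a deliberately tiny model that looks only at direct includes. The expandRoot function and the includesByRoot map are hypothetical stand-ins; in the real API the equivalent hook is the alreadyVisited parameter of loadBundle().

```typescript
// Return the includes of `root` that still need fetching, recording them
// in the shared `visited` set so later roots skip them.
function expandRoot(
  includesByRoot: Record<string, string[]>,
  root: string,
  visited: Set<string>,
): string[] {
  const fetched: string[] = [];
  for (const uri of includesByRoot[root] ?? []) {
    if (visited.has(uri)) continue; // already fetched for an earlier root
    visited.add(uri);
    fetched.push(uri);
  }
  return fetched;
}

const includesByRoot: Record<string, string[]> = {
  "a.md": ["glossary.md", "escalation-policy.md"],
  "b.md": ["glossary.md", "temperature-thresholds.md"],
};

// One set is reused for every root, as the service does.
const sharedVisited = new Set<string>();
const fetchedForA = expandRoot(includesByRoot, "a.md", sharedVisited);
const fetchedForB = expandRoot(includesByRoot, "b.md", sharedVisited);
// fetchedForA holds both of A's includes; fetchedForB holds only
// temperature-thresholds.md because glossary.md was already visited.
```

A consequence of this design is that the order of roots matters: a shared include is attributed to whichever root is expanded first.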
Debug Logging¶
Enable with:
Logs include BFS queue state, fetch timing, byte budgets, cycle detection, and skip reasons.
Related Pages¶
- Canonical Grounding -- orchestrates composition as part of the grounding pipeline
- Knowledge Base Overview -- full RAG pipeline
- Ingestion -- setting up canonical_path and metadata during upload
- Grounding Policy -- controlling what reaches the LLM