Filesystem-style access to your documents, for AI agents, in your own AWS
account. list / glob / grep / tree / find / ranged read over your
documents in your S3, exposed through MCP (and REST). It's multi-tenant,
deploys with one terraform apply, costs ~$2/month idle, and every
stateful layer is swappable.
Status: early, in active development. v1.0.0 is published to PyPI and the repo is public; contributions are welcome (see the open issues and
docs/build-progress.md). The full loop runs on AWS: ingest, extract, catalog, and the MCP/REST read surface, with scheduled heal-from-S3 and high-signal alarms. License: Apache-2.0. Background and rationale live in docs/agentic-fs-oss-plan.md.
What an agent gets
A bounded, scoped MCP tool surface. An agent explores a document corpus the way a coding agent explores a repo, except over extracted document text, indexed at scale, multi-tenant, and remote:
- Navigate. fs_list, fs_tree, fs_glob, fs_find (by type, size, mtime, status).
- Search. fs_grep: two-stage, bounded, with ripgrep-style filters.
- Read. fs_read (ranged, or by section), fs_outline (a doc's heading map), fs_tables, fs_diff.
- Work. scratch_* (a per-principal workspace), whoami.
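To make "two-stage, bounded" concrete, here is a minimal Python sketch of one plausible reading of that shape. It is not the project's implementation: the in-memory `docs` mapping, the function name `two_stage_grep`, and the `glob`, `literal_hint`, and `max_matches` parameters are all assumptions made for illustration. Stage one prunes candidates cheaply, stage two runs the regex line by line and stops when the match budget is spent.

```python
import fnmatch
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    path: str
    line_no: int
    text: str


def two_stage_grep(docs, pattern, glob="*", literal_hint=None, max_matches=50):
    """Hypothetical two-stage grep over a {path: extracted_text} mapping.

    Returns (matches, truncated). `truncated` is True when more matches
    existed than the budget allowed.
    """
    regex = re.compile(pattern)

    # Stage 1: cheap candidate selection (path glob + optional literal hint).
    candidates = [
        path
        for path, text in sorted(docs.items())
        if fnmatch.fnmatchcase(path, glob)
        and (literal_hint is None or literal_hint in text)
    ]

    # Stage 2: line-by-line regex scan, bounded by max_matches.
    matches = []
    for path in candidates:
        for line_no, line in enumerate(docs[path].splitlines(), start=1):
            if regex.search(line):
                if len(matches) >= max_matches:
                    return matches, True  # budget spent, report truncation
                matches.append(Match(path, line_no, line.strip()))
    return matches, False
```

Returning a `truncated` flag rather than silently cutting off lets the caller (an agent, in this setting) decide whether to narrow the glob or the pattern and try again.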
Every tool runs through one middleware that applies claims-filtered visibility,
scope enforcement, a per-call output budget, and an audit log
(ADR 0012). Adding a tool is a
registry entry, not a fork. Semantic fs_search is an optional accelerator on
the roadmap; grep is the floor.
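The "one middleware, registry entry" idea above can be sketched in a few lines of Python. Everything here is hypothetical and only illustrates the pattern: the `Principal` fields, the `tool` decorator, the `call_tool` function, the `fs:read` scope string, and the character-based budget are assumptions, not the project's actual registry or the design recorded in ADR 0012.

```python
from dataclasses import dataclass
from typing import Callable

TOOLS: dict[str, tuple[str, Callable]] = {}  # name -> (required scope, handler)
AUDIT_LOG: list[dict] = []                   # stand-in for a real audit sink


@dataclass(frozen=True)
class Principal:
    subject: str
    scopes: frozenset      # scopes granted by the token
    namespaces: frozenset  # namespaces the principal's claims make visible


def tool(name, scope):
    """Register a handler: adding a tool is one decorated function."""
    def register(fn):
        TOOLS[name] = (scope, fn)
        return fn
    return register


def call_tool(principal, name, namespace, budget_chars=200, **kwargs):
    """Single choke point: scope, visibility, output budget, audit."""
    scope, handler = TOOLS[name]
    if scope not in principal.scopes:
        outcome, result = "denied:scope", None
    elif namespace not in principal.namespaces:
        outcome, result = "denied:visibility", None
    else:
        # Trim the handler's output to the per-call budget.
        outcome, result = "ok", handler(namespace, **kwargs)[:budget_chars]
    # Every call is audited, including denied ones.
    AUDIT_LOG.append({"sub": principal.subject, "tool": name,
                      "namespace": namespace, "outcome": outcome})
    if result is None:
        raise PermissionError(outcome)
    return result


@tool("fs_list", scope="fs:read")
def fs_list(namespace):
    # Toy handler: pretend the namespace holds 100 documents.
    return "\n".join(f"{namespace}/doc-{i}.md" for i in range(100))
```

Because the checks live in `call_tool` rather than in each handler, a newly registered tool inherits them without any extra code.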
Run it locally (5 minutes)
Requirements: Docker, uv, and make (macOS: xcode-select --install).
    git clone https://github.com/vivekkhimani/agentic-fs && cd agentic-fs
    make dev                          # builds the image, starts MinIO + DynamoDB Local + the API, seeds the bucket/table
    curl localhost:8080/v1/healthz    # {"status":"ok","version":"..."}
    curl localhost:8080/v1/me         # the local dev principal

    # Ingest a folder of documents, then read them back:
    uv run fs-crawler --connector local --source ./docs --api-url http://localhost:8080 --namespace handbook
    curl "localhost:8080/v1/fs/handbook/entries"    # the catalog rows that appeared
The MCP surface is mounted at localhost:8080/mcp, so you can point any MCP
client at it. make down stops the stack and make clean also wipes the volumes.
The API is the same container image that runs on AWS Lambda and Fargate
(ADR 0003).
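The same local checks can be scripted. The sketch below uses only the Python standard library and the REST paths shown in the quickstart (`/v1/healthz` and `/v1/fs/<namespace>/entries`). The helper names `entries_url`, `get_json`, and `smoke_check` are made up for this example, and the response handling assumes the health payload has the `{"status":"ok", ...}` shape shown above.

```python
import json
from urllib.parse import quote, urljoin
from urllib.request import urlopen

BASE_URL = "http://localhost:8080"  # the local stack started by `make dev`


def entries_url(namespace, base_url=BASE_URL):
    """Build the catalog listing URL for one namespace."""
    return urljoin(base_url, f"/v1/fs/{quote(namespace, safe='')}/entries")


def get_json(url, timeout=5.0):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def smoke_check(base_url=BASE_URL, namespace="handbook"):
    """Check health first, then return the entries for one namespace."""
    health = get_json(urljoin(base_url, "/v1/healthz"))
    if health.get("status") != "ok":
        raise RuntimeError(f"unhealthy: {health!r}")
    return get_json(entries_url(namespace, base_url))
```

With the stack running, `smoke_check()` should return whatever the `/v1/fs/handbook/entries` endpoint serves; the exact shape of that payload is not specified here.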
Local dev uses a static dev principal (AFS_AUTH_MODE=dev). Never run that in production. In production agentic-fs is an OAuth 2.1 resource server: you bring your own IdP (WorkOS, Cognito, Auth0, Okta, Keycloak), and afs auth doctor shows exactly how a token maps to a principal (auth swap-guide, ADR 0013).
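As a rough picture of what "a token maps to a principal" means, the sketch below reads a JWT payload and projects a few claims into a principal-like dict. It is an illustration only and not how `afs auth doctor` works: it skips signature verification entirely (acceptable for inspecting a token by hand, never for authorizing a request), and the `namespaces` claim is a hypothetical custom claim. `sub` and the space-delimited `scope` claim follow common OAuth access-token conventions.

```python
import base64
import json


def decode_claims_unverified(token):
    """Read a JWT payload WITHOUT verifying the signature. Debugging only."""
    payload_b64 = token.split(".")[1]
    # base64url payloads drop their padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def to_principal(claims):
    """Project raw claims into a principal-like dict (hypothetical shape)."""
    return {
        "subject": claims["sub"],
        "scopes": sorted(claims.get("scope", "").split()),
        "namespaces": sorted(claims.get("namespaces", [])),  # hypothetical claim
    }
```

A production resource server would validate the signature, issuer, audience, and expiry against the IdP's published keys before trusting any of these fields.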
Develop
    uv sync      # set up the Python workspace (once)
    make test    # run the test suite
    make lint    # ruff lint + format check
    make fmt     # autoformat + autofix
    make help    # list every target
Every PR is gated by CI: Python (ruff + pytest) for packages/**, and
Terraform (fmt, validate, tflint, trivy) for terraform/**.
Layout
packages/
afs-core/ contracts (Protocols), DTOs, key scheme, conformance kits (pydantic only)
afs-server/ stores, services, extraction, FastAPI app + MCP mount (implements afs-core)
afs-connector-sdk/ fs-crawler CLI + sync engine + Local FS / S3 / Drive / LlamaHub connectors
terraform/ modular IaC: global state/CI roles, per-layer modules, examples
docs/ the plan, build progress, swap guides, decision records (ADRs)
Dockerfile one image: Lambda + Fargate + local
Swap any layer (plug-and-play)
Each layer sits behind a small contract with a conformance kit and a one-page guide, so you can run it on the infrastructure you already have.
| Layer | Swap to | Guide |
|---|---|---|
| Object store | S3, MinIO, R2, Wasabi, B2 (endpoint), or GCS / Azure / HDFS / local via fsspec | object-store |
| Catalog | DynamoDB, or Postgres (BYO-RDS) | catalog |
| Compute | Lambda, Fargate, or Cloudflare Worker (edge) | compute |
| Extraction | text-native, Docling (PDF/Office/OCR), Textract, or your parser | extraction |
| Connectors | Local FS, S3, Google Drive, or LlamaHub (300+ readers) | connectors |
| Auth (IdP) | WorkOS, Cognito, Auth0, Okta, or Keycloak (BYO) | auth |
| MCP tools | add your own as afs.tools entry points | tools |
Swapping works through a backend name in settings plus entry-point discovery (ADR 0002).
Install (PyPI)
Install only the parts you need. The contracts are usable without the server.
| pip install … | You get | For |
|---|---|---|
| afs-core | contracts (Protocols), DTOs, key scheme, errors (pydantic only) | building a custom store/connector against the contracts |
| afs-core[testing] | the above plus conformance kits and in-memory fakes (adds pytest) | certifying your implementation against the kits |
| afs-server | the service: stores (S3/DynamoDB, [fsspec]), extraction, FsService, FastAPI app + MCP mount, the afs CLI | running agentic-fs |
| afs-connector-sdk | the fs-crawler CLI + sync engine + Local FS / S3 / Drive / LlamaHub connectors | crawling your documents in ([aws]/[gdrive]/[llamahub] per source) |
Distributions import as afs_core / afs_server / afs_connector_sdk, and all
are PEP 561 typed. Packaging, the namespace decision, and the release flow are in
ADR 0005. Releases
publish to PyPI on a vX.Y.Z tag via Trusted Publishing
(release.yml).
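The "contracts plus conformance kits" split in the table above can be pictured with a small Python sketch. The `ObjectStore` Protocol, its three methods, the `InMemoryStore` fake, and `check_object_store` are all invented for illustration; the real contracts and kits ship in afs-core and afs-core[testing] and will differ.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ObjectStore(Protocol):
    """Hypothetical contract; the real Protocols live in afs-core."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def list_keys(self, prefix: str) -> list[str]: ...


class InMemoryStore:
    """An in-memory fake that satisfies the contract structurally."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def list_keys(self, prefix: str) -> list[str]:
        return sorted(k for k in self._blobs if k.startswith(prefix))


def check_object_store(store: ObjectStore) -> None:
    """A tiny conformance check any implementation should pass."""
    assert isinstance(store, ObjectStore)
    store.put("t/a.txt", b"one")
    store.put("t/b.txt", b"two")
    store.put("u/c.txt", b"three")
    assert store.get("t/a.txt") == b"one"
    assert store.list_keys("t/") == ["t/a.txt", "t/b.txt"]
    store.put("t/a.txt", b"uno")  # overwrite must replace, not append
    assert store.get("t/a.txt") == b"uno"
```

The point of the pattern is that a custom store needs no inheritance from the library: it only has to pass the same checks the built-in stores pass.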
Container images
Prebuilt images are published to GHCR on each release (v* tag):
| Image | Pull |
|---|---|
| API / server | docker pull ghcr.io/vivekkhimani/agentic-fs:1.0.0 |
| Extraction worker (slim: text_native/pdf/docx/textract) | docker pull ghcr.io/vivekkhimani/agentic-fs-worker:1.0.0 |
:latest tracks the most recent release. These run directly on Fargate,
Kubernetes, and locally. One caveat: AWS Lambda can only pull container images from
Amazon ECR, not from GHCR, so for the Lambda path, mirror the image into your ECR first
(docker pull the GHCR image, then tag + push to your repo). Building locally
still works too (make dev, or docker build); the worker takes
--build-arg AFS_EXTRAS=... for heavier extractors (e.g. docling).
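The mirror step (pull from GHCR, tag, push to ECR) can be spelled out with a small helper that builds the commands without running them. The function name `ecr_mirror_commands` and the account ID, region, and repository values in the usage note are placeholders, not values from this project. The registry hostname follows the standard `<account>.dkr.ecr.<region>.amazonaws.com` form.

```python
def ecr_mirror_commands(source_image, account_id, region, repository, tag):
    """Return the shell commands that copy one image from GHCR into ECR."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    target = f"{registry}/{repository}:{tag}"
    return [
        # Authenticate the local Docker client against the ECR registry.
        f"aws ecr get-login-password --region {region}"
        f" | docker login --username AWS --password-stdin {registry}",
        f"docker pull {source_image}",
        f"docker tag {source_image} {target}",
        f"docker push {target}",
    ]
```

For example, `ecr_mirror_commands("ghcr.io/vivekkhimani/agentic-fs:1.0.0", "123456789012", "us-east-1", "agentic-fs", "1.0.0")` yields four commands to review and run by hand. The sketch assumes the target ECR repository already exists.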
Deploy to your AWS account
terraform/ provisions the whole footprint with per-layer modules and a
quickstart example: the state backend, CI roles, the data bucket and KMS, the
catalog table, the serving Lambda and Function URL, async ingestion (EventBridge
→ SQS → worker), the scheduled reconciler, and high-signal CloudWatch alarms. It's
one terraform apply. Start with terraform/README.md.
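To illustrate the EventBridge → SQS → worker hop, here is a sketch of how a worker might unpack the messages it receives. It assumes, without confirmation from this README, that each SQS message body is an S3 "Object Created" event delivered through EventBridge; the function name `object_keys_from_sqs` is hypothetical and the project's actual worker may consume a different event shape.

```python
import json


def object_keys_from_sqs(event):
    """Extract (bucket, key) pairs from an SQS-triggered Lambda event.

    Assumes each message body is an EventBridge "Object Created" event
    from S3; anything else is skipped rather than failing the batch.
    """
    pairs = []
    for record in event.get("Records", []):
        body = json.loads(record["body"])
        if body.get("source") != "aws.s3":
            continue
        if body.get("detail-type") != "Object Created":
            continue
        detail = body["detail"]
        pairs.append((detail["bucket"]["name"], detail["object"]["key"]))
    return pairs
```

Putting a queue between the event bus and the worker is what makes ingestion asynchronous: uploads return immediately and extraction can proceed independently of the upload.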
Acknowledgments & prior art
agentic-fs stands on ideas others published first. The design is most directly inspired by:
- Mintlify: How we built a virtual filesystem for our assistant. The core shape came from here: a virtual filesystem over existing storage, a claims-pruned path tree, two-stage grep, read-only semantics, and the sandbox cost framing.
- Anthropic on effective context engineering and code execution with MCP. Bounded, context-efficient tools and the MCP-first surface.
- "Grep is the floor." See Claude Code dropping indexing for grep and why grep beat embeddings (Augment). Semantic search stays an opt-in accelerator.
- The ecosystem we build on: the Model Context Protocol, fsspec (the object-store adapter), LlamaHub/LlamaIndex (the connector adapter), and Docling (extraction).
- Adjacent prior art: Turso AgentFS, Onyx, and Ragie.
A fuller reference list lives in docs/agentic-fs-oss-plan.md.
Learn more
- docs/build-progress.md: what's built, what's next, the roadmap.
- docs/agentic-fs-oss-plan.md: the full design.
- docs/swap-guides/ and docs/decisions/: per-layer swaps and ADRs.




















