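Hosting MCP Gateway Registry on AWS ECS: A Practical Blueprint for Enterprise Agentic AI Systems
AI agents are no longer just demo applications that answer questions.
They are becoming systems that can take action: search customer records, update opportunities, generate quotes, create tickets, check inventory, read contracts, trigger workflows, and interact with business applications.
This is where the real enterprise problem starts.
If AI is only chatting, the risk is limited. But once agents start using tools, APIs, and enterprise systems, we need a much stricter operating model. We need to know what the agent can access, who approved it, what data it can touch, and how its behavior is monitored.
This is exactly where an MCP Gateway and Registry becomes important.
The MCP Gateway Registry gives us a central place to register MCP servers, discover available tools, manage authentication, control access, and observe how agents interact with enterprise capabilities.
In this blog, I will walk through how to host the MCP Gateway Registry on AWS using ECS Fargate, following the Terraform AWS ECS deployment model from the MCP Gateway Registry project. This blog is based on the repository https://github.com/agentic-community/mcp-gateway-registry/tree/main, and all credit goes to the repository contributors.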
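Why This Problem Matters
In early AI agent projects, the architecture usually starts simple.
One agent connects to one or two tools.
For example: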
Sales Agent
|
|-- Salesforce MCP Server
|-- Knowledge Base MCP Server
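This works well for a proof of concept.
But over time, more teams start building agents.
The sales team wants Salesforce and quoting tools.
The support team wants ticketing and knowledge base tools.
The finance team wants billing and contract tools.
The delivery team wants Jira, project reporting, and document search.
The leadership team wants reporting and analytics agents.
Soon, the landscape starts to look like this: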
Agent 1 ---> MCP Server A
Agent 1 ---> MCP Server B
Agent 2 ---> MCP Server A
Agent 2 ---> MCP Server C
Agent 3 ---> MCP Server D
Agent 4 ---> MCP Server B
Agent 5 ---> MCP Server E
At this point, it is no longer only a technical integration problem.
The real questions become:
Who owns each MCP server?
Which agent is allowed to use which server?
What permissions does each tool have?
How do we prevent duplicate MCP servers?
How do we audit tool usage?
How do we onboard new tools safely?
How do we remove old or risky tools?
How do we monitor failures?
How do we stop agents from accessing sensitive systems without approval?
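If we do not solve this early, the MCP layer can become an uncontrolled integration layer.
And in enterprise systems, uncontrolled integration is always a source of risk.
What the MCP Gateway Registry Actually Does
The MCP Gateway Registry acts as a control plane between AI agents and MCP servers.
Instead of every agent connecting directly to every MCP server, we introduce a governed gateway and registry layer.
The architecture becomes cleaner: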
AI Agents / Developers / Applications
|
v
MCP Gateway and Registry
|
v
Approved MCP Servers
|
v
Enterprise Applications
This gives us a better operating model.
The registry helps maintain information about available MCP servers:
Server name
Owner
Description
Capabilities
Available tools
Security scopes
Environment
Version
Health status
Approval status
Discovery metadata
The gateway helps control and route access:
Authentication
Authorization
Tool discovery
Request routing
Policy enforcement
Logging
Monitoring
Access control
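This is important because enterprise agents should not call tools at will. They should follow rules: use approved tools, within approved scopes, through a governed path.
Why Host This on AWS ECS
There are several ways to deploy the MCP Gateway Registry.
It can run on virtual machines.
It can be deployed on Kubernetes.
It can run on ECS.
It can even run locally for testing with a simple Docker Compose deployment.
But for an enterprise-grade AWS deployment, ECS Fargate is a very practical choice.
It gives us managed container execution without the effort of managing EC2 worker nodes or a full Kubernetes control plane.
For this kind of gateway, ECS Fargate strikes a good balance between simplicity and production readiness.
The key benefits are: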
No EC2 server management
Container-based deployment
Built-in integration with IAM
Easy logging through CloudWatch
Service-level health checks
Integration with Application Load Balancer
Auto-scaling support
Good fit for Terraform automation
Lower operational complexity than Kubernetes
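In my view, if an organization does not already have a mature Amazon EKS platform and an established Kubernetes operating model, ECS Fargate is the first choice for hosting this kind of control-plane service.
Kubernetes offers more flexibility, but it also adds more operational responsibility. For many teams, that is not needed on day one.
Target AWS Architecture
A production-style AWS architecture for the MCP Gateway Registry can look like this: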
Users / Agents / Developers
|
v
Route 53 Custom Domain
|
v
CloudFront
|
v
AWS WAF
|
v
Application Load Balancer
|
v
ECS Fargate Services
| | |
Registry Auth Server Keycloak
| | |
| | v
| | Aurora PostgreSQL
|
v
Amazon DocumentDB
Supporting Services:
- AWS Secrets Manager
- CloudWatch Logs
- CloudWatch Alarms
- ECR
- IAM
- ACM
- Optional Prometheus and Grafana
This is not only about running containers.
This architecture gives us:
Secure external access
Managed container hosting
Central authentication
Registry persistence
Secret management
Observability
Certificate management
Custom domain support
Infrastructure automation
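This is the difference between a demo deployment and an enterprise deployment.
Core AWS Components
1. Amazon ECS Fargate
ECS Fargate runs the containerized services.
The deployment can include services such as: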
MCP Gateway Registry
Authentication server
Keycloak
MCP gateway service
Sample MCP servers
Sample agents
Observability components
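Each service runs as an ECS task.
For production, I recommend keeping the services clearly separated rather than packing too much into one container. This gives better control over scaling, logging, deployment, and troubleshooting.
For example: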
Registry service --> Handles MCP server metadata and discovery
Auth service --> Handles authentication flow
Keycloak service --> Identity and access management
Sample MCP services --> Optional, mostly for demo or validation
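In production, sample agents and sample MCP servers should be disabled or deployed only in non-production environments.
2. Application Load Balancer
The Application Load Balancer exposes the ECS services through HTTPS endpoints.
It routes each request to the correct ECS target group.
For example: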
/registry --> Registry service
/auth --> Auth service
/keycloak --> Keycloak service
Or, in a cleaner production model:
registry.company.com --> Registry service
auth.company.com --> Auth service
kc.company.com --> Keycloak
This domain-based separation is better for enterprise use because it improves clarity, security boundaries, and routing control.
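To make the host-based pattern concrete, here is a minimal Python sketch of the kind of lookup a host-header routing rule performs. It is for illustration only: the target group names are hypothetical, and in a real deployment this logic lives in ALB listener rules managed by Terraform, not in application code.

```python
from typing import Optional

# Hypothetical mapping from host name to ECS target group name.
# The host names are the example domains used in this post.
HOST_RULES = {
    "registry.company.com": "registry-tg",
    "auth.company.com": "auth-tg",
    "kc.company.com": "keycloak-tg",
}


def select_target_group(host_header: str) -> Optional[str]:
    """Return the target group for a Host header, or None if no rule matches."""
    # Host names are case-insensitive and the header may carry a port suffix.
    host = host_header.strip().lower().split(":", 1)[0]
    return HOST_RULES.get(host)
```

A request for an unknown host returns None here, which corresponds to falling through to whatever default action the listener defines.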
3. CloudFront
CloudFront can be placed in front of the ALB.
For production, this is useful because it provides:
Global edge access
Better TLS handling
Additional protection layer
Integration point for WAF
Cleaner public access pattern
Potential performance benefits
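For internal-only deployments, CloudFront may not be required. But if the registry is accessed by distributed teams, external developers, or cloud-hosted agents, CloudFront is useful.
4. AWS WAF
I strongly recommend using AWS WAF to protect internet-facing endpoints.
The MCP gateway is a sensitive entry point because it controls access to tools, so it should not be exposed lightly.
WAF can help with: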
Rate limiting
AWS managed rule groups
IP restrictions
Bot protection
Geo restrictions if required
SQL injection protection
Cross-site scripting protection
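This is especially important if employees, developers, or external systems reach the gateway over the internet.
5. Route 53 and ACM
Route 53 manages the DNS records.
ACM provides the SSL/TLS certificates.
This gives clean URLs such as: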
registry.company.com
auth.company.com
kc.company.com
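For enterprise adoption, this matters more than people think. A clean domain makes the platform look like a real internal product, not a temporary engineering setup.
6. Amazon Aurora PostgreSQL
Aurora PostgreSQL is used for Keycloak data.
Keycloak needs a relational database to store identity-related information, such as: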
Users
Realms
Clients
Roles
Sessions
Identity provider configuration
Authentication settings
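Using Aurora gives better reliability than running the database inside a container.
For production, I would avoid containerized databases for this platform. Identity is too critical to treat lightly.
7. Amazon DocumentDB
The registry layer uses DocumentDB.
This is where MCP server and agent metadata is stored.
Example records may include: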
MCP server name
MCP server URL
Tool list
Tool descriptions
Security scopes
Server health
Owner team
Environment
Version
Approval state
Risk classification
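Over time, this registry becomes the enterprise catalog of agent-accessible capabilities.
This is valuable.
It lets teams search for and reuse existing tools instead of repeatedly building the same MCP server.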
8. AWS Secrets Manager
Secrets Manager should be used for:
Database credentials
Keycloak admin credentials
JWT secrets
Client secrets
Service credentials
API keys
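No production credentials should be hardcoded in Terraform files, Docker images, or environment files stored in Git.
This is basic, but it is often missed in early AI platform projects.
9. CloudWatch Logs and Alarms
Every ECS service should send its logs to CloudWatch.
At minimum, we should monitor: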
Container startup failures
Authentication failures
Registry API errors
Tool discovery failures
Database connection errors
ECS task restarts
ALB 4xx errors
ALB 5xx errors
High latency
Memory pressure
CPU pressure
But for an MCP gateway, infrastructure logs alone are not enough.
We also need agent activity logs.
For example:
Which agent requested tool discovery?
Which MCP server was selected?
Which tool was invoked?
Which scope was used?
Was the request allowed or denied?
What was the response status?
How long did the tool call take?
Was sensitive data involved?
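This is where the MCP gateway starts becoming a governance system, not just a routing layer.
Deployment Options
The Terraform setup supports different deployment modes.
Option 1: CloudFront Only
This is useful for a quick POC.
No custom domain is required. You get a CloudFront-generated URL.
This is suitable for: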
Internal demo
Engineering validation
Architecture exploration
Short-term sandbox
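This is not my preferred production option, but it is a good way to start quickly.
Option 2: Custom Domain Only
In this mode, Route 53 and ACM are used, but CloudFront may not be enabled.
You get URLs like this: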
registry.company.com
kc.company.com
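This is better than randomly generated URLs, but if exposed publicly it may not provide enough edge protection.
It can work well for private or internal deployments.
Option 3: CloudFront Plus Custom Domain
This is the best production model.
Traffic flows like this: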
User / Agent
|
v
Custom Domain
|
v
CloudFront
|
v
WAF
|
v
Application Load Balancer
|
v
ECS Fargate Service
This gives a stronger production posture.
My recommendation:
Use CloudFront + Route 53 + WAF for production.
Use CloudFront-only for demo.
Use custom domain-only only for controlled internal environments.
Implementation Flow
The flow can be understood in phases.
Phase 1: Prepare the AWS Account
Before starting, decide on:
AWS region
VPC strategy
Domain name
Environment name
Access model
CIDR restrictions
Secrets strategy
Terraform state backend
For production, I would not deploy this into a random shared AWS account.
A better model:
Separate AWS account for dev
Separate AWS account for staging
Separate AWS account for production
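At minimum, separate the environments and keep a separate Terraform state for each.
Phase 2: Build and Push Images to ECR
The services need to be built as Docker images and pushed to Amazon ECR.
Simplified flow: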
export AWS_REGION=us-east-1
make build-push
The result is a set of ECR image URIs.
For example:
123456789012.dkr.ecr.us-east-1.amazonaws.com/mcp-gateway-registry:v1.0.0
123456789012.dkr.ecr.us-east-1.amazonaws.com/mcp-gateway-auth:v1.0.0
123456789012.dkr.ecr.us-east-1.amazonaws.com/mcp-gateway:v1.0.0
For production, avoid using the latest tag.
Use versioned, immutable tags.
Bad:
mcp-gateway-registry:latest
Better:
mcp-gateway-registry:v1.0.3
Best:
mcp-gateway-registry:v1.0.3-build-20260524
This helps with rollback, audit, and release traceability.
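As a rough illustration, a CI step could reject image references that do not follow the versioned naming shown above. The sketch below assumes a tag convention of v<major>.<minor>.<patch> with an optional -build-<YYYYMMDD> suffix, which is inferred from the examples in this post rather than taken from the project. It only checks naming; preventing a tag from being overwritten is the job of the ECR repository's tag immutability setting.

```python
import re

# Assumed convention: v<major>.<minor>.<patch>, optionally followed by
# -build-<YYYYMMDD>, matching the example tags in this post.
TAG_PATTERN = re.compile(r"^v\d+\.\d+\.\d+(-build-\d{8})?$")


def is_release_tag(image_ref: str) -> bool:
    """Return True if the image reference ends in a versioned tag."""
    # Look only at the last path segment so a registry port
    # (host:5000/name) is not mistaken for a tag.
    last_segment = image_ref.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # an untagged reference would resolve to "latest"
    tag = last_segment.split(":", 1)[1]
    return bool(TAG_PATTERN.match(tag))
```

With this check, mcp-gateway-registry:latest and an untagged reference are both rejected, while the "Better" and "Best" examples above pass.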
Phase 3: Configure Terraform Variables
The terraform.tfvars file is where the deployment configuration lives.
Important values include:
aws_region = "us-east-1"
enable_cloudfront = true
enable_route53_dns = true
base_domain = "company.com"
session_cookie_domain = ".company.com"
session_cookie_secure = true
ingress_cidr_blocks = [
"YOUR_OFFICE_IP/32",
"YOUR_VPN_IP/32"
]
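Database and admin passwords should be handled with great care.
In a strong production model, these should come from a secure secret injection process, not from manually maintained local files.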
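One way to back this up is a small pre-commit or CI check that flags secret-looking values written directly into a tfvars file. The sketch below is a heuristic under stated assumptions: the list of sensitive name fragments is my own and is not exhaustive, and it only recognizes simple one-line assignments of a quoted string.

```python
import re

# Name fragments that suggest a secret (an assumed, non-exhaustive list).
SENSITIVE_NAME = re.compile(r"(password|secret|token|api_key)", re.IGNORECASE)
# Matches simple one-line assignments of the form: name = "literal"
ASSIGNMENT = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*"([^"]*)"')


def find_hardcoded_secrets(tfvars_text: str) -> list[str]:
    """Return names of sensitive-looking variables set to a non-empty string literal."""
    findings = []
    for line in tfvars_text.splitlines():
        match = ASSIGNMENT.match(line)
        if not match:
            continue  # comments, lists, booleans, and blank lines are skipped
        name, value = match.groups()
        if SENSITIVE_NAME.search(name) and value:
            findings.append(name)
    return findings
```

A non-empty result would fail the check and point the author toward Secrets Manager instead. A dedicated secret scanner is a more thorough option; this only shows the idea.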
Phase 4: Initialize Terraform
Run:
terraform init -upgrade
For production, Terraform state should be stored remotely.
Recommended backend:
S3 bucket for state
DynamoDB table for locking
KMS encryption
Restricted IAM access
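Do not use local state for production.
Local state is fine for learning, but not for enterprise infrastructure.
Phase 5: Create Certificates First
ACM certificates usually require DNS validation.
So the deployment may need a targeted apply for the certificates first.
Conceptually: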
terraform apply \
-target=aws_acm_certificate.keycloak \
-target=aws_acm_certificate.registry \
-target=aws_acm_certificate_validation.keycloak \
-target=aws_acm_certificate_validation.registry
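This lets the certificates be created and validated before the rest of the infrastructure that depends on them.
Phase 6: Deploy the Full Infrastructure
Once the certificates are validated: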
terraform apply
This deploys the full stack:
Networking
Security groups
ECS cluster
ECS services
ALB
Target groups
CloudFront
Route 53 records
Aurora PostgreSQL
DocumentDB
Secrets
CloudWatch logs
IAM roles
Optional observability stack
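At this point, the infrastructure is ready, but the application still needs initialization.
Phase 7: Run Post-Deployment Setup
Post-deployment setup is very important.
This step usually performs: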
Terraform output extraction
DNS validation
ECS service health checks
Keycloak realm setup
Client setup
Admin user setup
DocumentDB collection initialization
Registry indexes
Scope setup
Service restart
Endpoint validation
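This step turns the infrastructure into a usable platform.
Without it, the containers may be running, but the gateway may not be fully ready.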
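The endpoint validation item in the list above can be sketched as a small helper. This is an illustration, not the project's setup script: the function takes a caller-supplied fetch_status callable (hypothetical) that returns an HTTP status code for a URL, so the check itself stays independent of any particular HTTP client or health path.

```python
from typing import Callable


def validate_endpoints(urls: list[str],
                       fetch_status: Callable[[str], int]) -> dict[str, bool]:
    """Map each URL to True if fetch_status returns a 2xx code, else False."""
    results = {}
    for url in urls:
        try:
            status = fetch_status(url)
        except OSError:
            # Connection-level failures count as an unhealthy endpoint.
            status = 0
        results[url] = 200 <= status < 300
    return results
```

In practice fetch_status could wrap urllib.request.urlopen or the team's preferred HTTP client, and a failing entry would stop the pipeline before the platform is announced as ready.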
Once deployed, teams can start registering MCP servers.
A good MCP server registration should include:
Server name
Business capability
Owner team
Technical owner
Environment
Base URL
Supported tools
Required scopes
Risk level
Data classification
Health check endpoint
Approval status
Version
For example:
Name: Salesforce Opportunity MCP Server
Owner: Sales Platform Team
Environment: Production
Tools:
- searchOpportunity
- updateOpportunityStage
- getAccountDetails
Scopes:
- salesforce.read
- salesforce.opportunity.update
Risk: High
Data: Customer and revenue data
Approval: Required
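This metadata layer is very important.
Without it, the registry is only a technical catalog. With it, the registry becomes a real enterprise control plane.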
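To show how this metadata can be enforced rather than just recorded, here is a minimal Python sketch of a registration record with a validation step. The field names, the https requirement, and the approval rule are my assumptions for illustration; they are not the registry's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class McpServerRegistration:
    """Illustrative registration record; field names are assumptions."""
    name: str
    owner_team: str
    environment: str
    base_url: str
    tools: list[str] = field(default_factory=list)
    scopes: list[str] = field(default_factory=list)
    risk_level: str = "low"
    approval_status: str = "draft"


def validate(reg: McpServerRegistration) -> list[str]:
    """Return a list of human-readable problems; an empty list means acceptable."""
    problems = []
    if not reg.owner_team.strip():
        problems.append("owner team is required")
    if not reg.base_url.startswith("https://"):
        problems.append("base URL must use https")
    if not reg.tools:
        problems.append("at least one tool must be listed")
    if not reg.scopes:
        problems.append("at least one scope must be listed")
    # Assumed rule: high-risk servers cannot go to production unapproved.
    if (reg.environment.lower() == "production"
            and reg.risk_level.lower() == "high"
            and reg.approval_status.lower() != "approved"):
        problems.append("high-risk production servers need approval")
    return problems
```

Applied to the Salesforce example above, a record still in draft status would be rejected with the approval message until a reviewer marks it approved.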
Enterprise Governance Model
For enterprise use, I would define a clear lifecycle for MCP servers.
Recommended MCP Server Lifecycle
Draft
|
Submitted for Review
|
Security Review
|
Approved for Dev
|
Approved for Production
|
Monitored
|
Deprecated
|
Retired
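Every MCP server should have an owner.
Every high-risk tool must be approved.
Every production MCP server should be monitored.
Every deprecated server must have a retirement date.
This may look heavy, but it becomes necessary once agents touch real systems.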
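The lifecycle above can be treated as a simple state machine. The sketch below uses the stage names from the diagram; allowing only a single forward step at a time is my simplifying assumption, and a real process would likely also permit rejection back to Draft or early deprecation.

```python
# Stages in order, taken from the lifecycle diagram in this post.
LIFECYCLE = [
    "Draft",
    "Submitted for Review",
    "Security Review",
    "Approved for Dev",
    "Approved for Production",
    "Monitored",
    "Deprecated",
    "Retired",
]


def can_transition(current: str, target: str) -> bool:
    """Return True only if target is the stage immediately after current."""
    if current not in LIFECYCLE or target not in LIFECYCLE:
        return False
    return LIFECYCLE.index(target) == LIFECYCLE.index(current) + 1
```

The point of encoding it this way is that a server cannot jump from Draft to Approved for Production without passing through review.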
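Access Control Model
The gateway should not allow every agent to use every MCP server.
That is a weak design.
A better model is scope-based access.
For example: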
Agent: Sales Copilot
Allowed scopes:
- salesforce.read
- quote.read
- product.search
Not allowed:
- discount.approve
- contract.delete
- customer.export
Another example:
Agent: Deal Desk Agent
Allowed scopes:
- quote.read
- quote.update
- discount.request
- contract.read
Requires approval:
- discount.approve
- final_quote.submit
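This is how we prevent agents from becoming overpowered.
The biggest risk in agentic AI systems is excessive tool permission. If an agent is given too many tools and too much authority, its behavior becomes hard to control and its impact hard to predict.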
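A scope check of this kind can be sketched in a few lines. The in-memory policy below mirrors the Deal Desk Agent example; the agent identifier, the policy structure, and the three-way decision are my assumptions, and a real gateway would load policy from its identity provider or registry rather than from a dictionary.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allowed"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "denied"


# Hypothetical in-memory policy mirroring the Deal Desk Agent example.
POLICIES = {
    "deal-desk-agent": {
        "allowed": {"quote.read", "quote.update", "discount.request", "contract.read"},
        "requires_approval": {"discount.approve", "final_quote.submit"},
    },
}


def authorize(agent_id: str, scope: str) -> Decision:
    """Default-deny check: unknown agents and unlisted scopes are denied."""
    policy = POLICIES.get(agent_id)
    if policy is None:
        return Decision.DENY
    if scope in policy["allowed"]:
        return Decision.ALLOW
    if scope in policy["requires_approval"]:
        return Decision.NEEDS_APPROVAL
    return Decision.DENY
```

The design choice that matters is default deny: a scope that nobody thought to list is refused instead of silently granted.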
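Observability for Agentic Systems
Traditional application monitoring is not enough here.
We need both system observability and agent observability.
System Observability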
CPU
Memory
Container restarts
Task failures
ALB errors
Request latency
Database connections
Authentication errors
Agent and Tool Observability
Track:
Agent ID
User ID
Tool requested
MCP server used
Scope used
Decision outcome
Policy result
Execution latency
Failure reason
Data classification
External system touched
For example, a useful audit log may look like this:
{
"agent": "sales-copilot",
"user": "john@company.com",
"mcp_server": "salesforce-opportunity-server",
"tool": "updateOpportunityStage",
"scope": "salesforce.opportunity.update",
"decision": "allowed",
"timestamp": "2026-05-24T10:15:00Z",
"latency_ms": 450,
"status": "success"
}
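This kind of record is especially important when something goes wrong.
If an agent updates the wrong opportunity or misuses a pricing tool, we should be able to reconstruct exactly what happened.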
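As a minimal sketch, a record with the same fields as the example above could be produced as one JSON line per tool call, which keeps it easy to ship to CloudWatch Logs and query later. The function name and its parameters are hypothetical; the field names are the ones from the example.

```python
import json
from datetime import datetime, timezone


def build_audit_record(agent: str, user: str, mcp_server: str, tool: str,
                       scope: str, decision: str, latency_ms: int,
                       status: str) -> str:
    """Serialize one tool-call event as a single JSON line."""
    record = {
        "agent": agent,
        "user": user,
        "mcp_server": mcp_server,
        "tool": tool,
        "scope": scope,
        "decision": decision,
        # UTC timestamp in ISO 8601 form with a trailing "Z".
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "latency_ms": latency_ms,
        "status": status,
    }
    return json.dumps(record, sort_keys=True)
```

Denied requests deserve a record too, since a pattern of denials is often the first sign of a misconfigured or misbehaving agent.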
CI/CD Model
For production, deployments should not be manual.
A good CI/CD pipeline looks like this:
Developer raises PR
|
Code review
|
Build Docker images
|
Run unit tests
|
Run container security scan
|
Push image to ECR
|
Terraform plan
|
Manual approval for production
|
Terraform apply
|
Run post-deployment setup
|
Smoke test
|
Notify platform team
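This keeps the deployment controlled and auditable.
For rollback, the team should be able to quickly redeploy the previous image tag.
Recommended Environment Strategy
I recommend three environments.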
Development
Staging
Production
Development
Used for engineering testing.
Can have relaxed controls.
Sample MCP servers allowed
Lower database capacity
CloudFront-only mode acceptable
Limited monitoring
Staging
Used for pre-production validation.
Should be close to production.
Custom domain
WAF enabled
Production-like IAM
Production-like secrets
Observability enabled
Production
Used by real enterprise agents.
Should be hardened.
Separate AWS account
CloudFront + WAF
Private subnets
Strict ingress
Immutable images
Centralized logs
Audit trail
Backup enabled
Approval workflow
Production Hardening Checklist
Before calling this production-ready, I would verify the following.
Remote Terraform state enabled
Terraform state encrypted
DynamoDB locking enabled
Separate AWS accounts or environments
Secrets stored in Secrets Manager
No secrets in Git
CloudFront enabled
WAF enabled
Ingress restricted
Keycloak admin access restricted
ECS tasks in private subnets
ALB security groups reviewed
Aurora backups enabled
DocumentDB backups enabled
CloudWatch alarms configured
Container image scanning enabled
Immutable image tags used
IAM least privilege applied
Audit logging enabled
MCP server ownership defined
Tool scopes defined
Production approval process defined
Runbook created
Rollback process tested
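The most common mistake is stopping once the Terraform deployment succeeds.
That only means the infrastructure has been created.
It does not mean the platform is secure, governed, observable, or production-ready.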
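One way to keep the checklist from being skipped is to treat it as a release gate. The sketch below assumes the results of each check are collected elsewhere (manually or by automation) into a dictionary keyed by the checklist item name; how they are gathered is outside its scope, and the function names are my own.

```python
def failing_checks(results: dict[str, bool]) -> list[str]:
    """Return the names of checklist items that are not satisfied, sorted."""
    return sorted(name for name, passed in results.items() if not passed)


def is_production_ready(results: dict[str, bool], required: set[str]) -> bool:
    """Ready only if every required item is present and marked True."""
    # A required item missing from results counts as a failure.
    return all(results.get(name, False) for name in required)
```

Treating a missing item as a failure is deliberate: an unchecked control should block the release in the same way as a failed one.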
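Operational Runbook
For any serious enterprise setup, the platform team should maintain a simple runbook.
The runbook should answer: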
How do we onboard a new MCP server?
How do we approve a production MCP server?
How do we revoke access?
How do we rotate secrets?
How do we check service health?
How do we debug registry failures?
How do we debug authentication failures?
How do we rollback a release?
How do we retire an old MCP server?
How do we investigate suspicious tool usage?
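This is key to platform maturity.
The MCP gateway is not a one-time deployment. It is an integral part of the agentic AI platform.
Where This Fits in Enterprise Agentic Architecture
In a larger enterprise agentic AI architecture, the MCP Gateway Registry sits between orchestration and enterprise tools.
A practical model: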
User Interface
|
v
Agent Orchestrator
|
v
Policy / Guardrail Layer
|
v
MCP Gateway Registry
|
v
MCP Servers
|
v
Enterprise Systems
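The orchestrator decides what needs to be done.
The policy layer checks whether the action is allowed.
The MCP gateway controls tool discovery and access.
The MCP servers perform the actual system interaction.
This separation is very important.
Do not put every responsibility into one giant agent.
That is hard to scale, hard to debug, and risky to govern.
My Practical Recommendation
For an enterprise deployment, I would set up the MCP Gateway Registry as follows: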
AWS ECS Fargate for services
CloudFront in front
AWS WAF enabled
Route 53 custom domains
ACM certificates
Application Load Balancer
Private subnets for ECS tasks
Aurora PostgreSQL for Keycloak
DocumentDB for registry metadata
Secrets Manager for credentials
CloudWatch for logs and alarms
Optional Grafana and Prometheus for deeper observability
S3 backend for Terraform state
DynamoDB for Terraform locking
CI/CD for image build and deployment
Immutable ECR image tags
Strict admin access
Scope-based authorization
Audit logs for all tool usage
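For a POC, I would keep it simple.
For production, I would not compromise on security, logging, or access control.
Key Takeaway
The most important lesson is this:
Hosting the MCP Gateway Registry is not only an infrastructure task. It is the beginning of enterprise agent operations.
If agents are going to use real tools, the organization needs: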
Tool ownership
Tool approval
Tool discovery
Tool scopes
Tool observability
Tool lifecycle management
Tool risk classification
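Without these, an agentic AI system may work technically but will fail operationally.
And in enterprises, operational failure is usually what blocks adoption.
Final Thoughts
MCP is making tool integration more standardized for AI systems. That is a big shift.
But standardization also creates scale.
And once the number of tools scales, governance is required.
That is why the MCP Gateway Registry should be treated as a core platform capability, not a side component.
It gives engineering teams a structured way to expose tools.
It gives security teams a controlled way to manage access.
It gives platform teams a way to monitor where tools are used.
It gives business teams confidence that agents are not directly and blindly touching enterprise systems.
I believe this is an essential foundation for production-grade agentic AI.
The future is not one agent directly connected to many tools.
The future will be governed agent ecosystems, where tools are registered, discoverable, monitored, secured, and lifecycle-managed through a central control layer.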












