
Pinecone-Powered Knowledge Infrastructure Helps Jenova's Agent Platform Quickly Reach $1M ARR and 200,000+ Signups
2026-06-10 · via Pinecone

Jenova delivers hundreds of specialized AI agents (e.g., Manga Creator, Roleplay Game Master, Film Screenwriter, Fundamental Stock Analyst, Relationship Advisor), each engineered for a specific use case with encoded domain workflows, persistent memory, and context that compounds over time.

Where general-purpose AI tools require users to configure a model into an expert every session, Jenova's agents arrive ready to work. The Manga Creator orchestrates a multi-step workflow across character reference libraries, visual consistency, and narrative continuity spanning hundreds of pages. The Roleplay Game Master maintains persistent world state, tracks player decisions, and adapts the story accordingly. The Stock Analyst runs multi-source research pipelines, not single-prompt responses. These are automated expert systems, not prompt wrappers. What makes them valuable over time is retrieval: the ability to surface the right knowledge from a user's history and apply it to the current interaction.

That retrieval capability is powered by Pinecone.

Challenge

When every session starts from zero

For Jenova, the core technical challenge was clear from the start: specialized agents are only as good as the knowledge they can access. An agent that forgets everything between sessions can't build on what it's already learned, and neither can the user. Without persistent memory, users would have to re-teach agents every session, a friction point that directly erodes retention and lifetime value. To deliver on the promise of genuine expert systems, Jenova needed knowledge as foundational infrastructure, not a feature bolted on after launch.

The requirements were demanding. The platform needed agent memory retrieval across sessions, RAG over user-provided materials (uploaded documents, web content, etc.), cross-agent context reuse through a shared memory layer, and long-term personalization via stored user-level context. The data involved conversation histories, uploaded files in multiple formats, web pages, agent-generated documents, and structured memory records, all needing to be indexed, retrieved, and surfaced at the right moment during real-time interactions.

Building and operating that retrieval infrastructure in-house would have consumed the engineering resources Jenova needed for product development. And treating memory as a secondary concern was not an option. For an agent platform, memory quality is what separates a product users rely on from one they abandon.

During early infrastructure planning, the team evaluated Milvus as an alternative. The decision came down to operational simplicity, reliability, cost, and ease of integration. Pinecone was the clear winner on all counts. Jenova needed retrieval infrastructure that was production-grade from the start, not a system that would require ongoing tuning or dedicated infrastructure engineering to keep running.

Solution

A retrieval layer designed into the architecture from day one

Pinecone has been part of Jenova's architecture since day one. Retrieval-backed agent memory was a design requirement, not something bolted on after the fact. The evaluation criteria were retrieval quality, production reliability, query latency, operational simplicity, scalability, and total cost of ownership. Pinecone was the clearest fit across every dimension, with its serverless infrastructure eliminating operational burden entirely.

In Jenova's architecture, Pinecone sits inside the agent memory and retrieval layer of the orchestration pipeline. When a user sends a message, the system generates a semantically-optimized retrieval query — not a raw copy of the message, but a clean extraction of what the agent actually needs to find. That query hits Pinecone, the most relevant context is returned, and it gets injected into the agent's orchestration payload before response generation begins.

This means agents aren't working from the immediate conversation window alone. They become knowledgeable by drawing from a curated view of everything relevant across a user's history: prior sessions, uploaded materials, and cross-agent memory, surfaced at exactly the moment they need it.
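
The flow described above (extract a standalone query, retrieve, inject context into the payload) can be sketched in a few lines. This is a minimal illustration, not Jenova's implementation: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a Pinecone index, and `extract_query`, `FILLER`, and `build_payload` are hypothetical names. In practice the extraction step would more likely be a language model call.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical filler words dropped by the toy query extractor.
FILLER = {"hey", "can", "you", "remind", "me", "what", "the", "of", "was", "again"}

def extract_query(message):
    """Stand-in for standalone query extraction: keep only content words."""
    words = re.findall(r"[a-z0-9']+", message.lower())
    return " ".join(w for w in words if w not in FILLER)

def retrieve(query, memory, top_k):
    """Rank stored memory entries by similarity to the query; return the best top_k."""
    q = embed(query)
    ranked = sorted(memory, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:top_k]

def build_payload(message, memory, top_k=2):
    """Assemble what the agent sees: retrieved context plus the raw user message."""
    query = extract_query(message)
    return {
        "retrieval_query": query,
        "context": retrieve(query, memory, top_k),
        "user_message": message,
    }

memory = [
    "The hero's sword is named Dawnbreaker and glows blue.",
    "User prefers chapters of about 20 pages.",
    "The villain's castle sits on Mount Ash.",
]
payload = build_payload(
    "Hey, can you remind me what the name of the hero's sword was again?",
    memory,
    top_k=1,
)
print(payload["retrieval_query"])
print(payload["context"])
```

The retrieval query is derived from the message rather than copied from it, and only the top-ranked entry reaches the payload, so the agent's prompt stays small no matter how much history is stored.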

The retrieval architecture has evolved through several milestones:

  • Conversation history retrieval. The first version vectorized conversation history and retrieved the most relevant prior exchanges on each new message. Unlike most agent platforms, which compress or discard older messages after a context window fills up, Jenova supports infinite chat history: users can continue the same session indefinitely without losing past context. Pinecone makes this possible by storing every exchange and retrieving only what's relevant, so nothing is compressed or lost regardless of session length. In practice, the longest single session on the platform spans over 16 million tokens, a scale that would be impossible to serve without vector-based retrieval. The effect on product quality was immediate. Agents stopped behaving as if each session was the first.
  • Document and web content RAG. Retrieval expanded to cover uploaded files across 40+ supported formats (PDF, DOCX, XLSX, CSV, TXT, code files, images, etc.) and external web content, giving agents access to user-provided knowledge alongside conversation history.
  • Cross-agent shared memory. A shared memory layer introduced structured user-level context that any agent on the platform can retrieve. Information learned in one context no longer stays siloed to a single agent. Roughly one in four users on the platform has cross-agent memory entries stored in Pinecone.
  • Standalone query extraction. The most recent milestone replaced raw message embedding with standalone query extraction before retrieval. By separating "what the user said" from "what the agent needs to retrieve," retrieval precision significantly improved.
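
One way to picture the cross-agent shared memory milestone is as a scope tag on each memory record. The sketch below is an assumption made for illustration, not Jenova's schema: `MemoryRecord`, the `SHARED` marker, and `visible_records` are hypothetical names, and the sample records are invented.

```python
from dataclasses import dataclass

SHARED = "shared"  # hypothetical marker for user-level, cross-agent entries

@dataclass(frozen=True)
class MemoryRecord:
    user_id: str
    scope: str  # an agent name, or SHARED
    text: str

def visible_records(records, user_id, agent):
    """Records an agent may retrieve: its own entries plus the user's shared ones."""
    return [
        r for r in records
        if r.user_id == user_id and r.scope in (agent, SHARED)
    ]

records = [
    MemoryRecord("u1", "manga_creator", "Protagonist has a scar over the left eye."),
    MemoryRecord("u1", SHARED, "User writes in British English."),
    MemoryRecord("u1", "stock_analyst", "Watchlist focuses on semiconductor stocks."),
    MemoryRecord("u2", SHARED, "User prefers short answers."),
]

for_analyst = visible_records(records, "u1", "stock_analyst")
for record in for_analyst:
    print(record.scope, "->", record.text)
```

Here the stock analyst sees the user's shared preference and its own notes, but not the manga creator's private entries or another user's data.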

Responsibilities are cleanly split across the stack. Jenova's application layer handles agent orchestration and workflow automation. Model providers (OpenAI, Anthropic, Google, and open-source models) handle embedding generation and language model inference. Pinecone handles vector indexing and retrieval. AWS serves as the primary cloud infrastructure. The split keeps each layer doing what it does best without coupling concerns that should be independent.

For an agent platform, the quality of the knowledge layer determines whether users stay or leave. Pinecone is what lets us store everything a user has ever worked on and retrieve exactly the right piece of it in milliseconds. That's the foundation our entire product is built on. — Boris Wang, Founder, Jenova

Result

Knowledge that compounds into a business advantage

Pinecone-backed retrieval has become the foundation of Jenova's product differentiation, retention economics, and growth.

Retrieval precision that compounds over time. Standalone query extraction before retrieval ensures queries hitting Pinecone are semantically clean. Combined with Pinecone's retrieval accuracy, this improved judged retrieval relevance by roughly 20–25 percentage points in internal evaluations. In the majority of multi-session interactions, agents now successfully reference relevant prior context without user prompting, an effect strongest for power users with 20+ sessions of history.

Dramatic token efficiency gains. Over half of all messages on the platform occur in sessions deep enough that full-context injection would be prohibitively expensive or impossible. The longest single session spans over 16 million tokens. Without vector-based retrieval, sessions at that scale simply could not function. Pinecone-backed semantic retrieval solves this by returning only the most relevant context per turn rather than the full history. For typical engaged sessions, this reduces token consumption by 70–80%. For deep sessions, the reduction reaches 85%. For the longest-running power sessions, it exceeds 95%. Those savings flow directly into gross margin.
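
The arithmetic behind these percentages is simple to reproduce. The sketch below compares prompt size with full-history injection against prompt size with a fixed retrieval budget. The session sizes and the 4,000-token retrieval budget are illustrative assumptions, not figures reported by Jenova (apart from the 16 million token session length mentioned above), and `token_reduction` is a hypothetical helper.

```python
def token_reduction(history_tokens, retrieved_tokens, message_tokens):
    """Fraction of prompt tokens saved by retrieval versus full-context injection."""
    full_prompt = history_tokens + message_tokens
    retrieval_prompt = retrieved_tokens + message_tokens
    return 1 - retrieval_prompt / full_prompt

# Illustrative numbers only: a 20,000-token session with a 4,000-token
# retrieval budget and a 500-token user message.
engaged = token_reduction(20_000, 4_000, 500)

# Same budget against a 16-million-token session.
longest = token_reduction(16_000_000, 4_000, 500)

print(f"engaged session: {engaged:.1%}")
print(f"longest session: {longest:.2%}")
```

Because the retrieval budget stays roughly constant while history keeps growing, the saving approaches 100% as sessions get longer, which matches the pattern of larger reductions for deeper sessions.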

Sub-10ms retrieval. Retrieval latency from Pinecone remains in the low single-digit milliseconds, effectively invisible inside a real-time agent interaction. That speed is critical for a platform where users interact with agents conversationally and expect immediate responses.

Scalable per-user isolation and unit economics. Pinecone's namespace architecture gives Jenova a multi-tenancy model with clean data isolation for each user within a single index. Every user's conversation history, uploaded documents, and cross-agent context lives in its own namespace, so queries only search that user's data. Because Pinecone charges per query based on namespace size, costs naturally align with user value: inactive users cost nothing, casual users cost little, and the most active power users cost more but are also the highest-paying customers. For a platform with 200,000+ signups generating user-specific knowledge across hundreds of agents, that multi-tenancy and pricing model is what keeps unit economics viable as the user base grows.
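
The namespace-per-user pattern can be illustrated with a small in-memory stand-in. `ToyIndex` below is a hypothetical class, not the Pinecone client, and substring matching stands in for vector similarity. The point is only that a query is scoped to one namespace, so it never scans another user's records.

```python
from collections import defaultdict

class ToyIndex:
    """In-memory stand-in for a single index partitioned into namespaces."""

    def __init__(self):
        self._namespaces = defaultdict(dict)

    def upsert(self, namespace, record_id, text):
        """Store a record inside one user's namespace."""
        self._namespaces[namespace][record_id] = text

    def query(self, namespace, term):
        """Search only the given namespace; other namespaces are never scanned."""
        records = self._namespaces.get(namespace, {})
        return [rid for rid, text in records.items() if term.lower() in text.lower()]

    def size(self, namespace):
        """Number of records a query against this namespace has to consider."""
        return len(self._namespaces.get(namespace, {}))

index = ToyIndex()
index.upsert("user-a", "a1", "The dragon guards the northern pass.")
index.upsert("user-a", "a2", "Party gold: 340 coins.")
index.upsert("user-b", "b1", "My dragon character is called Ember.")

hits = index.query("user-a", "dragon")
print(hits)
print(index.size("user-c"))
```

A user with no stored records has an empty namespace, so there is nothing to search, while a heavy user's queries run over a larger namespace. That is the intuition behind costs tracking user activity.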

Organic growth driven by memory quality. Jenova has grown to over $1M ARR and 200,000 user signups across more than 70 countries, with nearly 100% of that growth being organic. Within the roleplay segment (~30% of all users and the platform's most popular agent), approximately 60% of new paying users come through word of mouth and community recommendation. Memory quality is the feature cited most often when users recommend Jenova. The number one reason users choose Jenova over dedicated roleplay AI products is the quality of its Pinecone-powered knowledge infrastructure.

High-value retention. Top power users, concentrated in memory-intensive workflows like roleplay, creative writing, and long-running specialist research, spend $500–$2,000 per month. This includes businesses and organizations like the Central Bank of Nicaragua. The accumulated context from hundreds of sessions cannot be replicated on any competing platform. That accumulated knowledge is the definition of a switching cost.

Operational reliability. Zero Pinecone-related production incidents since launch. For a platform where memory reliability is part of the product promise, that track record matters.

Next up: multimodal retrieval and proactive agents

Jenova's roadmap deepens its investment in Pinecone-powered retrieval. Near-term, the team is extending retrieval to cover image content via multimodal embeddings, giving agents like the Manga Creator the ability to retrieve from a user's visual history semantically rather than structurally. After that, async and background agent capabilities will allow agents to retrieve and process context proactively. Examples include a stock analyst that surfaces relevant portfolio context before the user asks, or a creative agent that prepares continuity context before a session begins.

When Jenova opens its platform to external developers through its upcoming Managed Agent API, third-party agents will be able to build on the same stack that powers Jenova's first-party agents, with fully managed memory, RAG, and agent workflow orchestration included. The same Pinecone-powered infrastructure that drives Jenova's retrieval layer will be available to every agent on the platform, making persistent, retrieval-backed knowledge a platform-level capability rather than something each developer has to build from scratch.