Building a Working Semantic Search with Python and Snowflake Cortex
Artem · 2026-05-24 · via DEV Community

Recently I was tasked with adding AI-powered semantic search to our catalog. Since we already use Snowflake, we decided to build the feature on Cortex Search Service.

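If you are not familiar with Cortex Search, check this link for a quick overview. In short, Cortex Search lets you run low-latency semantic and full-text search directly over data stored in Snowflake.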

In this article, I'll share my experience integrating Cortex Search Service into our Python backend, along with the pitfalls we ran into.


Keep the search column focused

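The SEARCH_TEXT column, or more precisely the Cortex Search "search column", is the column that Cortex Search indexes and searches against.

My first mistake was trying to put almost every available field into SEARCH_TEXT.

The reasoning looked sound at first: the more fields Cortex Search sees, the more context it has. In practice, this makes the searchable text noisy.

For example, fields such as IDs, status flags, tenant/company IDs, currency IDs, and internal category IDs usually add nothing to semantic search: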

'id:' || COALESCE(id::STRING, ''),
'warehouse_id:' || COALESCE(warehouse_id::STRING, ''),
'project_id:' || COALESCE(project_id::STRING, ''),
'is_active:' || COALESCE(is_active::STRING, ''),
'status:' || COALESCE(status, ''),
'currency_id:' || COALESCE(currency_id::STRING, '')


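Users rarely search by these values. They belong in ATTRIBUTES, where they are used for filtering rather than matching; more on that in the next section.

SEARCH_TEXT should focus on the fields that describe what the item actually is: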

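-- Snowflake's CONCAT_WS returns NULL if any argument is NULL,
-- so nullable columns are wrapped in COALESCE.
CONCAT_WS(
    ' ',
    COALESCE(title, ''),
    COALESCE(brand, ''),
    COALESCE(origin, ''),
    COALESCE(category_name, ''),
    COALESCE(subcategory_name, ''),
    COALESCE(short_description, '')
) AS SEARCH_TEXT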


In other words, include the fields that match what end users actually search for.
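If you also assemble or preview the search text in Python (for example, to unit-test what ends up in SEARCH_TEXT), a minimal sketch might look like the following. The field list and the `build_search_text` helper are hypothetical; in the article the column is built in SQL. The point it illustrates is the same: only descriptive fields are listed, and empty values are skipped.

```python
from typing import Any, Mapping, Sequence

# Hypothetical list of descriptive fields, mirroring the CONCAT_WS example above.
SEARCH_FIELDS: Sequence[str] = (
    "title",
    "brand",
    "origin",
    "category_name",
    "subcategory_name",
    "short_description",
)


def build_search_text(record: Mapping[str, Any], fields: Sequence[str] = SEARCH_FIELDS) -> str:
    """Join only descriptive fields into one search string.

    Missing or blank values are skipped, so one NULL column does not blank
    out the whole string. IDs, flags and tenant keys are simply not listed
    in `fields`, so they never reach the search column.
    """
    parts = []
    for field in fields:
        value = record.get(field)
        if value is None:
            continue
        text = str(value).strip()
        if text:
            parts.append(text)
    return " ".join(parts)


item = {
    "id": 42,
    "status": "published",
    "title": "Leather work gloves",
    "brand": "Acme",
    "origin": None,
    "category_name": "Gloves",
    "subcategory_name": "",
    "short_description": "Cut-resistant gloves for garden work",
}
print(build_search_text(item))
```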


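Expose filterable fields as ATTRIBUTES

Another small but important part of the Cortex Search Service configuration is the ATTRIBUTES property.

The ON column is the searchable text: it is what Cortex Search matches user queries against. However, if you need to filter results by metadata such as organization, status, category, brand, region, or availability, those columns must be declared as attributes when the service is created.

Attributes are not the main search text. They are columns that can be returned alongside search results and used for filtering or display. Cortex Search filters apply to the columns listed in ATTRIBUTES in the CREATE CORTEX SEARCH SERVICE command: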

CREATE OR REPLACE CORTEX SEARCH SERVICE DEMO_DB.SEARCH.PRODUCT_SEARCH_SERVICE
ON SEARCH_TEXT
ATTRIBUTES (
  CITY,
  COUNTRY,
  CURRENCY,
  IS_ACTIVE
)
WAREHOUSE = WH_SEARCH_DEMO
TARGET_LAG = '1 hour'
AS
SELECT
    ID,
    SEARCH_TEXT,  -- the ON column must also be selected by the source query
    CITY,
    COUNTRY,
    CURRENCY,
    IS_ACTIVE
FROM DEMO_DB.SEARCH.PRODUCT_SEARCH_SOURCE;


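Keep in mind that any column you want to filter on later must be declared as an attribute. Snowflake also notes that columns listed in ATTRIBUTES must be included in the source query that defines the service.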
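If you generate the service DDL from application config, you can catch a missing column before Snowflake rejects the statement. A minimal sketch, assuming unquoted (case-insensitive) identifiers that come from trusted config, never from user input; `missing_source_columns` and `build_search_service_ddl` are hypothetical helpers, not part of any Snowflake library.

```python
from typing import List, Sequence


def missing_source_columns(
    search_column: str, attributes: Sequence[str], select_columns: Sequence[str]
) -> List[str]:
    """Return the ON column and attributes that the source query does not select."""
    # Unquoted Snowflake identifiers are case-insensitive, so compare upper-cased.
    selected = {column.upper() for column in select_columns}
    required = [search_column, *attributes]
    return [column for column in required if column.upper() not in selected]


def build_search_service_ddl(
    name: str,
    search_column: str,
    attributes: Sequence[str],
    select_columns: Sequence[str],
    source_table: str,
    warehouse: str,
    target_lag: str,
) -> str:
    """Render a CREATE CORTEX SEARCH SERVICE statement, failing fast on gaps.

    Identifiers are interpolated as-is, so they must come from trusted config.
    """
    missing = missing_source_columns(search_column, attributes, select_columns)
    if missing:
        raise ValueError(f"Not selected by the source query: {missing}")
    return "\n".join(
        [
            f"CREATE OR REPLACE CORTEX SEARCH SERVICE {name}",
            f"ON {search_column}",
            f"ATTRIBUTES ({', '.join(attributes)})",
            f"WAREHOUSE = {warehouse}",
            f"TARGET_LAG = '{target_lag}'",
            "AS",
            f"SELECT {', '.join(select_columns)}",
            f"FROM {source_table};",
        ]
    )


ddl = build_search_service_ddl(
    name="DEMO_DB.SEARCH.PRODUCT_SEARCH_SERVICE",
    search_column="SEARCH_TEXT",
    attributes=["CITY", "COUNTRY", "CURRENCY", "IS_ACTIVE"],
    select_columns=["ID", "SEARCH_TEXT", "CITY", "COUNTRY", "CURRENCY", "IS_ACTIVE"],
    source_table="DEMO_DB.SEARCH.PRODUCT_SEARCH_SOURCE",
    warehouse="WH_SEARCH_DEMO",
    target_lag="1 hour",
)
print(ddl)
```

Running the check in CI or at deploy time means a forgotten SEARCH_TEXT or attribute column shows up as a clear Python error rather than a failed CREATE statement.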

We can then apply these filters in Python code:

from typing import Any, Dict, List

COUNTRY_ATTRIBUTE = "COUNTRY"
CITY_ATTRIBUTE = "CITY"
IS_ACTIVE_ATTRIBUTE = "IS_ACTIVE"

filters: List[Dict[str, Any]] = [
    {
        "@or": [
            {
                "@eq": {
                    COUNTRY_ATTRIBUTE: "UK",
                }
            },
            {
                "@eq": {
                    CITY_ATTRIBUTE: "London",
                }
            },
        ],
    },
    {
        "@eq": {
            IS_ACTIVE_ATTRIBUTE: True,
        }
    },
]

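# search_service is the Cortex Search service handle (resolved by _get_service()
# in the warmup section below); query is the user's search string.
response = search_service.search(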
    query=query,
    columns=[
        "ID",
        "COUNTRY",
        "CITY",
        "IS_ACTIVE",
    ],
    filter={
        "@and": filters,
    },
    limit=20,
)


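Keep filters simple and easy to reason about. A good filter narrows the search space before Cortex Search ranks and returns results. If the filter payload grows very large or deeply nested, that may be a sign to adjust the search source table instead of moving too much application logic into the search query.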
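One way to keep filters flat is to build them in a single place from optional criteria. A minimal sketch using only the `@eq`, `@or` and `@and` operators from the example above; `build_filter` is a hypothetical helper, and the attribute names follow that example. When it returns None, the caller would simply omit the `filter` argument.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_filter(
    countries: Sequence[str] = (),
    cities: Sequence[str] = (),
    is_active: Optional[bool] = None,
) -> Optional[Dict[str, Any]]:
    """Assemble a flat Cortex Search filter from optional criteria.

    Location values are OR-ed together; the remaining criteria are AND-ed.
    Returns None when there is nothing to filter on.
    """
    clauses: List[Dict[str, Any]] = []

    location = [{"@eq": {"COUNTRY": country}} for country in countries]
    location += [{"@eq": {"CITY": city}} for city in cities]
    if len(location) == 1:
        clauses.append(location[0])  # no need for an @or with one branch
    elif location:
        clauses.append({"@or": location})

    if is_active is not None:
        clauses.append({"@eq": {"IS_ACTIVE": is_active}})

    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]  # no need for an @and with one branch
    return {"@and": clauses}


print(build_filter(countries=["UK"], cities=["London"], is_active=True))
```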


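Pay attention to TARGET_LAG

Another important parameter when creating or replacing a Cortex Search service is TARGET_LAG.

In short, TARGET_LAG controls how fresh the Cortex Search index is relative to the source table.

For example: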

TARGET_LAG = '10 minutes'

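This does not mean new rows become searchable immediately. Cortex Search still has to refresh its internal index, so rows inserted or updated in the source table appear in search results only after the next refresh.

This matters especially if you use managed embeddings. Cortex Search must first process the updated source data, generate or update embeddings, and refresh the index before users can find the record through semantic search.

If TARGET_LAG is too long, your search results can become stale: new items may already exist in the source table while users still cannot find them in search.

At the same time, TARGET_LAG should not be set too low. Frequent refreshes increase Snowflake compute usage and cost. Snowflake also notes that a very small target lag can trigger more index refreshes than necessary.

So the right value depends on how fresh your search results need to be: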

CREATE OR REPLACE CORTEX SEARCH SERVICE DEMO_DB.SEARCH.PRODUCT_SEARCH_SERVICE
ON SEARCH_TEXT
ATTRIBUTES
(
    ITEM_ID,
    STATUS,
    IS_ACTIVE,
    ACCOUNT_ID
)
WAREHOUSE = WH_SEARCH_DEMO
TARGET_LAG = '30 minutes'
AS
SELECT
    ITEM_ID,
    SEARCH_TEXT,
    STATUS,
    IS_ACTIVE,
    ACCOUNT_ID
FROM DEMO_DB.SEARCH.PRODUCT_SEARCH_SOURCE;


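For user-facing catalog search, where newly published items should show up quickly, 15 to 20 minutes may be appropriate.

For internal knowledge bases, document search, or data that rarely changes, one hour or even longer can be fine.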
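If TARGET_LAG lives in application config, a small check can flag a value that is looser than the freshness your product promises. A minimal sketch, assuming the simple `'<number> <unit>'` form used in this article ('30 minutes', '1 hour'); `parse_target_lag` and `fits_freshness_budget` are hypothetical helpers. Treat the lag as a target, not a hard guarantee of how stale results can get.

```python
import re
from datetime import timedelta

# Units accepted in TARGET_LAG strings such as '30 minutes' or '1 hour'.
_UNIT_TO_KWARG = {"second": "seconds", "minute": "minutes", "hour": "hours", "day": "days"}
_LAG_PATTERN = re.compile(r"\s*(\d+)\s*(second|minute|hour|day)s?\s*", re.IGNORECASE)


def parse_target_lag(value: str) -> timedelta:
    """Convert a TARGET_LAG string like '30 minutes' into a timedelta."""
    match = _LAG_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"Unrecognized TARGET_LAG: {value!r}")
    amount, unit = int(match.group(1)), match.group(2).lower()
    return timedelta(**{_UNIT_TO_KWARG[unit]: amount})


def fits_freshness_budget(target_lag: str, max_staleness: timedelta) -> bool:
    """True if the configured target lag is within the allowed staleness."""
    return parse_target_lag(target_lag) <= max_staleness


# e.g. a catalog where new items should be searchable within ~20 minutes
print(fits_freshness_budget("30 minutes", timedelta(minutes=20)))
```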


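Choose the right embedding model for your use case

Cortex Search uses an embedding model for vector retrieval. In short, the model turns your search column and user queries into vectors, so Cortex Search can find semantically similar records rather than only records that contain the same keywords. Snowflake lets you choose the model with the EMBEDDING_MODEL parameter when you create a Cortex Search service: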

CREATE OR REPLACE CORTEX SEARCH SERVICE DEMO_DB.SEARCH.PRODUCT_SEARCH_SERVICE
ON SEARCH_TEXT
ATTRIBUTES
(
    ITEM_ID,
    STATUS,
    IS_ACTIVE,
    ACCOUNT_ID
)
WAREHOUSE = WH_SEARCH_DEMO
TARGET_LAG = '30 minutes'
EMBEDDING_MODEL = 'snowflake-arctic-embed-m-v1.5' -- explicitly chosen embedding model
AS
SELECT
    ITEM_ID,
    SEARCH_TEXT,
    STATUS,
    IS_ACTIVE,
    ACCOUNT_ID
FROM DEMO_DB.SEARCH.PRODUCT_SEARCH_SOURCE;


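Snowflake lists snowflake-arctic-embed-m-v1.5 as the default Cortex Search embedding model. It produces 768-dimensional vectors, has a 512-token context window, and supports English only. Snowflake also describes it as the fastest option for indexing among the available Cortex Search models. That makes it a good choice for English-only catalogs or internal search where indexing speed matters.

If your catalog is multilingual, the English-only default may not be a good fit. In that case, look at multilingual models such as snowflake-arctic-embed-l-v2.0, snowflake-arctic-embed-l-v2.0-8k, or voyage-multilingual-2. For the full list of supported models, dimensions, context windows, and language support, see the official Snowflake embedding model table.

Snowflake's CREATE CORTEX SEARCH SERVICE documentation shows EMBEDDING_MODEL as part of the service definition, so changing it means recreating the service, not just tweaking a runtime parameter. Test it early, especially if your product has multilingual data or users who search in different languages.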
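To build intuition for what "semantically similar vectors" means, here is a toy comparison using cosine similarity, a common way to compare embeddings. The 3-dimensional vectors are made up for illustration (real models output hundreds of dimensions), and this is not a description of how Cortex Search scores results internally.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Made-up toy embeddings; a real model would produce these from text.
query_vec = [0.9, 0.1, 0.0]      # "warm winter gloves"
gloves_vec = [0.8, 0.2, 0.1]     # "Insulated ski gloves"
protector_vec = [0.1, 0.2, 0.9]  # "Glove protector"

print(round(cosine_similarity(query_vec, gloves_vec), 3))
print(round(cosine_similarity(query_vec, protector_vec), 3))
```

The embedding model decides where each text lands in that vector space, which is why an English-only model can place non-English queries and items poorly.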


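Warm up the Cortex connection

Our main reason for integrating Cortex Search was to reduce search latency, since our catalog is large.

The first problem came from connection handling. I was using the Snowflake Python APIs and opened a new connection for every search. Connection setup took about 1 to 1.5 seconds, while the actual search query took only about 0.5 seconds. So before optimizing the search itself, I focused on removing the connection setup cost from the search path.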
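Numbers like these are easy to collect by timing connection setup and the query separately. A minimal sketch using `time.perf_counter`; the `timed` helper is hypothetical, and the `time.sleep` calls are stand-ins for your own connect and search calls.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(label: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record wall-clock seconds spent inside the block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


timings: Dict[str, float] = {}
with timed("connect", timings):
    time.sleep(0.05)  # stand-in for snowflake.connector.connect(...)
with timed("search", timings):
    time.sleep(0.01)  # stand-in for search_service.search(...)

for label, seconds in timings.items():
    print(f"{label}: {seconds * 1000:.0f} ms")
```

Logging both values per request makes it obvious when setup, not the query, dominates latency.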

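The fix was to move Snowflake connection setup out of the request path. I added a warmup() method that resolves the Cortex Search service and caches the Snowflake connection, the Root object, and the service reference at the worker-process level. The cache is protected by a lock, so initialization stays safe even when several requests arrive at the same time.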

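import threading
from typing import Any, Dict, Optional, Tuple

import snowflake.connector
from snowflake.core import Root


class CortexSearchCatalogServiceError(Exception):
    """Raised when the Cortex Search client cannot be initialized."""


class CortexSearchCatalogService:
    # Process-wide state shared by every instance in one worker process.
    # RLock, not Lock: _get_service() calls _get_root() while holding the lock,
    # and a plain Lock would deadlock on that re-entry.
    _lock = threading.RLock()
    _connection: Optional[Any] = None
    _root: Optional[Root] = None
    _service_cache: Dict[Tuple[str, str, str], Any] = {}

    # __init__ (not shown) sets database, schema, service_name and
    # _connection_parameters from the application settings.

    def warmup(self) -> None:
        # Resolve the service once so the connection exists before real traffic.
        self._get_service()

    def _get_service(self):
        cache_key = (self.database, self.schema, self.service_name)
        # Fast path: no locking once the service handle is cached.
        service = self.__class__._service_cache.get(cache_key)
        if service is not None:
            return service

        with self.__class__._lock:
            # Re-check: another thread may have filled the cache meanwhile.
            service = self.__class__._service_cache.get(cache_key)
            if service is not None:
                return service

            root = self._get_root()
            service = (
                root.databases[self.database]
                .schemas[self.schema]
                .cortex_search_services[self.service_name]
            )
            self.__class__._service_cache[cache_key] = service
            return service

    def _get_root(self):
        root = self.__class__._root
        connection = self.__class__._connection
        if root and connection and not connection.is_closed():
            return root

        with self.__class__._lock:
            root = self.__class__._root
            connection = self.__class__._connection
            if root and connection and not connection.is_closed():
                return root

            # Stale or half-initialized state: close what is left and reset caches.
            if connection and not connection.is_closed():
                try:
                    connection.close()
                except Exception:
                    pass

            self.__class__._connection = None
            self.__class__._root = None
            self.__class__._service_cache = {}

            try:
                connection = snowflake.connector.connect(**self._connection_parameters)
            except Exception as exc:
                raise CortexSearchCatalogServiceError(
                    f"Failed to create Snowflake connection for Cortex search: {exc}"
                ) from exc

            self.__class__._connection = connection
            root = Root(connection)
            self.__class__._root = root
            return root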

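The pattern behind warmup() is double-checked locking over a process-wide cache. Stripped of Snowflake, it can be sketched and tested on its own; `LazyPerProcess` is a hypothetical helper, not part of any Snowflake library. A plain Lock is enough here because the factory never re-enters the lock. If one locked initializer calls another that takes the same lock, as _get_service() does with _get_root() in the class above, you need threading.RLock instead, or the second acquire deadlocks.

```python
import threading
import time
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class LazyPerProcess(Generic[T]):
    """Create a value on first use and share it across threads in this process."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._value: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        value = self._value
        if value is not None:  # fast path: no lock once initialized
            return value
        with self._lock:
            value = self._value  # re-check: another thread may have won the race
            if value is None:
                value = self._factory()
                self._value = value
            return value

    def reset(self) -> None:
        """Forget the cached value, e.g. after detecting a closed connection."""
        with self._lock:
            self._value = None


calls: List[int] = []


def fake_connect() -> object:
    """Stand-in for an expensive connect call; counts how often it runs."""
    calls.append(1)
    time.sleep(0.01)  # widen the window in which threads could race
    return object()


lazy = LazyPerProcess(fake_connect)
threads = [threading.Thread(target=lazy.get) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(len(calls))
```

Eight threads hit `get()` at once, yet the expensive factory runs only once per process.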

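I call this warmup method from Gunicorn's post_fork hook. Gunicorn workers are separate processes, so each worker must create its own Snowflake connection after it is forked. With this change, the connection is opened when the worker starts rather than on the first search request, which removes the 1 to 1.5 seconds of connection setup from the user-facing path.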

def post_fork(server, worker):
    # Warm the Snowflake Cortex client in each worker process so the first user
    # search does not pay the connection/session setup cost on the request path.

    try:
        if not _cortex_warmup_is_configured():
            server.log.debug(
                "Skipping Cortex warmup for worker pid=%s because Snowflake settings are incomplete.",
                worker.pid,
            )
            return

        from apps.db.cortex_search_services.cortex_search_catalog_service import CortexSearchCatalogService

        CortexSearchCatalogService().warmup()
        server.log.info("Cortex warmup completed for worker pid=%s", worker.pid)
    except Exception:
        server.log.exception("Cortex warmup failed for worker pid=%s", worker.pid)



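Tune Cortex Search scoring weights

The second optimization was not about latency but about result quality.

Cortex Search uses hybrid ranking: it combines semantic/vector similarity, keyword/text matching, and neural reranking. Snowflake exposes this through scoring_config.weights, where vectors, texts, and reranker control how much each component contributes to the score. The three weights are equal by default, but you can tune them per query.

In our case, pure keyword matching sometimes pushed irrelevant items to the top just because they contained matching words. In one real example, a "glove protector" item ranked above actual gloves. The text match was strong, but semantically it was not the product the user was looking for.

To fix this, I increased the weight of the vector score relative to the text score: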

# Tune these weights to adjust Cortex ranking before results reach application code.
# vectors = semantic similarity, texts = keyword match, reranker = neural reranker.
# Raise vectors relative to texts to prevent keyword-heavy but semantically irrelevant
# items from outranking semantically correct ones.
DEFAULT_SCORING_CONFIG = {
    "weights": {
        "vectors": 2,
        "texts": 1,
        "reranker": 1,
    }
}


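This matters because it directly changes how Cortex Search ranks results before they reach the application code. If your search is closer to traditional full-text search, you may want to increase the text weight instead. If users search by intent, synonyms, descriptions, or natural language, increasing the vector weight may produce better results.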
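A toy model helps show why raising `vectors` relative to `texts` can flip an ordering like the glove example. The per-signal scores below are made up, and the plain weighted sum is only for intuition; it is not a description of how Cortex Search combines scores internally.

```python
from typing import Dict, List

# Made-up per-signal scores chosen to mirror the "glove protector vs. gloves"
# example. These are not real Cortex Search outputs.
CANDIDATES: Dict[str, Dict[str, float]] = {
    "Insulated ski gloves": {"vectors": 0.9, "texts": 0.4, "reranker": 0.6},
    "Glove protector": {"vectors": 0.5, "texts": 0.95, "reranker": 0.6},
}


def rank(candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> List[str]:
    """Order items by a simple weighted sum of their per-signal scores."""

    def combined(scores: Dict[str, float]) -> float:
        return sum(weights[signal] * scores[signal] for signal in weights)

    return sorted(candidates, key=lambda name: combined(candidates[name]), reverse=True)


print(rank(CANDIDATES, {"vectors": 1, "texts": 1, "reranker": 1}))
print(rank(CANDIDATES, {"vectors": 2, "texts": 1, "reranker": 1}))
```

With equal weights the strong keyword match wins; doubling the vector weight puts the semantically closer item first.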

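Another parameter worth experimenting with is reranking. Cortex Search applies semantic reranking by default to improve relevance, but reranking also adds query latency. Snowflake lets you disable reranking when latency matters more than the quality gain.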


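Fetch only the data you need from Cortex Search

When wiring Cortex Search into production code, keep one practical limit in mind: payload size.

According to the Snowflake documentation, Cortex Search query responses have a size limit: REST API and Python API response payloads cannot exceed 10 MB.

So be deliberate on both sides of the query: what you send to Cortex Search and what you ask it to return.

For filters, send only what you need. Deeply nested filters are a fast way to ship silent bugs to production. If you have broad OR-style conditions, prefer a more compact form where possible. For example, instead of building a long @or list: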

filter={
    "@or": [
        {"@eq": {"STATUS": "published"}},
        {"@eq": {"STATUS": "scheduled"}},
        {"@eq": {"STATUS": "archived"}},
    ]
}


try using @in, if it fits your case:

filter={
    "@in": {
        "STATUS": ["published", "scheduled", "archived"]
    }
}

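If you stay with the `@or`/`@eq` form from the earlier filter example, generating it from a list of values is safer than writing it by hand, and measuring its serialized size makes runaway filters visible in logs. A minimal sketch; `any_of` and `payload_bytes` are hypothetical helpers.

```python
import json
from typing import Any, Dict, Iterable


def any_of(column: str, values: Iterable[str]) -> Dict[str, Any]:
    """Build an @or of @eq clauses for one column, dropping duplicate values."""
    unique = list(dict.fromkeys(values))  # de-duplicate, keep first-seen order
    if not unique:
        raise ValueError("any_of() needs at least one value")
    if len(unique) == 1:
        return {"@eq": {column: unique[0]}}
    return {"@or": [{"@eq": {column: value}} for value in unique]}


def payload_bytes(obj: Any) -> int:
    """Size of the compact JSON serialization, useful for logging or alerting."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


status_filter = any_of("STATUS", ["published", "scheduled", "archived", "published"])
print(len(status_filter["@or"]), payload_bytes(status_filter))
```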

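The same applies to your data model. If every search request needs complex filtering logic, that may be a sign that the search-facing model needs work. Consider preparing a clean search source table or view with easy-to-filter columns instead of pushing too much application logic into the Cortex Search request.

On the response side, I prefer to keep the returned columns minimal. In most application search flows, Cortex Search does not need to return the full object payload: return internal IDs, plus at most a few lightweight fields needed for ranking or debugging.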

response = search_service.search(
    query=query,
    columns=[
        "ID",
    ],
    filter=filter,
    limit=50,
)


The application can then take these IDs and load the full records from its main database:

item_ids = [row["ID"] for row in response.results]

items = (
    CatalogItem.objects
    .filter(id__in=item_ids)
    .select_related("brand", "category")
    .prefetch_related("tags")
)


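This way, Cortex Search does what it is good at: finding the relevant records. Your application database stays responsible for loading full domain objects, and search responses stay small.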
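One detail to watch: `filter(id__in=...)` returns rows in whatever order the database chooses, not in Cortex's relevance order. A minimal sketch for restoring the ranking in Python; `order_by_rank` and the `Item` dataclass are hypothetical stand-ins for your own code and model.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def order_by_rank(
    ranked_ids: Sequence[Any],
    items: Iterable[T],
    get_id: Callable[[T], Any],
) -> List[T]:
    """Return items in Cortex ranking order, dropping IDs that have no row.

    An ID can be missing when the record was deleted after the last index refresh.
    """
    by_id = {get_id(item): item for item in items}
    return [by_id[item_id] for item_id in ranked_ids if item_id in by_id]


@dataclass
class Item:
    id: int
    title: str


ranked_ids = [3, 1, 2, 99]  # 99 is no longer in the database
rows = [Item(1, "Ski gloves"), Item(2, "Work gloves"), Item(3, "Wool gloves")]  # database order
ordered = order_by_rank(ranked_ids, rows, lambda item: item.id)
print([item.id for item in ordered])
```

With the Django queryset above, the call would be `order_by_rank(item_ids, items, lambda item: item.id)`. If the ID column comes back from Cortex as a string while your primary key is an integer, normalize one side first, or every lookup will miss.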

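Conclusion

Cortex Search is a strong option if your data already lives in Snowflake and you want low-latency semantic and full-text search without building a separate search pipeline.

It is not set-and-forget, though. Search quality and performance depend heavily on how the service is configured and how your backend uses it.

My main takeaways:

  • Warm up the Snowflake connection before the first user request;
  • Keep the search column focused on the fields users actually search by;
  • Choose the embedding model based on your data and language requirements;
  • Tune the scoring weights depending on whether your use case favors semantic or keyword matching;
  • Add the fields you need to filter on to ATTRIBUTES;
  • Set TARGET_LAG based on how fresh your search results need to be;
  • Fetch only the data you need from Cortex Search, and load full objects from your main application database.

Overall, Cortex Search is a good tool for application-level search, especially for catalog search, document search, and other data that already lives in Snowflake.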