AutoML Guide
Luis M · 2026-05-24 · via DEV Community

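SynapCores AutoML Guide

Build powerful machine learning models directly in SQL, with no Python code required.

Overview

SynapCores AutoML provides SQL syntax for running machine learning experiments. Train, tune, and deploy production-ready models using familiar database commands.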

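Task Types

Task type                  Description                  Default metric
regression                 Continuous value prediction  R-squared
binary_classification      Binary classification        AUC
classification/multiclass  Multiclass classification    Accuracy
clustering                 Unsupervised grouping        Silhouette score
anomaly                    Anomaly detection            F1 score
time_series                Time series forecasting      MAPE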

Creating AutoML Experiments

Basic Syntax

Option 1: AS syntax

CREATE EXPERIMENT <experiment_name> AS
<SELECT_query>
WITH (<options>)

Option 2: USING syntax

CREATE EXPERIMENT <experiment_name>
USING (<SELECT_query>)
TARGET <target_column>
OPTIONS (<options>)

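Configuration Options

General Options

Option                   Type     Default                  Description
task_type                string   'binary_classification'  Machine learning task type
target_column            string   required                 Column to predict
max_trials               integer  100                      Maximum number of training trials
time_budget_minutes      integer  60                       Maximum time budget, in minutes
validation_split         float    0.2                      Fraction of data held out for validation
cv_folds                 integer  5                        Number of cross-validation folds
optimization_metric      string   task-dependent           Metric to optimize
ensemble                 boolean  true                     Build an ensemble of models
early_stopping_patience  integer                           Trials without improvement before stopping
random_seed              integer  42                       Random seed for reproducibility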
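
To make validation_split, cv_folds, and random_seed concrete, here is a minimal Python sketch of what such options conventionally mean. It is not SynapCores code: the function names are hypothetical, and the assumption that the engine shuffles rows before splitting is ours, not the guide's.

```python
import random

def train_validation_split(rows, validation_split=0.2, random_seed=42):
    """Shuffle rows reproducibly and hold out a fraction for validation."""
    rng = random.Random(random_seed)  # a fixed seed gives the same shuffle every run
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_val = int(round(len(shuffled) * validation_split))
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)

def k_fold_indices(n_rows, cv_folds=5):
    """Yield (train_idx, val_idx) pairs so each row is validated exactly once."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most one.
    sizes = [n_rows // cv_folds + (1 if i < n_rows % cv_folds else 0)
             for i in range(cv_folds)]
    start = 0
    for size in sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

train, val = train_validation_split(range(100))
print(len(train), len(val))
for train_idx, val_idx in k_fold_indices(10, cv_folds=5):
    print(val_idx)
```

With the defaults from the table, 20% of the rows are held out and five folds are produced. Changing random_seed changes which rows land in the validation set, not how many.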

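Available Algorithms

  • 'linear_regression' - Linear regression
  • 'logistic_regression' - Logistic regression
  • 'decision_tree' - Decision tree
  • 'random_forest' - Random forest
  • 'gradient_boosting' - Gradient boosting
  • 'xgboost' - XGBoost
  • 'neural_network' - Neural network
  • 'knn' - K-nearest neighbors
  • 'naive_bayes' - Naive Bayes
  • 'svm' - Support vector machine

Algorithm Selection Strategies

  • 'all' - Try all available algorithms
  • 'fast' - Fast algorithms only (linear models, decision tree, naive Bayes, KNN)
  • 'accurate' - Most accurate algorithms only (random forest, gradient boosting, XGBoost, neural network)
  • 'interpretable' - Interpretable algorithms only (linear regression, logistic regression, decision tree)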
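
The strategies above amount to named subsets of the algorithm list. The Python sketch below spells out that mapping for illustration only. The function name is hypothetical, and reading "linear models" as linear_regression plus logistic_regression is an assumption, since the guide does not enumerate them.

```python
ALL_ALGORITHMS = [
    "linear_regression", "logistic_regression", "decision_tree",
    "random_forest", "gradient_boosting", "xgboost",
    "neural_network", "knn", "naive_bayes", "svm",
]

# Strategy names and members follow the list above.
# Assumption: "linear models" = linear_regression + logistic_regression.
STRATEGIES = {
    "all": list(ALL_ALGORITHMS),
    "fast": ["linear_regression", "logistic_regression",
             "decision_tree", "naive_bayes", "knn"],
    "accurate": ["random_forest", "gradient_boosting",
                 "xgboost", "neural_network"],
    "interpretable": ["linear_regression", "logistic_regression",
                      "decision_tree"],
}

def resolve_algorithms(strategy):
    """Return the algorithm names a strategy stands for (hypothetical helper)."""
    try:
        return list(STRATEGIES[strategy])
    except KeyError:
        raise ValueError(f"unknown strategy: {strategy!r}") from None

print(resolve_algorithms("interpretable"))
```

Note that 'svm' appears in none of the named subsets, so under this reading it is only tried with 'all' or when listed explicitly.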

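Algorithm-Specific Options

Random Forest

Hyperparameter     Type          Default  Description
n_estimators       integer       100      Number of trees
max_depth          integer                Maximum tree depth
min_samples_split  integer       2        Minimum samples required to split
max_features       string/float  'sqrt'   Features considered

WITH (
  task_type='classification',
  algorithms=['random_forest'],
  n_estimators=200,
  max_depth=10
)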

Neural Network

Hyperparameter  Type     Default  Description
hidden_layers   array    [100]    Hidden layer sizes
learning_rate   float    0.001    Initial learning rate
batch_size      integer  32       Mini-batch size
n_epochs        integer  100      Maximum number of epochs
activation      string   'relu'   Activation function
dropout_rate    float    0.0      Dropout rate

WITH (
  task_type='classification',
  algorithms=['neural_network'],
  hidden_layers=[128, 64, 32],
  dropout_rate=0.2
)

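Gradient Boosting / XGBoost

Hyperparameter  Type     Default  Description
n_estimators    integer  100      Number of boosting stages
learning_rate   float    0.1      Learning rate
max_depth       integer  3        Maximum tree depth
subsample       float    1.0      Sample fraction

Feature Engineering Options

Option                Type     Default     Description
auto_features         boolean              Automatically generate features
polynomial_degree     integer  2           Degree of polynomial features
interaction_features  boolean              Generate interaction features
scaling               string   'standard'  Feature scaling method
missing_values        string   'mean'      Missing value handling
categorical_encoding  string   'onehot'    Categorical encoding method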

Scaling Methods

  • 'standard' - Standardization (mean 0, variance 1)
  • 'minmax' - Min-max scaling to [0, 1]
  • 'robust' - Robust scaling based on the median and interquartile range
  • 'none' - No scaling
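
For readers who want to see what the three scaling methods compute, here is a small standard-library Python sketch of the textbook formulas. It illustrates the definitions listed above and is not SynapCores's implementation. Using the population standard deviation and inclusive quartiles are assumptions on our part.

```python
import statistics

def _safe_div(numerator, denominator):
    # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
    return numerator / denominator if denominator else 0.0

def standard_scale(values):
    """'standard': subtract the mean, divide by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [_safe_div(v - mean, std) for v in values]

def minmax_scale(values):
    """'minmax': map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [_safe_div(v - lo, hi - lo) for v in values]

def robust_scale(values):
    """'robust': subtract the median, divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [_safe_div(v - median, q3 - q1) for v in values]

charges = [1, 2, 3, 4, 100]  # one large outlier
print(minmax_scale(charges))
print(robust_scale(charges))
```

On data with an outlier like the one above, min-max scaling squeezes the ordinary values close to 0, while robust scaling keeps them evenly spread. This is one reason to pick 'robust' for skewed data such as transaction amounts.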

Categorical Encoding

  • 'onehot' - One-hot encoding
  • 'label' - Label encoding
  • 'target' - Target encoding
  • 'ordinal' - Ordinal encoding
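
The four encodings differ in what number replaces each category. The Python sketch below shows the usual definitions on a toy column. The function names and sample data are hypothetical, and nothing here is taken from SynapCores internals.

```python
from collections import defaultdict

def onehot_encode(values):
    """One 0/1 column per category (categories sorted for a stable column order)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories

def label_encode(values):
    """Arbitrary integer per category, assigned in order of first appearance."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return [mapping[v] for v in values]

def ordinal_encode(values, order):
    """Integer per category following a caller-supplied meaningful order."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]

def target_encode(values, targets):
    """Replace each category with the mean target value observed for it."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    return [sums[v] / counts[v] for v in values]

plans = ["basic", "pro", "basic", "enterprise"]   # toy data
churned = [1, 0, 0, 1]
print(onehot_encode(plans))
print(label_encode(plans))
print(ordinal_encode(plans, ["basic", "pro", "enterprise"]))
print(target_encode(plans, churned))
```

Because the plain target encoding shown here reads the labels directly, practical implementations commonly add smoothing or compute the means out-of-fold to limit leakage. Whether SynapCores does so is not stated in this guide.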

Complete Examples

Customer Churn Prediction

CREATE EXPERIMENT churn_prediction AS
SELECT customer_id, age, tenure, monthly_charges, total_charges, churned
FROM customers
WITH (
  task_type='binary_classification',
  target_column='churned',
  max_trials=50,
  validation_split=0.2
);

House Price Regression

CREATE EXPERIMENT house_price_model AS
SELECT * FROM housing_data
WITH (
  task_type='regression',
  target_column='price',
  algorithms=['random_forest', 'xgboost', 'gradient_boosting'],
  max_trials=100,
  n_estimators=200
);

Fraud Detection with Feature Engineering

CREATE EXPERIMENT fraud_detection AS
SELECT * FROM transactions
WITH (
  task_type='binary_classification',
  target_column='is_fraud',
  algorithms=['xgboost', 'neural_network'],
  auto_features=true,
  polynomial_degree=2,
  interaction_features=true,
  scaling='robust',
  categorical_encoding='target',
  max_trials=150
);

Time Series Forecasting

CREATE EXPERIMENT sales_forecast AS
SELECT date, product_id, sales, promotions, holidays
FROM sales_data
WITH (
  task_type='time_series',
  target_column='sales',
  algorithms=['gradient_boosting', 'neural_network'],
  cv_folds=5
);

Interpretable Model for Compliance

CREATE EXPERIMENT loan_approval AS
SELECT * FROM loan_applications
WITH (
  task_type='binary_classification',
  target_column='approved',
  algorithms=['logistic_regression', 'decision_tree'],
  max_depth=5
);

Model Operations

Show All Experiments

SHOW MODELS;

Deploy a Model

DEPLOY MODEL best_model FROM EXPERIMENT churn_prediction
WITH (replicas=3, memory='2Gi');

Make Predictions

PREDICT churn_probability, risk_score
USING churn_model
AS SELECT customer_id, age, tenure FROM new_customers;

Describe a Model

DESCRIBE MODEL churn_model;

Drop a Model

DROP MODEL old_model;

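Best Practices

  1. Hyperparameter tuning: algorithm-specific options are applied to every selected algorithm they are compatible with.

  2. Defaults: every option has a sensible default. Specify only the options whose values differ from the defaults.

  3. Resource limits: an experiment honors both max_trials and time_budget_minutes and stops as soon as either limit is reached.

  4. Reproducibility: set random_seed to get identical results across runs.

  5. Algorithm compatibility: the system automatically filters out algorithms that are incompatible with the task type.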
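
Practices 3 and 4 can be pictured with a short Python sketch of a trial loop that stops at whichever limit is hit first and draws all randomness from one seeded generator. This is an illustrative model of the stated behavior, not the SynapCores scheduler. run_experiment, evaluate_trial, and the injectable clock are hypothetical names.

```python
import random
import time

def run_experiment(evaluate_trial, max_trials=100, time_budget_minutes=60,
                   random_seed=42, clock=time.monotonic):
    """Run trials until max_trials or the time budget is reached, whichever is first."""
    rng = random.Random(random_seed)  # single seeded source of randomness
    deadline = clock() + time_budget_minutes * 60
    best_score, trials_run = None, 0
    while trials_run < max_trials and clock() < deadline:
        score = evaluate_trial(rng)   # one trial, e.g. one sampled configuration
        trials_run += 1
        if best_score is None or score > best_score:
            best_score = score
    return best_score, trials_run

def fake_trial(rng):
    # Stand-in for training a model: returns a pseudo-random "validation score".
    return rng.random()

first = run_experiment(fake_trial, max_trials=5)
second = run_experiment(fake_trial, max_trials=5)
print(first == second)  # same seed, same trials, same best score
```

Passing the clock as a parameter keeps the loop testable: a fake clock can simulate a budget running out without waiting for real minutes to pass.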


Document version: 1.0
Last updated: December 2025
Website: https://synapcores.com


Originally published on synapcores.com. SynapCores is a free, single-binary, AI-native database (vectors, graph, SQL, and LLM).