AutoML Guide

SynapCores AutoML Guide

Build powerful machine learning models directly in SQL, without writing any Python code.

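Overview

SynapCores AutoML provides a comprehensive set of options for creating machine learning experiments through SQL syntax. Use familiar database commands to train, tune, and deploy production-ready models.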

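Task Types

Task Type	Description	Default Metric
`regression`	Continuous value prediction	R-squared
`binary_classification`	Binary classification	AUC
`classification`/`multiclass`	Multiclass classification	Accuracy
`clustering`	Unsupervised clustering	Silhouette score
`anomaly`	Anomaly detection	F1 score
`time_series`	Time series forecasting	MAPE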

Creating AutoML Experiments

Basic Syntax

Option 1: AS syntax

CREATE EXPERIMENT <experiment_name> AS
<SELECT_query>
WITH (<options>)

Option 2: USING syntax

CREATE EXPERIMENT <experiment_name>
USING (<SELECT_query>)
TARGET <target_column>
OPTIONS (<options>)
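
As a rough illustration of Option 2, the statement below restates a churn experiment in the USING form. It assumes that OPTIONS accepts the same key=value pairs as WITH and that standard `--` comments are allowed, neither of which this guide states explicitly. The experiment name is hypothetical, and the table and column names are placeholders borrowed from the churn example in this guide.

```sql
-- Sketch only: assumes OPTIONS takes the same key=value pairs as WITH.
CREATE EXPERIMENT churn_prediction_using
USING (
  SELECT age, tenure, monthly_charges, total_charges, churned
  FROM customers
)
TARGET churned
OPTIONS (
  task_type='binary_classification',
  max_trials=50
);
```

Since the TARGET clause already names the column to predict, target_column is presumably not needed in the option list in this form.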

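Configuration Options

General Options

Option	Type	Default	Description
`task_type`	string	`'binary_classification'`	Machine learning task type
`target_column`	string	required	Column to predict
`max_trials`	integer	100	Maximum number of training trials
`time_budget_minutes`	integer	60	Maximum time budget, in minutes
`validation_split`	float	0.2	Fraction of the data held out for validation
`cv_folds`	integer	5	Number of cross-validation folds
`optimization_metric`	string	task-dependent	Metric to optimize
`ensemble`	boolean	true	Build an ensemble model
`early_stopping_patience`	integer	10	Number of trials without improvement before stopping
`random_seed`	integer	42	Random seed for reproducibility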
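
To show how a few of these general options combine, here is a minimal sketch that overrides the validation and ensembling defaults. The experiment name is hypothetical, the table and columns are placeholders, and the inline `--` comments assume the dialect accepts standard SQL comments.

```sql
-- Sketch only: names are placeholders, values are illustrative.
CREATE EXPERIMENT churn_quick_check AS
SELECT age, tenure, monthly_charges, churned
FROM customers
WITH (
  task_type='binary_classification',
  target_column='churned',
  validation_split=0.3,        -- hold out 30% instead of the default 20%
  cv_folds=3,                  -- fewer folds than the default 5
  early_stopping_patience=5,   -- default is 10
  ensemble=false               -- skip the ensemble step (default true)
);
```

Options left out of the list, such as max_trials and random_seed, keep the defaults from the table above.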

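Available Algorithms

'linear_regression' - Linear regression
'logistic_regression' - Logistic regression
'decision_tree' - Decision tree
'random_forest' - Random forest
'gradient_boosting' - Gradient boosting
'xgboost' - XGBoost
'neural_network' - Neural network
'knn' - K-nearest neighbors
'naive_bayes' - Naive Bayes
'svm' - Support vector machine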

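Algorithm Selection Strategies

'all' - Try every available algorithm
'fast' - Fast algorithms only (linear models, decision tree, naive Bayes, KNN)
'accurate' - High-accuracy algorithms only (random forest, gradient boosting, XGBoost, neural network)
'interpretable' - Interpretable algorithms only (linear regression, logistic regression, decision tree)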

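Algorithm-Specific Options

Random Forest

Hyperparameter	Type	Default	Description
`n_estimators`	integer	100	Number of trees
`max_depth`	integer	None	Maximum tree depth
`min_samples_split`	integer	2	Minimum number of samples required to split a node
`max_features`	string/float	'sqrt'	Features to consider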

WITH (
  task_type='classification',
  algorithms=['random_forest'],
  n_estimators=200,
  max_depth=10
)

Neural Network

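Hyperparameter	Type	Default	Description
`hidden_layers`	array	[100]	Hidden layer sizes
`learning_rate`	float	0.001	Initial learning rate
`batch_size`	integer	32	Mini-batch size
`n_epochs`	integer	100	Maximum number of training epochs
`activation`	string	'relu'	Activation function
`dropout_rate`	float	0.0	Dropout rate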

WITH (
  task_type='classification',
  algorithms=['neural_network'],
  hidden_layers=[128, 64, 32],
  dropout_rate=0.2
)

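Gradient Boosting / XGBoost

Hyperparameter	Type	Default	Description
`n_estimators`	integer	100	Number of boosting stages
`learning_rate`	float	0.1	Learning rate
`max_depth`	integer	3	Maximum tree depth
`subsample`	float	1.0	Fraction of samples used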
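
The Random Forest and Neural Network sections each include a WITH fragment; a comparable fragment for the boosting options might look like the sketch below. The values are illustrative rather than tuned recommendations, and it assumes these hyperparameters are passed the same way as in the other sections.

```sql
WITH (
  task_type='regression',
  algorithms=['gradient_boosting', 'xgboost'],
  n_estimators=300,
  learning_rate=0.05,
  max_depth=4,
  subsample=0.8
)
```

A lower learning_rate is commonly paired with a larger n_estimators, which is why both move away from their defaults here.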

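Feature Engineering Options

Option	Type	Default	Description
`auto_features`	boolean	true	Generate features automatically
`polynomial_degree`	integer	2	Degree of polynomial features
`interaction_features`	boolean	false	Generate interaction features
`scaling`	string	'standard'	Feature scaling method
`missing_values`	string	'mean'	Missing value handling
`categorical_encoding`	string	'onehot'	Categorical encoding method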

Scaling Methods

'standard' - Standardization (zero mean, unit variance)
'minmax' - Min-max scaling to [0, 1]
'robust' - Robust scaling using the median and interquartile range
'none' - No scaling

Categorical Encoding

'onehot' - One-hot encoding
'label' - Label encoding
'target' - Target encoding
'ordinal' - Ordinal encoding
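
As a minimal sketch of how the scaling and encoding values above are selected, the experiment below pairs a distance-based algorithm with min-max scaling and ordinal encoding. The experiment name is hypothetical, housing_data and price are reused from the house price example in this guide, and the particular combination is illustrative only.

```sql
-- Sketch only: option values come from the lists above.
CREATE EXPERIMENT house_price_knn AS
SELECT * FROM housing_data
WITH (
  task_type='regression',
  target_column='price',
  algorithms=['knn', 'linear_regression'],
  scaling='minmax',
  categorical_encoding='ordinal',
  auto_features=false
);
```

Scaling matters most for algorithms such as KNN, where features on larger numeric ranges would otherwise dominate the distance calculation.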

Complete Examples

Customer Churn Prediction

CREATE EXPERIMENT churn_prediction AS
SELECT customer_id, age, tenure, monthly_charges, total_charges, churned
FROM customers
WITH (
  task_type='binary_classification',
  target_column='churned',
  max_trials=50,
  validation_split=0.2
);

House Price Regression

CREATE EXPERIMENT house_price_model AS
SELECT * FROM housing_data
WITH (
  task_type='regression',
  target_column='price',
  algorithms=['random_forest', 'xgboost', 'gradient_boosting'],
  max_trials=100,
  n_estimators=200
);

Fraud Detection with Feature Engineering

CREATE EXPERIMENT fraud_detection AS
SELECT * FROM transactions
WITH (
  task_type='binary_classification',
  target_column='is_fraud',
  algorithms=['xgboost', 'neural_network'],
  auto_features=true,
  polynomial_degree=2,
  interaction_features=true,
  scaling='robust',
  categorical_encoding='target',
  max_trials=150
);

Time Series Forecasting

CREATE EXPERIMENT sales_forecast AS
SELECT date, product_id, sales, promotions, holidays
FROM sales_data
WITH (
  task_type='time_series',
  target_column='sales',
  algorithms=['gradient_boosting', 'neural_network'],
  cv_folds=5
);

Interpretable Model for Compliance

CREATE EXPERIMENT loan_approval AS
SELECT * FROM loan_applications
WITH (
  task_type='binary_classification',
  target_column='approved',
  algorithms=['logistic_regression', 'decision_tree'],
  max_depth=5
);

Model Operations

Show All Experiments

SHOW MODELS;

Deploy a Model

DEPLOY MODEL best_model FROM EXPERIMENT churn_prediction
WITH (replicas=3, memory='2Gi');

Make Predictions

PREDICT churn_probability, risk_score
USING churn_model
AS SELECT customer_id, age, tenure FROM new_customers;

Describe a Model

DESCRIBE MODEL churn_model;

Drop a Model

DROP MODEL old_model;

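Best Practices

Parameter tuning: Algorithm-specific options apply to every selected algorithm that supports them.
Defaults: Every option has a sensible default. Specify only the options you want to change.
Resource limits: An experiment honors both max_trials and time_budget_minutes and stops as soon as either limit is reached.
Reproducibility: Set random_seed to get consistent results across runs.
Algorithm compatibility: The system automatically filters out algorithms that are incompatible with the selected task type.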
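
The resource-limit and reproducibility points can be sketched together as follows. The experiment name is hypothetical, the table and columns are placeholders, and the specific limits are only there to show the two caps interacting.

```sql
-- Sketch only: stops at 200 trials or 15 minutes, whichever comes first.
CREATE EXPERIMENT churn_budgeted AS
SELECT age, tenure, monthly_charges, total_charges, churned
FROM customers
WITH (
  task_type='binary_classification',
  target_column='churned',
  max_trials=200,
  time_budget_minutes=15,
  random_seed=7
);
```

With a short time budget and a high trial cap, the time limit is likely to be the one that ends the run; keeping random_seed fixed is what lets a rerun be compared against an earlier one.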

Document version: 1.0
Last updated: December 2025
Website: https://synapcores.com

Originally published at synapcores.com. SynapCores is a free, single-binary, AI-native database (vector + graph + SQL + LLM).
