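AutoML Guide

SynapCores AutoML Guide

Build powerful machine learning models directly in SQL, without writing any Python code.

Overview

SynapCores AutoML provides a comprehensive set of options for creating machine learning experiments through SQL syntax. You can train, tune, and deploy production-ready models using familiar database commands.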

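Task Types

Task type	Description	Default metric
`regression`	Continuous value prediction	R²
`binary_classification`	Binary classification	AUC
`classification`/`multiclass`	Multi-class classification	Accuracy
`clustering`	Unsupervised grouping	Silhouette score
`anomaly`	Anomaly detection	F1 score
`time_series`	Time series forecasting	MAPE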

Creating an AutoML Experiment

Basic Syntax

Option 1: AS syntax

CREATE EXPERIMENT <experiment_name> AS
<SELECT_query>
WITH (<options>)

Option 2: USING syntax

CREATE EXPERIMENT <experiment_name>
USING (<SELECT_query>)
TARGET <target_column>
OPTIONS (<options>)
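As a concrete illustration of Option 2, the sketch below writes a churn experiment in the USING form. It assumes that OPTIONS accepts the same key=value pairs as WITH, which the syntax above does not confirm. The experiment name churn_prediction_using is hypothetical, and the table and columns are borrowed from the churn example later in this guide.

```sql
-- Sketch only: assumes OPTIONS takes the same key=value pairs as WITH.
-- The target is named in the TARGET clause, so target_column is not repeated.
CREATE EXPERIMENT churn_prediction_using
USING (SELECT age, tenure, monthly_charges, total_charges, churned FROM customers)
TARGET churned
OPTIONS (
  task_type='binary_classification',
  max_trials=50
);
```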

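Configuration Options

General Options

Option	Type	Default	Description
`task_type`	string	`'binary_classification'`	Type of ML task
`target_column`	string	required	Column to predict
`max_trials`	integer	100	Maximum number of training trials
`time_budget_minutes`	integer	60	Maximum time budget, in minutes
`validation_split`	float	0.2	Fraction of the data held out for validation
`cv_folds`	integer	5	Number of cross-validation folds
`optimization_metric`	string	task-dependent	Metric to optimize
`ensemble`	boolean	true	Whether to build an ensemble model
`early_stopping_patience`	integer	10	Number of trials without improvement before stopping
`random_seed`	integer	42	Random seed for reproducibility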
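A minimal sketch of overriding a few of these general options in a WITH fragment follows. The option names come from the table above; the values are illustrative only, and the boolean literal follows the lowercase form used in the examples later in this guide.

```sql
-- Illustrative values: more folds, no ensemble, stop sooner when trials stop improving.
WITH (
  task_type='regression',
  target_column='price',
  cv_folds=10,
  ensemble=false,
  early_stopping_patience=5
)
```

Options that are left out keep the defaults listed in the table.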

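Available Algorithms

'linear_regression' - Linear regression
'logistic_regression' - Logistic regression
'decision_tree' - Decision tree
'random_forest' - Random forest
'gradient_boosting' - Gradient boosting
'xgboost' - XGBoost
'neural_network' - Neural network
'knn' - k-nearest neighbors
'naive_bayes' - Naive Bayes
'svm' - Support vector machine

Algorithm Selection Strategies

'all' - Try every available algorithm
'fast' - Fast algorithms only (linear models, decision tree, naive Bayes, kNN)
'accurate' - High-accuracy algorithms only (random forest, gradient boosting, XGBoost, neural network)
'interpretable' - Interpretable algorithms only (linear regression, logistic regression, decision tree)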
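The source does not show which option carries a strategy name, so the following sketch avoids guessing one. Instead it spells out the explicit algorithms list that corresponds to the 'accurate' strategy as described above, using the algorithms option that appears in the examples throughout this guide. The task_type value is chosen only for illustration.

```sql
-- Explicit equivalent of the 'accurate' strategy: the four algorithms it is described as covering.
WITH (
  task_type='classification',
  algorithms=['random_forest', 'gradient_boosting', 'xgboost', 'neural_network']
)
```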

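Algorithm-Specific Options

Random Forest

Hyperparameter	Type	Default	Description
`n_estimators`	integer	100	Number of trees
`max_depth`	integer	None	Maximum tree depth
`min_samples_split`	integer	2	Minimum number of samples required to split a node
`max_features`	string/float	'sqrt'	Features to consider for each split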

WITH (
  task_type='classification',
  algorithms=['random_forest'],
  n_estimators=200,
  max_depth=10
)

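Neural Network

Hyperparameter	Type	Default	Description
`hidden_layers`	array	[100]	Sizes of the hidden layers
`learning_rate`	float	0.001	Initial learning rate
`batch_size`	integer	32	Mini-batch size
`n_epochs`	integer	100	Maximum number of epochs
`activation`	string	'relu'	Activation function
`dropout_rate`	float	0.0	Dropout rate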

WITH (
  task_type='classification',
  algorithms=['neural_network'],
  hidden_layers=[128, 64, 32],
  dropout_rate=0.2
)

Gradient Boosting / XGBoost

Hyperparameter	Type	Default	Description
`n_estimators`	integer	100	Number of boosting stages
`learning_rate`	float	0.1	Learning rate
`max_depth`	integer	3	Maximum tree depth
`subsample`	float	1.0	Fraction of samples used for each stage
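The other algorithm sections end with a WITH fragment, so a matching sketch for boosting may help. It uses only the hyperparameters from the table above; the specific values are illustrative, not recommendations from the source.

```sql
-- Illustrative values: more stages with a smaller learning rate, and row subsampling.
WITH (
  task_type='regression',
  algorithms=['gradient_boosting', 'xgboost'],
  n_estimators=300,
  learning_rate=0.05,
  max_depth=4,
  subsample=0.8
)
```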

Feature Engineering Options

Option	Type	Default	Description
`auto_features`	boolean	true	Generate features automatically
`polynomial_degree`	integer	2	Degree of polynomial features
`interaction_features`	boolean	false	Generate interaction features
`scaling`	string	'standard'	Feature scaling method
`missing_values`	string	'mean'	How missing values are handled
`categorical_encoding`	string	'onehot'	Categorical encoding method

Scaling Methods

'standard' - Standardization (mean 0, variance 1)
'minmax' - Min-max scaling to [0, 1]
'robust' - Robust scaling using the median and IQR
'none' - No scaling

Categorical Encoding

'onehot' - One-hot encoding
'label' - Label encoding
'target' - Target encoding
'ordinal' - Ordinal encoding

Complete Examples

Customer Churn Prediction

CREATE EXPERIMENT churn_prediction AS
SELECT customer_id, age, tenure, monthly_charges, total_charges, churned
FROM customers
WITH (
  task_type='binary_classification',
  target_column='churned',
  max_trials=50,
  validation_split=0.2
);

House Price Regression

CREATE EXPERIMENT house_price_model AS
SELECT * FROM housing_data
WITH (
  task_type='regression',
  target_column='price',
  algorithms=['random_forest', 'xgboost', 'gradient_boosting'],
  max_trials=100,
  n_estimators=200
);

Fraud Detection with Feature Engineering

CREATE EXPERIMENT fraud_detection AS
SELECT * FROM transactions
WITH (
  task_type='binary_classification',
  target_column='is_fraud',
  algorithms=['xgboost', 'neural_network'],
  auto_features=true,
  polynomial_degree=2,
  interaction_features=true,
  scaling='robust',
  categorical_encoding='target',
  max_trials=150
);

Time Series Forecasting

CREATE EXPERIMENT sales_forecast AS
SELECT date, product_id, sales, promotions, holidays
FROM sales_data
WITH (
  task_type='time_series',
  target_column='sales',
  algorithms=['gradient_boosting', 'neural_network'],
  cv_folds=5
);

Interpretable Models for Compliance

CREATE EXPERIMENT loan_approval AS
SELECT * FROM loan_applications
WITH (
  task_type='binary_classification',
  target_column='approved',
  algorithms=['logistic_regression', 'decision_tree'],
  max_depth=5
);

Working with Models

Show All Models

SHOW MODELS;

Deploy a Model

DEPLOY MODEL best_model FROM EXPERIMENT churn_prediction
WITH (replicas=3, memory='2Gi');

Make Predictions

PREDICT churn_probability, risk_score
USING churn_model
AS SELECT customer_id, age, tenure FROM new_customers;

Describe a Model

DESCRIBE MODEL churn_model;

Drop a Model

DROP MODEL old_model;

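Best Practices

Parameter tuning: Algorithm-specific options are applied to every selected algorithm that supports them.
Defaults: Every option has a sensible default. Specify only the options you want to change.
Resource limits: Experiments respect both max_trials and time_budget_minutes, and stop as soon as either limit is reached.
Reproducibility: Set random_seed to get consistent results from run to run.
Algorithm compatibility: For each task type, the system automatically filters out incompatible algorithms.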
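The resource-limit and reproducibility points above can be combined in one experiment. The sketch below caps a quick run by both trial count and wall-clock time and pins the seed. The experiment name churn_quick_check is hypothetical, the table and columns reuse the churn example, and the values are illustrative.

```sql
-- Stops after 30 trials or 15 minutes, whichever comes first.
-- A fixed random_seed keeps repeated runs comparable.
CREATE EXPERIMENT churn_quick_check AS
SELECT age, tenure, monthly_charges, total_charges, churned
FROM customers
WITH (
  task_type='binary_classification',
  target_column='churned',
  max_trials=30,
  time_budget_minutes=15,
  random_seed=7
);
```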

Document version: 1.0
Last updated: December 2025
Website: https://synapcores.com

Originally published at synapcores.com. SynapCores is a free, single-binary, AI-native database (vector + graph + SQL + LLM).
