SynapCores AutoML Guide

Build powerful machine learning models directly in SQL without writing any Python code.

Overview

SynapCores AutoML lets you create and configure machine learning experiments entirely in SQL. You train, tune, and deploy production-ready models with familiar database commands.

Task Types

Task Type	Description	Default Metric
`regression`	Continuous value prediction	R-squared
`binary_classification`	Two-class classification	AUC
`classification`/`multiclass`	Multi-class classification	Accuracy
`clustering`	Unsupervised grouping	Silhouette Score
`anomaly`	Anomaly detection	F1 Score
`time_series`	Time series forecasting	MAPE

Creating AutoML Experiments

Basic Syntax

Option 1: AS Syntax

CREATE EXPERIMENT <experiment_name> AS
<SELECT_query>
WITH (<options>)

Option 2: USING Syntax

CREATE EXPERIMENT <experiment_name>
USING (<SELECT_query>)
TARGET <target_column>
OPTIONS (<options>)
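
As a rough illustration of Option 2, the churn experiment from the Complete Examples section might be written with the USING syntax as shown here. This is a sketch only: it assumes OPTIONS accepts the same key=value pairs as WITH, that TARGET takes a bare column name, and that standard SQL `--` comments are accepted. The syntax summary above suggests the first two but does not spell them out.

```sql
-- Sketch of the USING form. Table and column names are borrowed from the
-- churn example in the Complete Examples section of this guide.
CREATE EXPERIMENT churn_prediction
USING (
  SELECT customer_id, age, tenure, monthly_charges, total_charges, churned
  FROM customers
)
TARGET churned
-- Assumption: OPTIONS takes the same key=value list as WITH.
OPTIONS (
  task_type='binary_classification',
  max_trials=50
);
```

Because the target is named in the TARGET clause, this form presumably does not need a separate target_column option.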

Configuration Options

General Options

Option	Type	Default	Description
`task_type`	string	`'binary_classification'`	Type of ML task
`target_column`	string	Required	Column to predict
`max_trials`	integer	100	Maximum training trials
`time_budget_minutes`	integer	60	Maximum time budget, in minutes
`validation_split`	float	0.2	Validation data proportion
`cv_folds`	integer	5	Cross-validation folds
`optimization_metric`	string	Task-dependent	Metric to optimize
`ensemble`	boolean	true	Create ensemble models
`early_stopping_patience`	integer	10	Trials without improvement before the search stops early
`random_seed`	integer	42	Random seed for reproducibility
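
To make the budget and validation options concrete, here is a hedged sketch that caps a run at 40 trials or 30 minutes, uses 3-fold cross-validation, and pins the seed. The experiment name `churn_quick_run` is hypothetical, the table and columns are reused from the churn example in this guide, and the values are illustrative rather than recommended. Inline `--` comments are assumed to be accepted.

```sql
-- Hypothetical experiment name; every option comes from the table above.
CREATE EXPERIMENT churn_quick_run AS
SELECT customer_id, age, tenure, monthly_charges, total_charges, churned
FROM customers
WITH (
  task_type='binary_classification',
  target_column='churned',
  max_trials=40,               -- default is 100
  time_budget_minutes=30,      -- default is 60
  cv_folds=3,                  -- default is 5
  early_stopping_patience=5,   -- default is 10
  random_seed=7                -- default is 42
);
```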

Available Algorithms

'linear_regression' - Linear Regression
'logistic_regression' - Logistic Regression
'decision_tree' - Decision Tree
'random_forest' - Random Forest
'gradient_boosting' - Gradient Boosting
'xgboost' - XGBoost
'neural_network' - Neural Network
'knn' - K-Nearest Neighbors
'naive_bayes' - Naive Bayes
'svm' - Support Vector Machine

Algorithm Selection Strategies

'all' - Try all available algorithms
'fast' - Only fast algorithms (linear models, decision trees, naive bayes, knn)
'accurate' - Only highly accurate algorithms (random forest, gradient boosting, xgboost, neural networks)
'interpretable' - Only interpretable algorithms (linear regression, logistic regression, decision trees)

Algorithm-Specific Options

Random Forest

Hyperparameter	Type	Default	Description
`n_estimators`	integer	100	Number of trees
`max_depth`	integer	None	Maximum tree depth
`min_samples_split`	integer	2	Minimum samples to split
`max_features`	string/float	'sqrt'	Features to consider

WITH (
  task_type='classification',
  algorithms=['random_forest'],
  n_estimators=200,
  max_depth=10
)

Neural Network

Hyperparameter	Type	Default	Description
`hidden_layers`	array	[100]	Hidden layer sizes
`learning_rate`	float	0.001	Initial learning rate
`batch_size`	integer	32	Mini-batch size
`n_epochs`	integer	100	Maximum epochs
`activation`	string	'relu'	Activation function
`dropout_rate`	float	0.0	Dropout rate

WITH (
  task_type='classification',
  algorithms=['neural_network'],
  hidden_layers=[128, 64, 32],
  dropout_rate=0.2
)

Gradient Boosting / XGBoost

Hyperparameter	Type	Default	Description
`n_estimators`	integer	100	Number of boosting stages
`learning_rate`	float	0.1	Learning rate
`max_depth`	integer	3	Maximum tree depth
`subsample`	float	1.0	Fraction of samples
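
Following the pattern of the Random Forest and Neural Network fragments, a boosting configuration could look like the sketch below. It uses only the hyperparameters listed in the table; the values are illustrative.

```sql
WITH (
  task_type='regression',
  algorithms=['gradient_boosting', 'xgboost'],
  n_estimators=300,     -- number of boosting stages (default 100)
  learning_rate=0.05,   -- default 0.1
  max_depth=4,          -- default 3
  subsample=0.8         -- fraction of samples (default 1.0)
)
```

Note that `learning_rate` is also a Neural Network hyperparameter. Since algorithm-specific options apply to all compatible selected algorithms, adding 'neural_network' to the list would presumably pass the same value to it as well.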

Feature Engineering Options

Option	Type	Default	Description
`auto_features`	boolean	true	Auto-generate features
`polynomial_degree`	integer	2	Polynomial feature degree
`interaction_features`	boolean	false	Generate interaction features
`scaling`	string	'standard'	Feature scaling method
`missing_values`	string	'mean'	Missing value handling
`categorical_encoding`	string	'onehot'	Categorical encoding method

Scaling Methods

'standard' - Standardization (zero mean, unit variance)
'minmax' - Min-Max scaling to [0, 1]
'robust' - Robust scaling using median and IQR
'none' - No scaling
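
For reference, the conventional definitions behind these method names are given below. This guide does not state exactly how SynapCores computes them (for example, population versus sample standard deviation), so treat these as the textbook formulas rather than a specification.

```latex
% x is a raw feature value; x' is the scaled value.
% \mu, \sigma: mean and standard deviation of the feature.
% Q_1, Q_3: first and third quartiles, so Q_3 - Q_1 is the IQR.
\begin{aligned}
\text{standard:} \quad & x' = \frac{x - \mu}{\sigma} \\
\text{minmax:}   \quad & x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \\
\text{robust:}   \quad & x' = \frac{x - \operatorname{median}}{Q_3 - Q_1}
\end{aligned}
```

The robust form depends only on the median and quartiles, which makes it less sensitive to outliers than the other two.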

Categorical Encoding

'onehot' - One-hot encoding
'label' - Label encoding
'target' - Target encoding
'ordinal' - Ordinal encoding
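
As with the algorithm sections, the preprocessing options can be combined in a single WITH clause. The fragment below is a minimal sketch that overrides two defaults and states a third explicitly; the target column `price` is borrowed from the house price example in this guide.

```sql
WITH (
  task_type='regression',
  target_column='price',
  scaling='minmax',               -- default is 'standard'
  categorical_encoding='label',   -- default is 'onehot'
  missing_values='mean'           -- same as the default, shown for clarity
)
```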

Complete Examples

Customer Churn Prediction

CREATE EXPERIMENT churn_prediction AS
SELECT customer_id, age, tenure, monthly_charges, total_charges, churned
FROM customers
WITH (
  task_type='binary_classification',
  target_column='churned',
  max_trials=50,
  validation_split=0.2
);

House Price Regression

CREATE EXPERIMENT house_price_model AS
SELECT * FROM housing_data
WITH (
  task_type='regression',
  target_column='price',
  algorithms=['random_forest', 'xgboost', 'gradient_boosting'],
  max_trials=100,
  n_estimators=200
);

Fraud Detection with Feature Engineering

CREATE EXPERIMENT fraud_detection AS
SELECT * FROM transactions
WITH (
  task_type='binary_classification',
  target_column='is_fraud',
  algorithms=['xgboost', 'neural_network'],
  auto_features=true,
  polynomial_degree=2,
  interaction_features=true,
  scaling='robust',
  categorical_encoding='target',
  max_trials=150
);

Time Series Forecasting

CREATE EXPERIMENT sales_forecast AS
SELECT date, product_id, sales, promotions, holidays
FROM sales_data
WITH (
  task_type='time_series',
  target_column='sales',
  algorithms=['gradient_boosting', 'neural_network'],
  cv_folds=5
);

Interpretable Model for Compliance

CREATE EXPERIMENT loan_approval AS
SELECT * FROM loan_applications
WITH (
  task_type='binary_classification',
  target_column='approved',
  algorithms=['logistic_regression', 'decision_tree'],
  max_depth=5
);

Model Operations

Show All Models

SHOW MODELS;

Deploy a Model

DEPLOY MODEL best_model FROM EXPERIMENT churn_prediction
WITH (replicas=3, memory='2Gi');

Make Predictions

PREDICT churn_probability, risk_score
USING churn_model
AS SELECT customer_id, age, tenure FROM new_customers;

Describe a Model

DESCRIBE MODEL churn_model;

Drop a Model

DROP MODEL old_model;

Best Practices

Parameter Tuning: Algorithm-specific options apply to all selected algorithms where compatible.
Default Values: All options have sensible defaults. Only specify options that differ from defaults.
Resource Limits: Experiments respect both max_trials and time_budget_minutes. A run stops as soon as either limit is reached.
Reproducibility: Set random_seed for consistent results across runs.
Algorithm Compatibility: The system automatically filters incompatible algorithms for each task type.
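
The Default Values advice can be sketched as a minimal experiment that sets only the required option. The experiment name `churn_defaults` is hypothetical, and the table and columns come from the churn example above.

```sql
-- Only target_column is marked Required in the General Options table.
CREATE EXPERIMENT churn_defaults AS
SELECT customer_id, age, tenure, monthly_charges, total_charges, churned
FROM customers
WITH (
  target_column='churned'
);
```

Going by the documented defaults, this would run as a binary_classification task with up to 100 trials or 60 minutes, 5 cross-validation folds, and random_seed 42.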

Document Version: 1.0
Last Updated: December 2025
Website: https://synapcores.com

Originally published at synapcores.com — SynapCores is a free, single-binary AI-native database (vector + graph + SQL + LLM).
