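AutoML Guide

SynapCores AutoML Guide

Build powerful machine learning models directly in SQL, without writing any Python code.

Overview

SynapCores AutoML offers a wide range of options for creating machine learning experiments through SQL syntax. Train, tune, and deploy production-ready models using familiar database commands.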

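Task Types

Task type	Description	Default metric
`regression`	Continuous value prediction	R-squared
`binary_classification`	Two-class classification	AUC
`classification`/`multiclass`	Multi-class classification	Accuracy
`clustering`	Unsupervised clustering	Silhouette score
`anomaly`	Anomaly detection	F1 score
`time_series`	Time series forecasting	MAPE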

Creating an AutoML Experiment

Basic Syntax

Option 1: AS syntax

CREATE EXPERIMENT <experiment_name> AS
<SELECT_query>
WITH (<options>)

Option 2: USING syntax

CREATE EXPERIMENT <experiment_name>
USING (<SELECT_query>)
TARGET <target_column>
OPTIONS (<options>)
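
As an illustration of Option 2, the customer churn experiment that appears later in this guide could be written in the USING form. This is only a sketch: the table and column names are borrowed from that example, and it assumes the OPTIONS clause accepts the same key=value pairs as WITH, which this guide does not spell out.

```sql
-- Sketch only: assumes OPTIONS takes the same key=value pairs as WITH.
CREATE EXPERIMENT churn_prediction
USING (
  SELECT customer_id, age, tenure, monthly_charges, total_charges, churned
  FROM customers
)
TARGET churned
OPTIONS (
  task_type='binary_classification',
  max_trials=50,
  validation_split=0.2
);
```

In this form the target is named by the TARGET clause, so the sketch leaves target_column out of the options.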

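Configuration Options

General Options

Option	Type	Default	Description
`task_type`	string	`'binary_classification'`	ML task type
`target_column`	string	required	Column to predict
`max_trials`	integer	100	Maximum number of training trials
`time_budget_minutes`	integer	60	Maximum time budget, in minutes
`validation_split`	float	0.2	Fraction of data used for validation
`cv_folds`	integer	5	Number of cross-validation folds
`optimization_metric`	string	task-dependent	Metric to optimize
`ensemble`	boolean	true	Build an ensemble model
`early_stopping_patience`	integer	10	Number of trials without improvement
`random_seed`	integer	42	Random seed for reproducibility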
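
To make the table concrete, the sketch below sets several general options at once on the churn query used later in this guide. The option names come from the table; the experiment name `churn_prediction_budgeted` is hypothetical, and the values are illustrative choices rather than recommendations.

```sql
-- Illustrative values only; churn_prediction_budgeted is a hypothetical name.
CREATE EXPERIMENT churn_prediction_budgeted AS
SELECT customer_id, age, tenure, monthly_charges, total_charges, churned
FROM customers
WITH (
  task_type='binary_classification',
  target_column='churned',
  max_trials=30,
  time_budget_minutes=15,
  cv_folds=3,
  early_stopping_patience=5,
  ensemble=false,
  random_seed=7
);
```

Options that are left out, such as validation_split here, would fall back to the defaults listed in the table.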

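Available Algorithms

'linear_regression' - Linear regression
'logistic_regression' - Logistic regression
'decision_tree' - Decision tree
'random_forest' - Random forest
'gradient_boosting' - Gradient boosting
'xgboost' - XGBoost
'neural_network' - Neural network
'knn' - K-nearest neighbors
'naive_bayes' - Naive Bayes
'svm' - Support vector machine

Algorithm Selection Strategies

'all' - Try every available algorithm
'fast' - Fast algorithms only (linear models, decision tree, naive Bayes, KNN)
'accurate' - High-accuracy algorithms only (random forest, gradient boosting, XGBoost, neural network)
'interpretable' - Interpretable algorithms only (linear regression, logistic regression, decision tree)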

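Algorithm-Specific Options

Random Forest

Hyperparameter	Type	Default	Description
`n_estimators`	integer	100	Number of trees
`max_depth`	integer	none	Maximum tree depth
`min_samples_split`	integer	2	Minimum samples required to split
`max_features`	string/float	'sqrt'	Features to consider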

WITH (
  task_type='classification',
  algorithms=['random_forest'],
  n_estimators=200,
  max_depth=10
)

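Neural Network

Hyperparameter	Type	Default	Description
`hidden_layers`	array	[100]	Hidden layer sizes
`learning_rate`	float	0.001	Initial learning rate
`batch_size`	integer	32	Mini-batch size
`n_epochs`	integer	100	Maximum number of epochs
`activation`	string	'relu'	Activation function
`dropout_rate`	float	0.0	Dropout rate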

WITH (
  task_type='classification',
  algorithms=['neural_network'],
  hidden_layers=[128, 64, 32],
  dropout_rate=0.2
)

Gradient Boosting / XGBoost

Hyperparameter	Type	Default	Description
`n_estimators`	integer	100	Number of boosting stages
`learning_rate`	float	0.1	Learning rate
`max_depth`	integer	3	Maximum tree depth
`subsample`	float	1.0	Fraction of samples
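
The random forest and neural network sections each include a WITH block; a comparable fragment for XGBoost, using only the hyperparameters from the table above, might look like the following. The values are illustrative, not tuned recommendations.

```sql
WITH (
  task_type='classification',
  algorithms=['xgboost'],
  n_estimators=300,
  learning_rate=0.05,
  max_depth=6,
  subsample=0.8
)
```

A lower learning_rate paired with more boosting stages is a common starting point for boosted trees in general, but whether it helps depends on the data.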

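Feature Engineering Options

Option	Type	Default	Description
`auto_features`	boolean	true	Automatic feature generation
`polynomial_degree`	integer	2	Degree of polynomial features
`interaction_features`	boolean	false	Generate interaction features
`scaling`	string	'standard'	Feature scaling method
`missing_values`	string	'mean'	Missing value handling
`categorical_encoding`	string	'onehot'	Categorical encoding method

Scaling Methods

'standard' - Standardization (mean 0, variance 1)
'minmax' - Min-max scaling to [0, 1]
'robust' - Robust scaling using the median and IQR
'none' - No scaling

Categorical Encoding

'onehot' - One-hot encoding
'label' - Label encoding
'target' - Target encoding
'ordinal' - Ordinal encoding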
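
As a sketch of how these settings combine, the fragment below turns off automatic feature generation and selects min-max scaling with ordinal encoding. It uses only option names and values listed in this section; the combination itself, and the `price` target borrowed from the housing example later in the guide, are illustrative.

```sql
WITH (
  task_type='regression',
  target_column='price',
  auto_features=false,
  scaling='minmax',
  categorical_encoding='ordinal'
)
```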

Complete Examples

Customer Churn Prediction

CREATE EXPERIMENT churn_prediction AS
SELECT customer_id, age, tenure, monthly_charges, total_charges, churned
FROM customers
WITH (
  task_type='binary_classification',
  target_column='churned',
  max_trials=50,
  validation_split=0.2
);

House Price Regression

CREATE EXPERIMENT house_price_model AS
SELECT * FROM housing_data
WITH (
  task_type='regression',
  target_column='price',
  algorithms=['random_forest', 'xgboost', 'gradient_boosting'],
  max_trials=100,
  n_estimators=200
);

Fraud Detection with Feature Engineering

CREATE EXPERIMENT fraud_detection AS
SELECT * FROM transactions
WITH (
  task_type='binary_classification',
  target_column='is_fraud',
  algorithms=['xgboost', 'neural_network'],
  auto_features=true,
  polynomial_degree=2,
  interaction_features=true,
  scaling='robust',
  categorical_encoding='target',
  max_trials=150
);

Time Series Forecasting

CREATE EXPERIMENT sales_forecast AS
SELECT date, product_id, sales, promotions, holidays
FROM sales_data
WITH (
  task_type='time_series',
  target_column='sales',
  algorithms=['gradient_boosting', 'neural_network'],
  cv_folds=5
);

Interpretable Model for Compliance

CREATE EXPERIMENT loan_approval AS
SELECT * FROM loan_applications
WITH (
  task_type='binary_classification',
  target_column='approved',
  algorithms=['logistic_regression', 'decision_tree'],
  max_depth=5
);

Model Operations

View All Experiments

SHOW MODELS;

Deploy a Model

DEPLOY MODEL best_model FROM EXPERIMENT churn_prediction
WITH (replicas=3, memory='2Gi');

Generate Predictions

PREDICT churn_probability, risk_score
USING churn_model
AS SELECT customer_id, age, tenure FROM new_customers;

Describe a Model

DESCRIBE MODEL churn_model;

Drop a Model

DROP MODEL old_model;

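Best Practices

Parameter tuning: Algorithm-specific options are applied to every selected algorithm that is compatible with them.
Defaults: Every option has a sensible default. Specify only the options that differ from the defaults.
Resource limits: An experiment honors both max_trials and time_budget_minutes, and stops when a limit is reached.
Reproducibility: Set random_seed to get consistent results across runs.
Algorithm compatibility: For each task type, the system automatically filters out incompatible algorithms.

Document version: 1.0
Last updated: December 2025
Website: https://synapcores.com

Originally published at synapcores.com. SynapCores is a free, single-binary, AI-native database (vector + graph + SQL + LLM).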
