DBGSOM (Directed Batch Growing Self-Organizing Map): A Neural Network for Clustering, Classification, Nonlinear Projection/Manifold learning, Data Visualization.
The network automatically determines the number of prototypes needed to represent the data. Starting from 4 neurons, the map expands at boundary positions where quantization error exceeds a configurable threshold: no need to pre-specify cluster count. The result is a topology-preserving 2D grid where neighboring neurons represent similar inputs.
Features
- No cluster count needed: the map grows until quantization error falls below a threshold; lambda_ controls sensitivity
- sklearn-compatible: drop-in for KMeans, DBSCAN; implements fit_predict, transform, score, and predict_proba
- Topology-preserving: related samples cluster as grid neighbors; topographic error < 5% on Digits
- Faster than classical SOMs: batch learning rule trains on all samples per epoch (vs. online, sample-by-sample)
- Built-in visualization: plot() renders the neuron grid coloured by density, label, error or hit count.
How it works
In brief: Four neurons initialize → samples assigned to nearest neuron → weights update toward assigned samples → boundary neurons with high error spawn new neighbors → σ decays → repeat until max_neurons or n_iter reached. Neighboring neurons influence each other's weight update → topology preserved during training.
DBGSOM builds a 2D rectangular prototype map in which each neuron connects to up to four neighbors. Training starts from four neurons whose weights are initialized randomly from the input data. In each epoch, every sample is assigned to its nearest neuron, the best matching unit (BMU), and weights are updated toward the mean of the mapped samples. A neighborhood function couples neighboring neurons so that the low-dimensional map ordering is preserved; the neighborhood width shrinks over time, shifting the fit from global to local structure. A growing mechanism inserts new neurons at boundary positions where the quantization error exceeds the growing threshold.
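The assignment and update steps described above can be sketched in plain Python. This is a minimal illustration of a generic batch SOM epoch, not the package's implementation: the function name `batch_epoch`, the Gaussian neighborhood and the use of squared grid distance are all assumptions made for the example.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def batch_epoch(samples, weights, positions, sigma):
    """Run one batch update and return the new weight vectors.

    samples:   list of input vectors
    weights:   list of prototype vectors, one per neuron
    positions: list of (row, col) grid coordinates, one per neuron
    sigma:     current neighborhood width on the grid (assumed Gaussian)
    """
    n_neurons, dim = len(weights), len(weights[0])

    # Step 1: assign every sample to its best matching unit (BMU) and
    # accumulate per-neuron sums and hit counts.
    sums = [[0.0] * dim for _ in range(n_neurons)]
    counts = [0] * n_neurons
    for x in samples:
        bmu = min(range(n_neurons), key=lambda j: squared_distance(x, weights[j]))
        counts[bmu] += 1
        for d in range(dim):
            sums[bmu][d] += x[d]

    # Step 2: move each neuron to the neighborhood-weighted mean of the
    # mapped samples, so grid neighbors pull each other along.
    new_weights = []
    for i in range(n_neurons):
        numerator = [0.0] * dim
        denominator = 0.0
        for j in range(n_neurons):
            grid_dist_sq = squared_distance(positions[i], positions[j])
            h = math.exp(-grid_dist_sq / (2.0 * sigma**2))
            denominator += h * counts[j]
            for d in range(dim):
                numerator[d] += h * sums[j][d]
        if denominator == 0.0:
            # No sample within reach of this neuron: keep its old weights.
            new_weights.append(list(weights[i]))
        else:
            new_weights.append([v / denominator for v in numerator])
    return new_weights
```

With a large `sigma` every neuron is drawn toward the global mean of the data; with a small `sigma` each neuron moves to the mean of its own samples, which mirrors the global-to-local shift as the neighborhood width shrinks.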
How to install
Download from PyPI
Install from PyPI via uv (recommended):
or with pip:
Install from source
Clone and install with uv (recommended):
git clone https://github.com/SandroMartens/DBGSOM.git
cd DBGSOM
uv sync
Alternatively with pip:
git clone https://github.com/SandroMartens/DBGSOM.git
cd DBGSOM
pip install -e .
Usage
DBGSOM implements the scikit-learn API and provides two estimators:
| Class | Use case |
|---|---|
| SomVQ | Unsupervised clustering / vector quantization |
| SomClassifier | Supervised classification |
Clustering / Vector Quantization
from dbgsom import SomVQ
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
vq = SomVQ(lambda_=80.0, max_neurons=80)
labels = vq.fit_predict(X)
print(f"Neurons: {len(vq.neurons_)}")
print(f"Quantization error: {vq.quantization_error_:.4f}")
print(f"Topographic error: {vq.topographic_error_:.4f}")
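For readers unfamiliar with the two error measures printed above, the following sketch computes them using their common textbook definitions: quantization error as the mean distance from each sample to its best matching unit, and topographic error as the fraction of samples whose best and second-best units are not grid neighbors. The helper name `map_errors` is hypothetical, and the package may normalize or aggregate these values differently.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def map_errors(samples, weights, positions):
    """Return (quantization_error, topographic_error) for a given map.

    weights:   list of prototype vectors, one per neuron (at least two)
    positions: list of (row, col) grid coordinates, one per neuron
    """
    total_distance = 0.0
    topology_violations = 0
    for x in samples:
        # Rank neurons by distance to the sample; the first is the BMU.
        ranked = sorted(range(len(weights)), key=lambda j: euclidean(x, weights[j]))
        best, second = ranked[0], ranked[1]
        total_distance += euclidean(x, weights[best])
        # On a rectangular 4-neighbor grid, adjacent neurons are exactly
        # one step apart in Manhattan distance.
        (r1, c1), (r2, c2) = positions[best], positions[second]
        if abs(r1 - r2) + abs(c1 - c2) != 1:
            topology_violations += 1
    n = len(samples)
    return total_distance / n, topology_violations / n
```

A topographic error of 0 means that, for every sample, the two closest prototypes sit next to each other on the grid.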
Key growth parameters:
| Parameter | Default | Effect |
|---|---|---|
| lambda_ | 115.0 | Growing threshold: higher → fewer neurons |
| max_neurons | 5 x sqrt(n_samples) | Hard cap on neuron count |
| n_iter | 500 | Training epochs; growth only happens in the first half |
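How the three parameters interact can be sketched as follows. The helper names `default_max_neurons` and `growth_candidates` are hypothetical, truncating the default cap to an integer is an assumption, and so is comparing each neuron's accumulated error directly against `lambda_`. The sketch lists every free grid slot next to a high-error boundary neuron; how the package picks among those slots is not shown here.

```python
import math


def default_max_neurons(n_samples):
    """Documented default cap, 5 x sqrt(n_samples), truncated to an int here."""
    return int(5 * math.sqrt(n_samples))


def growth_candidates(errors, epoch, n_iter, lambda_, max_neurons):
    """Return free grid positions where a new neuron could be inserted.

    errors maps each neuron's (row, col) grid position to its accumulated
    quantization error.
    """
    # Growth only happens in the first half of training and below the cap.
    if epoch >= n_iter // 2 or len(errors) >= max_neurons:
        return []

    candidates = []
    for (row, col), error in errors.items():
        if error <= lambda_:
            continue  # below the growing threshold: no growth here
        # A boundary neuron has at least one of its four grid slots free.
        for slot in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)):
            if slot not in errors and slot not in candidates:
                candidates.append(slot)
    # Respect the hard cap on the total neuron count.
    return candidates[: max_neurons - len(errors)]
```

Raising `lambda_` makes fewer neurons exceed the threshold, which is why higher values lead to smaller maps.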
Classification
from dbgsom import SomClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
clf = SomClassifier(lambda_=80.0, max_neurons=80)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy
proba = clf.predict_proba(X_test)  # class probabilities
Transform
Both estimators implement transform(), which represents each sample as a sparse, non-negative linear combination of prototype weights:
coefs = vq.transform(X) # shape (n_samples, n_prototypes)
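To make the meaning of one coefficient row concrete, the sketch below rebuilds a sample as the weighted sum of prototype vectors. The prototype values, the coefficient row and the helper name `reconstruct` are hypothetical; the sketch only illustrates the linear-combination idea and does not show how the package computes the coefficients.

```python
def reconstruct(coefs_row, prototypes):
    """Rebuild one sample as the weighted sum of prototype vectors."""
    dim = len(prototypes[0])
    sample = [0.0] * dim
    for c, prototype in zip(coefs_row, prototypes):
        if c == 0.0:
            continue  # sparse: prototypes with zero weight do not contribute
        for d in range(dim):
            sample[d] += c * prototype[d]
    return sample


# Hypothetical values: three prototypes in 2D and one coefficient row.
prototypes = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]]
coefs_row = [0.0, 0.5, 0.25]  # non-negative, one entry per prototype
approximation = reconstruct(coefs_row, prototypes)
```

With NumPy arrays the same reconstruction for all samples at once is a single matrix product of the coefficient matrix and the prototype matrix.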
Visualization
plot() renders SOM neurons as dots and neighborhood edges as grey lines via seaborn objects.
vq.plot(color="density")  # continuous -> colour gradient
clf.plot(color="label")  # categorical -> colour legend
vq.plot(color="hit_count", pointsize="error")  # colour + size encoding
vq.plot(color="density", layout="pca", palette="magma_r")
Supported attributes for color / pointsize:
'label', 'epoch_created', 'error', 'average_distance', 'density', 'hit_count'
| Parameter | Options | Description |
|---|---|---|
| color | any node attribute | Numeric attributes → continuous colour scale; int/str with ≤ 20 unique values → legend |
| pointsize | any numeric attribute | Node size proportional to attribute value |
| layout | 'grid' (default), 'pca' | Node placement algorithm |
| palette | any Matplotlib colormap | Applied to colour mapping |
Examples
Comparisons
SOM algorithm comparison (Digits, PCA projection)
DBGSOM (dynamic grid, size determined automatically) vs. MiniSom and SuSi (fixed grids) vs. KMeans (no topology). All trained on same Digits embedding.
Clustering metrics (Digits dataset)
ARI, Silhouette, Davies-Bouldin, training time. All algorithms use same cluster count — determined automatically by DBGSOM.
Full benchmark notebooks:
| Notebook | What it shows |
|---|---|
clustering_comparison.ipynb |
DBGSOM vs. KMeans, MiniBatchKMeans, AgglomerativeClustering on Iris and Digits |
som_comparison.ipynb |
DBGSOM vs. MiniSom, SuSi on Digits and Fashion-MNIST (QE, TE, training time, scaling) |
manifold_comparison.ipynb |
DBGSOM vs. Isomap, t-SNE, UMAP on MNIST: trustworthiness, continuity, folds/tears, runtime |
Dependencies
- Python >= 3.12
- numpy
- numba
- NetworkX
- tqdm
- scikit-learn
- seaborn
- pandas
Citation
If you use DBGSOM in your research, please cite:
Martens, S. (2025). DBGSOM: A Python implementation of the Directed Batch Growing Self-Organizing Map. Zenodo. https://doi.org/10.5281/zenodo.20525611
References
- A directed batch growing approach to enhance the topology preservation of self-organizing map, Mahdi Vasighi and Homa Amini, 2017, http://dx.doi.org/10.1016/j.asoc.2017.02.015
- Reference implementation by the authors in Matlab: https://github.com/mvasighi/DBGSOM
- Statistics-enhanced Direct Batch Growth Self-Organizing Mapping for efficient DoS Attack Detection, Xiaofei Qu et al., 2019, https://doi.org/10.1109/ACCESS.2019.2922737
- Entropy-Defined Direct Batch Growing Hierarchical Self-Organizing Mapping for Efficient Network Anomaly Detection, Xiaofei Qu et al., 2021, https://doi.org/10.1109/ACCESS.2021.3064200
- Self-Organizing Maps, 3rd Edition, Teuvo Kohonen, 2003
- MATLAB Implementations and Applications of the Self-Organizing Map, Teuvo Kohonen, 2014
- Smoothed self-organizing map for robust clustering, P. D'Urso, L. De Giovanni and R. Massari, 2019, https://doi.org/10.1016/j.ins.2019.06.038
License
DBGSOM is licensed under the MIT license.