⚡ Interactive · Browser-Based · Zero Install

ML Classification
Pipeline

Upload any labeled CSV dataset, engineer features with per-column scaling and transformations, then train and compare multiple classification models with detailed performance analysis — entirely in your browser.

Logistic Regression · K-NN · Naive Bayes · Decision Tree · Random Forest · ROC · AUC · F1 · Confusion Matrix

📥 Load Your Dataset

Upload any CSV with a binary classification target column. After loading, you'll select which column is the target and which are features.

📂

Drop a CSV file here or click to browse

Any CSV with a header row and a binary target column (0/1, yes/no, true/false, or two distinct text labels).
Numeric and low-cardinality categorical features are both supported.

Supported formats: Standard CSV with a header row. Binary targets may be encoded as numbers (0/1), booleans (true/false), or any two distinct text values (e.g. "Yes"/"No", "Presence"/"Absence", "Benign"/"Malignant"). Feature columns should be numeric or low-cardinality categorical (≤12 unique values — these are one-hot encoded automatically).
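The target normalization and one-hot encoding described above can be sketched as follows (function names are illustrative, not the app's actual API):

```javascript
// Map any two distinct target labels to 0/1 (labels are sorted so the
// encoding is deterministic; the higher-sorting label becomes 1).
function encodeBinaryTarget(values) {
  const labels = [...new Set(values)].sort();
  if (labels.length !== 2) throw new Error("Target must have exactly two classes");
  return values.map(v => (v === labels[1] ? 1 : 0));
}

// One-hot encode a low-cardinality categorical column (≤ 12 unique values):
// each value becomes a 0/1 indicator vector over the sorted categories.
function oneHotColumn(values) {
  const cats = [...new Set(values)].sort();
  return values.map(v => cats.map(c => (v === c ? 1 : 0)));
}
```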

🔍 Explore Data

Preview rows, inspect statistics, and understand class balance before modeling.

📭

Load data in Step 1 first

⚙️ Feature Preprocessing

Choose a scaling or transformation for each numeric feature. Defaults are suggested based on each column's distribution.

Tip: Scale-sensitive models (Logistic Regression, K-NN) benefit from StandardScaler or MinMaxScaler on continuous features — K-NN because it is distance-based, Logistic Regression because gradient descent converges faster on standardized inputs. Tree-based models (Decision Tree, Random Forest) are scale-invariant. Skewed distributions may benefit from Log1p or Sqrt.
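The per-column scaling options mentioned in the tip can be sketched as plain array transforms (a minimal sketch; the app's real preprocessing code may differ):

```javascript
// StandardScaler: zero mean, unit variance (population std; falls back to 1
// for constant columns to avoid division by zero).
function standardScale(xs) {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  const sd = Math.sqrt(xs.reduce((a, x) => a + (x - mean) ** 2, 0) / xs.length) || 1;
  return xs.map(x => (x - mean) / sd);
}

// MinMaxScaler: rescale into [0, 1].
function minMaxScale(xs) {
  const lo = Math.min(...xs), hi = Math.max(...xs);
  const range = hi - lo || 1;
  return xs.map(x => (x - lo) / range);
}

const log1p = xs => xs.map(x => Math.log1p(x)); // compresses right-skewed tails
const sqrt  = xs => xs.map(x => Math.sqrt(x));  // milder skew correction
```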
📭

Load data in Step 1 first

🎛️ Configure Models

Select classifiers to train and adjust the train/test split.

Select Models
📐
Logistic Regression
Gradient descent · Linear boundary
🔵
K-Nearest Neighbors
Distance-based · k=7
🔔
Gaussian Naive Bayes
Probabilistic · Feature independence
🌲
Decision Tree
CART · Gini impurity · depth 8
🌳
Random Forest
Ensemble · 20 trees · bagging
Gradient Boosting
Sequential trees · XGBoost-style
🔁
AdaBoost
Weighted stumps · adaptive boosting
✂️
Linear SVM
SGD · hinge loss · max-margin
Train / Test Split
Train Test
Training: 80%  ·  Testing: 20%  ·  Random seed: 100
Options
No experiment variants saved. Go to Step 3 → Preprocessing to save variants and compare feature engineering approaches.
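The seeded 80/20 shuffle-and-split shown above can be sketched like this (the LCG random generator here is an assumption for illustration; the app's actual RNG may differ):

```javascript
// Reproducible train/test split: a small linear congruential generator (LCG)
// seeds a Fisher–Yates shuffle, so the same seed always yields the same split.
function seededSplit(rows, trainFrac = 0.8, seed = 100) {
  let s = seed;
  const rand = () => (s = (s * 1664525 + 1013904223) % 4294967296) / 4294967296;
  const idx = rows.map((_, i) => i);
  for (let i = idx.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }
  const cut = Math.round(idx.length * trainFrac);
  return {
    train: idx.slice(0, cut).map(i => rows[i]),
    test:  idx.slice(cut).map(i => rows[i]),
  };
}
```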

🏋️ Training

Models are trained in-browser using pure JavaScript implementations — no server or Python required.

Waiting to start…
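As one example of what "pure JavaScript training" looks like, here is a minimal logistic regression trained by batch gradient descent (a sketch only — the app's implementations may add regularization, learning-rate schedules, or early stopping):

```javascript
// Train a binary logistic regression classifier with batch gradient descent.
// X: array of numeric feature rows, y: array of 0/1 labels.
// Returns a predict function mapping a row to 0 or 1.
function trainLogReg(X, y, lr = 0.1, epochs = 500) {
  const n = X.length, d = X[0].length;
  let w = new Array(d).fill(0), b = 0;
  const sigmoid = z => 1 / (1 + Math.exp(-z));
  for (let e = 0; e < epochs; e++) {
    const gw = new Array(d).fill(0);
    let gb = 0;
    for (let i = 0; i < n; i++) {
      // Forward pass: predicted probability for row i.
      const p = sigmoid(X[i].reduce((s, x, j) => s + x * w[j], b));
      const err = p - y[i]; // gradient of log-loss w.r.t. the logit
      for (let j = 0; j < d; j++) gw[j] += err * X[i][j];
      gb += err;
    }
    // Average gradients and take a descent step.
    for (let j = 0; j < d; j++) w[j] -= (lr / n) * gw[j];
    b -= (lr / n) * gb;
  }
  return x => (sigmoid(x.reduce((s, v, j) => s + v * w[j], b)) >= 0.5 ? 1 : 0);
}
```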

📊 Results

Comprehensive evaluation across all trained classifiers.

📭

Train models first in Step 5
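The confusion-matrix-derived metrics shown in this step (precision, recall, F1) reduce to a few counts, as in this sketch:

```javascript
// Compute confusion-matrix counts and derived metrics for 0/1 labels.
// The `|| 0` guards turn 0/0 (NaN) into 0 for degenerate cases.
function classificationMetrics(yTrue, yPred) {
  let tp = 0, fp = 0, fn = 0, tn = 0;
  for (let i = 0; i < yTrue.length; i++) {
    if (yPred[i] === 1) yTrue[i] === 1 ? tp++ : fp++;
    else                yTrue[i] === 1 ? fn++ : tn++;
  }
  const precision = tp / (tp + fp) || 0;
  const recall = tp / (tp + fn) || 0;
  const f1 = (2 * precision * recall) / (precision + recall) || 0;
  return { tp, fp, fn, tn, precision, recall, f1 };
}
```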

🔮 Predict on New Data

Upload an unlabelled CSV — every trained model runs on each row, and the predictions appear side by side so you can see where models agree and where they diverge.

⚠️ Train at least one model in Step 5 before generating predictions.
Upload Prediction CSV
📂

Drop unlabelled CSV here or click to browse

Must contain the same feature columns as your training set. No target column required — it will be ignored if present.
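The side-by-side comparison can be sketched as a simple fan-out over trained models (assuming each model is a function from a feature row to a 0/1 prediction, as in the training sketch above):

```javascript
// Run every trained model on each row and flag rows where all models agree.
// `models` is an object mapping model name → predict function.
function predictAll(models, rows) {
  return rows.map(row => {
    const preds = Object.fromEntries(
      Object.entries(models).map(([name, m]) => [name, m(row)])
    );
    const votes = Object.values(preds);
    return { ...preds, agree: votes.every(v => v === votes[0]) };
  });
}
```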