ECD (Encoder-Combiner-Decoder) models are particularly effective for tabular data and feature engineering tasks, using the TabNet combiner architecture for strong performance on structured datasets.
Prerequisites
Before starting, ensure you have:
- A Simplismart account with access to the Training Suite
- A publicly accessible dataset URL
- Your training configuration prepared (see configuration schema below)
Creating a Training Job
1. Initiate Training Job
- Navigate to My Trainings from the left sidebar
- Click Add a Training Job
- Select ECD as the model type from the available options
2. Configure Training Parameters
Provide the following details:
- Experiment Name: Enter a descriptive name for your training experiment
- Dataset URL: Provide the publicly accessible URL to your dataset
- Training Configuration: Add your ECD model configuration
See the ECD Model Configuration Schema section below for detailed configuration options and examples.
- Review all settings and click Create Job to start training
3. Monitor Training Progress
Once submitted, your training job will begin processing. You can:
- Monitor training progress in real-time
- View training metrics and logs
- Track loss curves and validation performance

Compiling Your Trained Model
After training completes, compile your model to prepare it for deployment.
1. Navigate to Model Compilation
- Click the Compile button on your completed training job
- You’ll be redirected to the model compilation page
- The page shows your model, ready to be added to My Models
2. Configure Model Details
Provide the following information:
- Model Name: Enter a descriptive name for your compiled model
- Infrastructure: Choose your deployment infrastructure:
  - Simplismart Cloud: Deploy on Simplismart’s managed infrastructure
  - Your Own Cloud: Use your own infrastructure (see the BYOC guide)
Most configuration options will be auto-populated based on your model class. Review them before proceeding.
- Click Deploy Model to proceed to deployment configuration
Deploying Your ECD Model
Once your model is compiled, create a deployment to make it accessible via API.
1. Configure Basic Deployment Settings
Set up your deployment with these parameters:
Basic Details
- Deployment Name: Choose a unique, descriptive name
- Model: Auto-populated with your compiled model
- Cloud: Select your infrastructure (Simplismart Cloud or your own)
- Accelerator Type: Choose the GPU type for inference
2. Set Up Auto-Scaling
Configure auto-scaling to handle variable workloads:
Scaling Range
- Minimum: 1 instance
- Maximum: Up to 8 instances (adjust based on your needs)
Scaling Metrics
Add metrics that trigger scaling actions:
- GPU Utilization: Set threshold at 80% to scale up
- CPU Utilization: Set threshold at 80% for additional scaling control
Set thresholds to balance performance and cost: thresholds that are too low trigger unnecessary scale-ups, while thresholds that are too high can delay scaling and hurt response times under load.
3. Add Deployment Tags
Organize your deployments with tags (optional but recommended). Example tags:
- Key: env, Value: staging
- Key: model-type, Value: ecd
- Key: version, Value: v1.0

4. Deploy and Verify
- Review all configuration settings
- Click Add Deployment to start the deployment process
- Monitor the deployment status on the right side of the screen

When the status shows Deployed, your model is ready to serve inference requests!
5. Access Your Model Endpoint
Once deployed, you can find your model endpoint:
- Navigate to Deployments in the left sidebar
- Click on your deployment name
- In the Details tab, find the Model Endpoint URL
- Copy this endpoint to use in your applications
ECD Model Configuration Schema
Understanding the ECD model configuration is crucial for training effective models. This section breaks down each component of the configuration.
Configuration Overview
The ECD model configuration consists of several key components: input features, output features, a combiner, a trainer, and data preprocessing.
Input Features
Input features define how your dataset columns are processed. Each feature is a dictionary with three fields:
- name: Field name used during model inference
- type: Feature type; one of the following:
  - binary: Binary features (0/1, True/False)
  - number: Numerical/continuous features
  - category: Categorical features
  - bag: Bag-of-words features
  - set: Set features (unordered collections)
  - sequence: Sequence features (ordered lists)
  - text: Text features (natural language)
  - vector: Vector features (dense embeddings)
- column: Column name in your dataset
Input Features Example
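As an illustrative sketch (the column names age, employment_type, and has_default are placeholders, not part of any required schema), an input feature list in YAML might look like this:

```yaml
input_features:
  - name: age                # field name used at inference time
    type: number             # continuous feature
    column: age              # column in your dataset
  - name: employment_type
    type: category           # categorical feature
    column: employment_type
  - name: has_default
    type: binary             # 0/1 or True/False values
    column: has_default
```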
Output Features
Output features define your model’s prediction targets. You can specify multiple outputs with custom loss functions. Supported output feature types:
- binary: Binary classification (0/1, True/False)
- number: Regression/numerical predictions
- category: Multi-class classification
- bag: Bag-of-words predictions
- set: Set predictions (unordered collections)
- sequence: Sequence predictions (ordered lists)
- text: Text generation
- vector: Vector predictions (dense embeddings)
Example Configuration
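A minimal sketch for a single binary prediction target (the field name target is a placeholder):

```yaml
output_features:
  - name: target       # prediction target exposed at inference time
    type: binary       # binary classification
    column: target     # label column in your dataset
```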
Loss Configuration
For classification tasks, configure the loss function:
- class_weights (default: null): Weights for each class. Use null for equal weighting
- weight (default: 1.0): Overall loss weight for multi-task learning
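For instance, a hypothetical multi-class output with the documented loss defaults written out explicitly:

```yaml
output_features:
  - name: target
    type: category
    column: target
    loss:
      class_weights: null   # equal weighting for all classes
      weight: 1.0           # overall loss weight (relevant for multi-task setups)
```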
Combiner Configuration
The combiner merges features before making predictions. ECD uses the TabNet architecture.
Example Configuration
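A sketch of a TabNet combiner block, spelling out the default values documented below:

```yaml
combiner:
  type: tabnet
  size: 32                  # hidden layer size (N_a)
  output_size: 128          # fully connected output size (N_d)
  num_steps: 3              # attention steps (N_steps)
  num_total_blocks: 4       # feature transformer blocks per step
  num_shared_blocks: 2      # blocks shared across steps
  dropout: 0.05             # dropout rate for transformer blocks
  sparsity: 0.0001          # sparsity loss multiplier (lambda_sparse)
  relaxation_factor: 1.5    # feature reuse factor (gamma)
  bn_epsilon: 0.001
  bn_momentum: 0.05
  bn_virtual_bs: 128        # virtual batch size for batch norm
```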
TabNet Combiner Parameters
Architecture Parameters
- size (default: 32): Hidden layer size (N_a in the TabNet paper)
- output_size (default: 128): Fully connected layer output size (N_d in the TabNet paper)
- num_steps (default: 3): Number of attention steps (N_steps in the TabNet paper)
- num_total_blocks (default: 4): Total feature transformer blocks per step
- num_shared_blocks (default: 2): Shared feature transformer blocks across steps
Regularization Parameters
- dropout (default: 0.05): Dropout rate for transformer blocks
- sparsity (default: 0.0001): Sparsity loss multiplier (lambda_sparse in the TabNet paper)
- relaxation_factor (default: 1.5): Feature reuse factor (gamma in the TabNet paper)
  - A value of 1.0 means each feature is used once
  - Higher values allow features to be used multiple times
Batch Normalization
- bn_epsilon (default: 0.001): Epsilon added to the batch norm denominator
- bn_momentum (default: 0.05): Batch norm momentum (1 - m_B from the TabNet paper)
- bn_virtual_bs (default: 128): Virtual batch size for batch normalization
Trainer Configuration
Configure the training process with optimization and validation settings. A combined example appears after the parameter lists below.
Optimization Settings
- optimizer: {"type": "adam"} - Adam optimizer for gradient descent
- learning_rate (default: 0.001): Initial learning rate
- learning_rate_scaling (default: "sqrt"): Learning rate scaling strategy
- decay (default: true): Enable learning rate decay
- decay_rate (default: 0.8): Rate of learning rate decay
- decay_steps (default: 20000): Steps between decay applications
Training Parameters
- epochs (default: 100): Maximum training epochs
- batch_size (default: "auto"): Batch size (auto-calculated, or specify manually)
- early_stop (default: 10): Stop if no improvement for N epochs
- validation_field: Field name to validate on (e.g., "target")
- validation_metric: Metric for validation (e.g., "roc_auc", "accuracy")
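Putting the optimization and training parameters together, a sketch of a trainer block (validation_field assumes a target output feature like the one defined earlier):

```yaml
trainer:
  optimizer:
    type: adam
  learning_rate: 0.001
  learning_rate_scaling: sqrt
  decay: true
  decay_rate: 0.8
  decay_steps: 20000
  epochs: 100
  batch_size: auto            # or an explicit integer such as 256
  early_stop: 10              # stop after 10 epochs without improvement
  validation_field: target
  validation_metric: roc_auc
```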
Data Preprocessing
- sample_ratio: Ratio of data to sample (e.g., 0.01 for 1%)
- sample_size: Absolute number of samples to use
- oversample_minority: Oversample the minority class for imbalanced data
- undersample_majority: Undersample the majority class
- split: Configure the train/validation/test split
  - type: "stratify" to maintain class distributions
  - column: Column to stratify on
  - probabilities: Split ratios (e.g., [0.8, 0.1, 0.1] for 80/10/10)
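As an example, a stratified 80/10/10 split with a small sample for fast iteration (target is a placeholder column name):

```yaml
preprocessing:
  sample_ratio: 0.01          # experiment on 1% of the data; raise toward 1.0 for full runs
  split:
    type: stratify            # preserve class distribution across splits
    column: target
    probabilities: [0.8, 0.1, 0.1]   # train/validation/test
```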
Complete Configuration Example
Here’s a complete end-to-end ECD model configuration for a binary classification task:
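The following is a minimal sketch assembled from the components above; all column and field names (age, employment_type, target) are placeholders for your own dataset:

```yaml
input_features:
  - name: age
    type: number
    column: age
  - name: employment_type
    type: category
    column: employment_type
output_features:
  - name: target
    type: binary
    column: target
combiner:
  type: tabnet
  size: 32
  output_size: 128
  num_steps: 3
trainer:
  optimizer:
    type: adam
  learning_rate: 0.001
  epochs: 100
  batch_size: auto
  early_stop: 10
  validation_field: target
  validation_metric: roc_auc
preprocessing:
  split:
    type: stratify
    column: target
    probabilities: [0.8, 0.1, 0.1]
```
Quick Start Tips: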
- Start with default parameters for your first training run
- Adjust class_weights if you have imbalanced classes
- Increase num_steps (3-7) for more complex feature interactions
- Use early_stop to prevent overfitting
- Set sample_ratio to a small value (e.g., 0.01) for faster experimentation with large datasets