Configuration Overview
The ECD model configuration consists of several key components.
Input Features
Input features define how your dataset columns are processed. Each feature is a dictionary with three fields:
- name: Field name used during model inference
- type: Feature type. The following types are supported:
  - binary: Binary features (0/1, True/False)
  - number: Numerical/continuous features
  - category: Categorical features
  - bag: Bag-of-words features
  - set: Set features (unordered collections)
  - sequence: Sequence features (ordered lists)
  - text: Text features (natural language)
  - vector: Vector features (dense embeddings)
- column: Column name in your dataset
Input Features Example
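A minimal sketch of an input feature list, assuming the configuration is written in YAML; the column names (age, plan, review_text) are hypothetical:

```yaml
input_features:
  - name: age            # field name used during model inference
    type: number         # numerical/continuous feature
    column: age          # column name in the dataset
  - name: plan
    type: category
    column: plan
  - name: review_text
    type: text
    column: review_text
```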
Output Features
Output features define your model’s prediction targets. You can specify multiple outputs with custom loss functions.
Supported Output Feature Types:
- binary: Binary classification (0/1, True/False)
- number: Regression/numerical predictions
- category: Multi-class classification
- bag: Bag-of-words predictions
- set: Set predictions (unordered collections)
- sequence: Sequence predictions (ordered lists)
- text: Text generation
- vector: Vector predictions (dense embeddings)
Example Configuration
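A corresponding output feature sketch, again assuming YAML; the target name churned is hypothetical:

```yaml
output_features:
  - name: churned        # prediction target
    type: binary         # binary classification (0/1, True/False)
    column: churned      # column name in the dataset
```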
Loss Configuration
For classification tasks, configure the loss function:
- class_weights (default: null): Weights for each class. Use null for equal weighting.
- weight (default: 1.0): Overall loss weight for multi-task learning.
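A sketch of a loss block attached to a category output feature; the feature name priority and the weight values are hypothetical:

```yaml
output_features:
  - name: priority
    type: category
    loss:
      class_weights: [0.5, 1.0, 2.0]  # per-class weights; null means equal weighting
      weight: 1.0                     # overall loss weight for multi-task learning
```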
Combiner Configuration
The combiner merges input features before predictions are made. This configuration uses the TabNet combiner architecture.
Example Configuration
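A minimal combiner sketch using the parameters documented below (the values shown are the defaults):

```yaml
combiner:
  type: tabnet
  size: 32          # hidden layer size (N_a)
  output_size: 128  # fully connected output size (N_d)
  num_steps: 3      # number of attention steps
```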
TabNet Combiner Parameters
Architecture Parameters
- size (default: 32): Hidden layer size (N_a in the TabNet paper)
- output_size (default: 128): Fully connected layer output size (N_d in the TabNet paper)
- num_steps (default: 3): Number of attention steps (N_steps in the TabNet paper)
- num_total_blocks (default: 4): Total feature transformer blocks per step
- num_shared_blocks (default: 2): Shared feature transformer blocks across steps
Regularization Parameters
- dropout (default: 0.05): Dropout rate for the feature transformer blocks
- sparsity (default: 0.0001): Sparsity loss multiplier (lambda_sparse in the TabNet paper)
- relaxation_factor (default: 1.5): Feature reuse factor (gamma in the TabNet paper)
  - A value of 1.0 means each feature is used once
  - Higher values allow features to be used multiple times
Batch Normalization
- bn_epsilon (default: 0.001): Epsilon added to the batch norm denominator
- bn_momentum (default: 0.05): Batch norm momentum (1 - m_B in the TabNet paper)
- bn_virtual_bs (default: 128): Virtual batch size for batch normalization
All of the combiner parameters above are collected in the sketch that follows.
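Putting the three parameter groups together, a full TabNet combiner block with every documented default spelled out might look like this (YAML format assumed):

```yaml
combiner:
  type: tabnet
  # architecture
  size: 32                # N_a in the TabNet paper
  output_size: 128        # N_d in the TabNet paper
  num_steps: 3            # N_steps in the TabNet paper
  num_total_blocks: 4
  num_shared_blocks: 2
  # regularization
  dropout: 0.05
  sparsity: 0.0001        # lambda_sparse in the TabNet paper
  relaxation_factor: 1.5  # gamma in the TabNet paper
  # batch normalization
  bn_epsilon: 0.001
  bn_momentum: 0.05       # 1 - m_B in the TabNet paper
  bn_virtual_bs: 128
```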
Trainer Configuration
Configure the training process with optimization and validation settings.
Optimization Settings
- optimizer: {"type": "adam"} selects the Adam optimizer for gradient descent
- learning_rate (default: 0.001): Initial learning rate
- learning_rate_scaling (default: "sqrt"): Learning rate scaling strategy
- decay (default: true): Enable learning rate decay
- decay_rate (default: 0.8): Rate of learning rate decay
- decay_steps (default: 20000): Steps between decay applications
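A sketch of these optimization settings as a trainer block, using the defaults listed above:

```yaml
trainer:
  optimizer:
    type: adam            # Adam optimizer for gradient descent
  learning_rate: 0.001
  learning_rate_scaling: sqrt
  decay: true
  decay_rate: 0.8
  decay_steps: 20000
```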
Training Parameters
- epochs (default:
100): Maximum training epochs - batch_size (default:
"auto"): Batch size (auto-calculated or specify manually) - early_stop (default:
10): Stop if no improvement for N epochs - validation_field: Field name to validate on (e.g.,
"target") - validation_metric: Metric for validation (e.g.,
"roc_auc","accuracy")
Data Preprocessing
- sample_ratio: Ratio of data to sample (e.g., 0.01 for 1%)
- sample_size: Absolute number of samples to use
- oversample_minority: Oversample the minority class for imbalanced data
- undersample_majority: Undersample the majority class
- split: Configure the train/validation/test split
  - type: "stratify" to maintain class distributions
  - column: Column to stratify on
  - probabilities: Split ratios (e.g., [0.8, 0.1, 0.1] for 80/10/10)
Complete Configuration Example
Here’s a complete end-to-end ECD model configuration for a binary classification task; it follows the quick start tips below.
Quick Start Tips:
- Start with default parameters for your first training run
- Adjust class_weights if you have imbalanced classes
- Increase num_steps (3-7) for more complex feature interactions
- Use early_stop to prevent overfitting
- Set sample_ratio to a small value (e.g., 0.01) for faster experimentation with large datasets
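The complete example, assembled from the pieces above. It assumes a YAML configuration; the feature and column names (age, plan, churned) are hypothetical:

```yaml
input_features:
  - name: age
    type: number
    column: age
  - name: plan
    type: category
    column: plan
output_features:
  - name: churned
    type: binary
    column: churned
combiner:
  type: tabnet
  size: 32
  output_size: 128
  num_steps: 3
trainer:
  optimizer:
    type: adam
  learning_rate: 0.001
  epochs: 100
  batch_size: auto
  early_stop: 10
  validation_field: churned
  validation_metric: roc_auc
preprocessing:
  split:
    type: stratify
    column: churned
    probabilities: [0.8, 0.1, 0.1]
```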