This document covers Meta Models, an advanced meta-learning system for dynamic strategy selection and classification in the Systematica framework.
Meta Models use machine learning classifiers to decide which trading strategy to apply at any given time, drawing on market state representations and the historical performance of multiple base models whose features feed the strategy predictions.
Overview
The Meta Model system implements a hierarchical classification approach where multiple trading strategies compete for selection based on their historical performance.
The system consists of state representation engines, reward calculation modules, and classification models that together determine optimal strategy allocation.
Meta models operate through a multi-stage process:
- Feature Collection: Executes multiple base models and calculates rolling metrics
- State Representation: Generates input features from market data
- Target Generation: Creates categorical labels from best-performing strategies
- Classification: Trains ML model to predict optimal strategy
- Signal Generation: Converts predictions to trading signals
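The stages above can be sketched with generic tools. The rolling-Sharpe labelling below is an illustrative stand-in for the framework's feature collection and target generation; the strategy names, window sizes, and return series are invented for the example:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Toy daily returns for two base strategies (hypothetical names).
returns = pd.DataFrame(
    {"momentum": rng.normal(0.0005, 0.01, 500),
     "mean_reversion": rng.normal(0.0003, 0.008, 500)},
    index=pd.date_range("2022-01-01", periods=500, freq="D"),
)

# Feature collection: rolling Sharpe ratio per strategy (annualised).
window = 60
rolling_sharpe = (returns.rolling(window).mean()
                  / returns.rolling(window).std()) * np.sqrt(252)

# Target generation: label each period with the best-performing strategy.
labels = rolling_sharpe.dropna().idxmax(axis=1)
print(labels.value_counts())
```

A classifier trained on market state features against these labels then closes the loop from state representation to signal generation.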
Key configuration options:
- use_neptune: Load models from Neptune experiments vs. local config
- sklearn.Pipeline: Custom ML pipeline components
- window_in_days: Rolling metric calculation window
- training_window/testing_window: Cross-validation parameters
Architecture
System Overview
| Layer | Purpose | Key Responsibilities |
| --- | --- | --- |
| API Layer | Public Interface | Expose meta-model functionality to users |
| Core Layer | Business Logic | Orchestrate classification pipeline and registries |
| Processing Layer | Data Transformation | Handle feature scaling and model training |
| Config Layer | Configuration Management | Store and retrieve model/state configurations |
| Signal Layer | Strategy Generation | Generate trading signals and strategy mappings |
API & Core Components
| Component | Type | Function | Input | Output |
| --- | --- | --- | --- | --- |
| MetaModel | API Class | Public interface for meta-model operations | User requests | Processed results |
| BaseMetaModel | Core Engine | Central orchestrator managing all subsystems | Configuration data | Coordinated pipeline execution |
| State | State Representation | Generate market state features | Market data | Feature vectors |
Key Integration Points
| Integration | Components | Data Exchange | Frequency |
| --- | --- | --- | --- |
| Model Training | BaseMetaModel.from_model_config() | Feature matrices, labels | Per training cycle |
| Experiment Tracking | BaseMetaModel.from_neptune_config() | Metrics, parameters | Per experiment |
| State Extraction | State.from_config() | Configuration objects | Per prediction |
| Signal Generation | get_meta_model_signals() | Strategy identifiers | Per market update |
Config & State Management
| Config | Data Type | Purpose | Schema |
| --- | --- | --- | --- |
| state_registry | StateRegistry | Market state configurations | state_id |
| model_registry | FeatureConfig | Feature engineering configurations | feature_id |
| neptune_registry | NeptuneSelector | Experiment tracking selectors | experiment_id |
Signal Generation
| Component | Type | Function | Trigger | Output |
| --- | --- | --- | --- | --- |
| get_meta_model_signals() | Signal Generator | Generate trading/strategy signals | Market conditions | Signal list |
| model_mapping | Strategy Mapper | Map strategies to appropriate analyzers | Strategy type | Analyzer assignment |
Foundation
The BaseMetaModel class provides the foundation for meta-learning functionality, handling feature processing, category management, and classifier training.
| Component | Purpose | Key Methods |
| --- | --- | --- |
| Data Management | Handles time series data and symbol mapping | data, s1, s2 |
| Feature Processing | Manages rolling metrics and state features | features, get_inputs() |
| Category Mapping | Maps model performance to categorical labels | categories, label_mapping |
| Classification | Runs cross-validation and prediction | run_clf() |
Steps
| Step | Component | Function | Input | Output |
| --- | --- | --- | --- | --- |
| 1 | from_model_config() or from_neptune() | Model Initialization | Config/Neptune config | Model instance: Load pre-trained model or configuration |
| 2 | features | Feature Extraction | Raw data | Rolling time-series metrics DataFrame |
| 3 | categories | Model Selection | Rolling metrics | Best performing model per period |
| 4a | get_target() | Target Generation | Categories | Numeric category codes: Convert categorical labels to numeric format |
| 4b | get_inputs() | Input Preparation | Categories | State representation: Extract feature vectors for model training |
| 5 | run_clf() | Model Training | Targets + Inputs | Trained classifier |
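Steps 4a and 4b map naturally onto standard pandas operations. The snippet below is a minimal, self-contained sketch of target generation; the category names are placeholders, not the framework's actual labels:

```python
import pandas as pd

# Categories: best-performing model per period (step 3 output, toy values).
categories = pd.Series(["trend", "carry", "trend", "trend", "carry"])

# Step 4a (get_target analogue): convert categorical labels to numeric codes.
cat = pd.Categorical(categories)
target = cat.codes                        # numeric class codes for the classifier
label_mapping = dict(enumerate(cat.categories))  # code -> original label

print(list(target), label_mapping)
```

The inverse mapping is what later lets predicted codes be translated back into concrete strategies during signal generation.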
Model Creation and Data Sources
Meta Models can be created from two primary data sources: model registries for local configurations or Neptune experiments for cloud-based model tracking.
- From Model Registry: The from_model_config() method creates meta models using local feature configurations stored in ModelRegistry.
- From Neptune Experiments: The from_neptune() method creates meta models using cloud-based feature configurations stored in NeptuneAIRegistry.
Classification and Cross-Validation
The classification engine uses time series cross-validation to train models that predict which trading strategy should be active based on current market conditions.
The system provides comprehensive model evaluation tools including accuracy metrics, classification reports, and confusion matrices.
| Evaluation Method | Purpose | Output |
| --- | --- | --- |
| get_accuracy_score() | Overall classification accuracy | Float |
| get_report() | Detailed precision/recall metrics | DataFrame |
| get_confusion_matrix() | Prediction vs actual matrix | DataFrame |
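These evaluation methods correspond closely to standard scikit-learn utilities. Here is a minimal sketch of time series cross-validation with such tools; the feature data is synthetic and the split counts are arbitrary:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))                                # synthetic state features
y = (X[:, 0] + 0.1 * rng.normal(size=300) > 0).astype(int)   # strategy label

# Time series cross-validation: each fold trains on the past, tests on the future.
clf = make_pipeline(StandardScaler(), LogisticRegression())
scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    clf.fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    scores.append(accuracy_score(y[test_idx], pred))

print(np.mean(scores))
print(confusion_matrix(y[test_idx], pred))
```

The key property, shared with the framework's rolling/expanding splitters, is that test folds always come strictly after their training data, so no look-ahead leaks into the evaluation.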
State Representation
State representation converts market data into feature vectors suitable for machine learning classification. The State class processes multiple data sources to create comprehensive market state descriptions.
Signal Generation and Trading
Meta Models generate trading signals by converting classifier predictions into actionable buy/sell decisions for the optimal strategy at each time period.
Signal Generation Pipeline
| Stage | Component | Input | Process | Output |
| --- | --- | --- | --- | --- |
| 1 | model_output | Raw data | Classifier predictions | Prediction probabilities/classes |
| 2 | model_mapping | Category definitions | Category to runner mapping | Strategy-runner associations: Map predictions to execution strategies |
| 3 | get_meta_model_signals() | Predictions + Mappings | Signal conversion | Trading signals |
| 4 | Signals | Trading signals | Signal formatting | Entry/exit points |
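Converting classifier predictions into per-strategy signals can be sketched as a one-hot mapping. The strategy names and the DataFrame-of-booleans output below are illustrative, not the framework's actual Signals type:

```python
import numpy as np
import pandas as pd

# Stage 1 analogue: predicted strategy code per period.
model_output = np.array([0, 0, 1, 1, 0, 1])

# Stage 2 analogue (model_mapping): category code -> strategy runner.
mapping = {0: "trend_runner", 1: "carry_runner"}

# Stages 3-4: one-hot signals, True where a strategy should be active.
index = pd.date_range("2024-01-01", periods=len(model_output), freq="D")
signals = pd.DataFrame(
    {name: model_output == code for code, name in mapping.items()},
    index=index,
)
print(signals.sum())
```

Exactly one strategy is active per period, which is what makes the meta model a selector rather than a blender.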
Portfolio Execution Pipeline
| Stage | Component | Input | Process | Output | Business Impact |
| --- | --- | --- | --- | --- | --- |
| 5 | select_symbols | Entry/exit points | Symbol filtering | Filtered symbol list | Focus on tradeable assets |
| 6 | run_from_signals() | Signals + symbols | Portfolio simulation | Simulated trades | Execute strategy simulation |
| 7 | PortfolioAnalyzer | Simulation results | Performance analysis | Analysis report | Measure strategy performance |
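Stages 5-7 can be approximated with a toy simulation: apply each strategy's returns only where its signal is active, then measure the combined result. All numbers here are synthetic, and the simple Sharpe calculation stands in for the framework's run_from_signals() and PortfolioAnalyzer components:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
idx = pd.date_range("2024-01-01", periods=200, freq="D")
strat_returns = pd.DataFrame(
    {"trend": rng.normal(0.001, 0.01, 200),
     "carry": rng.normal(0.0005, 0.005, 200)},
    index=idx,
)

# One-hot signals choosing a strategy per day (toy regime alternation).
active = pd.Series(np.where(np.arange(200) % 50 < 25, "trend", "carry"),
                   index=idx)
signals = pd.get_dummies(active).astype(bool)

# Toy simulation: realised return is the selected strategy's return each day.
realised = (strat_returns * signals).sum(axis=1)

# Toy performance analysis: annualised Sharpe ratio of the meta strategy.
sharpe = realised.mean() / realised.std() * np.sqrt(252)
print(round(sharpe, 2))
```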
```python
MetaModel(
    preprocessor: sklearn.base.BaseEstimator = None,
    estimator: sklearn.base.BaseEstimator = None,
    n_steps: int = 1,
    training_window: int = 365,
    testing_window: int = 60,
    splitter: str = 'from_custom_rolling',
    custom_splitter: str | None = None,
    custom_splitter_kwargs: Dict[str, Any] = None,
    use_neptune: bool = False,
    metrics: str = 'sharpe_ratio',
    model_registry: List[systematica.registries.base.Register] = None,
    neptune_registry: List[systematica.registries.base.Register] = None,
    window_in_days: int = 365,
    minp: int = None,
    downsample: str = '1d',
    state_registry: systematica.registries.base.Register = None,
)
```
MetaModel is a class for cross-validation of meta models.
It allows for the evaluation of different model configurations and
the generation of trading signals based on the results. The class is designed
to work with various classifiers and preprocessors, and it supports custom
data splitting strategies for training and testing.
Method generated by attrs for class MetaModel.
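A hedged construction sketch using the signature above; the estimator choice and parameter values are assumptions about a typical setup, not a verified recipe, and the registries are assumed to be configured elsewhere:

```python
# Hypothetical usage sketch -- parameter names follow the signature above;
# registry defaults are taken from the framework's configuration.
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

meta = MetaModel(
    preprocessor=StandardScaler(),
    estimator=RandomForestClassifier(n_estimators=200),
    training_window=365,   # days of history per training fold
    testing_window=60,     # days evaluated after each fold
    window_in_days=365,    # rolling metric / state window
    metrics="sharpe_ratio",
    use_neptune=False,     # read configs locally, not from Neptune
)
```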
Ancestors
systematica.models.base.BaseStatArb
abc.ABC
Instance variables
- custom_splitter: str | None: Custom splitter to use for data partitioning. Defaults to None. If set, it should be a string that matches a custom splitter function.
- custom_splitter_kwargs: Dict[str, Any]: Additional keyword arguments for the custom splitter. Defaults to None. If custom_splitter is set, this should contain any parameters the custom splitter function requires.
- downsample: str: Resample data before state representation computation, used to speed up the process. Upsampling and ffill are performed straight after. If None, no resampling is performed. Defaults to '1d' (daily).
- estimator: sklearn.base.BaseEstimator: Classifier model. If None, defaults to sklearn's LogisticRegression (aka logit, MaxEnt) classifier. Defaults to None.
- metrics: str: Metric(s) used to calculate the reward. Defaults to 'sharpe_ratio'.
- minp: int: Minimum number of observations required. Defaults to None.
- model_registry: List[systematica.registries.base.Register]: Config of models to use for metric calculation. Defaults to model_registry.
- n_steps: int: Number of periods to shift backward by n positions. This operation intentionally looks ahead to train the model! Must be positive. Defaults to 1.
- neptune_registry: List[systematica.registries.base.Register]: Config of models to use for metric calculation when use_neptune is True. Defaults to neptune_registry.
- preprocessor: sklearn.base.BaseEstimator: Standardizes features. If None, defaults to StandardScaler. Defaults to None.
- splitter: str: Default splitter used when custom_splitter is not passed. Choices are from_rolling, from_custom_rolling, from_expanding, from_custom_expanding. Defaults to 'from_custom_rolling'.
- state_registry: systematica.registries.base.Register: State representation config. If None, uses StateRegistry. Defaults to None.
- testing_window: int: The size of the testing window. Defaults to 60.
- training_window: int: The size of the training window. Defaults to 365.
- use_neptune: bool: Use Neptune if True, local config otherwise. Defaults to False.
- window_in_days: int: The size of the rolling window in days, used for calculating both rolling metrics and state representations. Defaults to 365.
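The training_window and testing_window parameters imply rolling splits like the sketch below. This is a generic rolling scheme under assumed semantics, not the framework's exact from_custom_rolling implementation:

```python
def rolling_splits(n, training_window, testing_window):
    """Yield (train_indices, test_indices) for a rolling-window scheme."""
    start = 0
    while start + training_window + testing_window <= n:
        train = range(start, start + training_window)
        test = range(start + training_window,
                     start + training_window + testing_window)
        yield list(train), list(test)
        start += testing_window     # advance by one test block each fold

splits = list(rolling_splits(n=200, training_window=100, testing_window=25))
print(len(splits))
```

An expanding variant would fix the training start at 0 and grow the window instead of sliding it.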
Methods
get_signals
```python
get_signals(
    self,
    model_output: numpy.ndarray,
    select_symbols: str | int | list[str | int] = None,
    **kwargs,
) -> systematica.signals.base.Signals
```
Generate trading signals based on scores. See BaseStatArb.get_signals
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| model_output | numpy.ndarray | -- | Output from the model, typically scores for each symbol. |
| select_symbols | str \| int \| list[str \| int] | None | Symbols to select for generating signals. If None, all symbols are used. |
Returns:
| Type | Description |
| --- | --- |
| Signals | Signals object containing buy and sell signals for the selected symbols. |