BaseMetaModel(
data: vectorbtpro.data.base.Data,
s1: str,
s2: str,
window_in_days: int,
minp: int,
runners: Dict[str, vectorbtpro.portfolio.base.Portfolio],
features: pandas.core.frame.DataFrame | numpy.ndarray,
add_features: Dict[str, Callable],
)
A model class for feature processing and category management.
This class provides utilities to work with feature data, extract categories,
and generate visualizations for model analysis.
Method generated by attrs for class BaseMetaModel.
Static methods
set_keys
set_keys(
with_id: bool,
reward_registry: list,
) ‑> List[str]
Set keys.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| with_id | bool, optional | -- | Output id columns if True; otherwise, model name columns. Setting with_id to True is necessary later in the process to retrieve model parameters. Defaults to True. |
| reward_registry | list, optional | -- | Config of models to use for metric calculation. If None, uses config. |
Returns:
| Type | Description |
| --- | --- |
| tp.List[str] | Keys. |
from_model_config
from_model_config(
window_in_days: int,
minp: int = None,
metrics: str | List[str] = 'sharpe_ratio',
with_id: bool = True,
reward_registry: List[vectorbtpro.utils.config.FrozenConfig] = None,
add_features: Dict[str, Callable] = None,
) ‑> systematica.models.meta_model.base.BaseMetaModel
Create a Model instance from rolling metrics.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| window_in_days | int | -- | Rolling window size in days. |
| minp | int | None | Minimum number of observations required. |
| metrics | str | 'sharpe_ratio' | Metric(s) to calculate from the data. |
| with_id | bool | True | Output id columns if True; otherwise, model name columns. Setting with_id to True is necessary later in the process to retrieve model parameters. Defaults to True. |
| reward_registry | List[vbt.FrozenConfig] | None | Config of models to use for metric calculation. If None, uses ModelRewardRegistry. |
| add_features | tp.Dict[str, tp.Callable] | None | Add features to the data object. If None, adds returns (rets) by default (see State.from_config). Defaults to None. |
Returns:
| Type | Description |
| --- | --- |
| BaseMetaModel | A new Model instance with features populated from calculated rolling metrics. |
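To illustrate what "features populated from calculated rolling metrics" can look like, here is a minimal pandas sketch of a rolling Sharpe-ratio feature. The column names, window sizes, and annualization factor are hypothetical and not part of the actual API:

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns for two model runners.
rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=500, freq="D")
rets = pd.DataFrame(
    rng.normal(0.0005, 0.01, size=(500, 2)),
    index=idx,
    columns=["model_a", "model_b"],
)

window, minp = 90, 30  # analogues of window_in_days and minp
roll = rets.rolling(window, min_periods=minp)
# Rolling Sharpe ratio, annualized assuming daily bars.
rolling_sharpe = roll.mean() / roll.std() * np.sqrt(365)

# Rows with fewer than minp observations are NaN and dropped.
features = rolling_sharpe.dropna()
```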
from_neptune_config
from_neptune_config(
window_in_days: int,
minp: int = None,
metrics: str | List[str] = 'sharpe_ratio',
with_id: bool = True,
reward_registry: List[vectorbtpro.utils.config.FrozenConfig] = None,
) ‑> systematica.models.meta_model.base.BaseMetaModel
Create a Model instance from rolling metrics, with model parameters fetched from Neptune.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| selector | tp.Dict[str, str \| int \| BaseTrialSelector] | -- | The ID(s) of the Neptune run to fetch, mapped to a trial selector. A trial selector is a custom parameter selection. If None, retrieves 'best/params' from Neptune. If int, retrieves that trial number. If BaseTrialSelector, retrieves params based on the algorithm. |
| window_in_days | int | -- | Rolling window size in days. |
| minp | int | None | Minimum number of observations required. |
| metrics | str | 'sharpe_ratio' | Metric(s) to calculate from the data. |
| with_id | bool | True | Output id columns if True; otherwise, model name columns. Setting with_id to True is necessary later in the process to retrieve model parameters. Defaults to True. |
| reward_registry | tp.List[vbt.FrozenConfig] | None | Config of models to use for metric calculation. If None, uses NEPTUNE_reward_registry. |
Returns:
| Type | Description |
| --- | --- |
| BaseMetaModel | A new Model instance with features populated from calculated rolling metrics. |
Instance variables
- add_features: Dict[str, Callable]
- categories: pandas.core.series.Series
- category_mapping: Dict[str, int]
- data: vectorbtpro.data.base.Data
- features: pandas.core.frame.DataFrame | numpy.ndarray
- label_mapping: Dict[str, int]
- minp: int
- model_mapping: Dict[str, systematica.portfolio.analyzer.PortfolioAnalyzer]
- runners: Dict[str, vectorbtpro.portfolio.base.Portfolio]
- s1: str
- s2: str
- window_in_days: int
Methods
run_clf
run_clf(
self,
splitter: str = 'from_custom_rolling',
custom_splitter: str = None,
custom_splitter_kwargs: Dict[str, Any] = None,
preprocessor: sklearn.base.BaseEstimator = None,
estimator: sklearn.base.BaseEstimator = None,
training_window: vectorbtpro.utils.params.Param | int = 365,
testing_window: vectorbtpro.utils.params.Param | int = 60,
n_steps: int = 1,
downsample: str = '1d',
state_registry: List[vectorbtpro.utils.config.FrozenConfig] = None,
to_numpy: bool = False,
raw_output: bool = False,
**split_kwargs,
) ‑> pandas.core.frame.DataFrame | numpy.ndarray
Meta classifier CV.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| splitter | str | 'from_custom_rolling' | The method for splitting the data into training and testing sets. Defaults to 'from_custom_rolling'. |
| custom_splitter | str | None | Custom splitter function to use. Defaults to None. |
| custom_splitter_kwargs | tp.Kwargs | None | Custom arguments for the splitter function. Defaults to None. |
| preprocessor | BaseEstimator | None | Standardizes features. If None, defaults to StandardScaler. Defaults to None. |
| estimator | BaseEstimator | None | Classifier model. If None, defaults to a Logistic Regression (aka logit, MaxEnt) classifier. Defaults to None. |
| training_window | int | 365 | The size of the training window for cross-validation. Defaults to 365. |
| testing_window | int | 60 | The size of the testing window for cross-validation. Defaults to 60. |
| n_steps | int | 1 | Number of periods to shift backward by n positions. This operation intentionally looks ahead to train the model! Must be positive. Defaults to 1. |
| downsample | str | '1d' | Resample data before state computation to speed up the process. If None, no resampling is performed. Defaults to '1d' (daily). |
| state_registry | tp.List[vbt.FrozenConfig] | None | State config to use. If None, defaults to STATE_CONFIG. Defaults to None. |
| to_numpy | bool | False | Whether to return the result as a NumPy array. Defaults to False. |
| raw_output | bool | False | Whether to return the raw output without any alignment. Defaults to False. |
| split_kwargs | tp.Kwargs | -- | Additional keyword arguments for the VectorBT PRO splitter. |
Returns:
| Type | Description |
| --- | --- |
| pd.DataFrame \| tp.Array2d | Trained model output. |
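The rolling train/test procedure can be approximated with plain scikit-learn. The loop below is a simplified stand-in for the VectorBT PRO splitter used internally; the synthetic data and window sizes are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for get_inputs() / get_target().
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)

train_w, test_w = 365, 60  # training_window / testing_window analogues
preds = np.full(len(y), -1)
for start in range(0, len(y) - train_w - test_w + 1, test_w):
    tr = slice(start, start + train_w)
    te = slice(start + train_w, start + train_w + test_w)
    # Default pipeline: StandardScaler preprocessor + logistic regression.
    clf = make_pipeline(StandardScaler(), LogisticRegression())
    clf.fit(X[tr], y[tr])
    preds[te] = clf.predict(X[te])
```

Each window trains on `train_w` bars and predicts the following `test_w` bars; positions never covered by a test window keep the -1 sentinel.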
get_target
get_target(
self,
to_numpy: bool = False,
) ‑> numpy.ndarray | pandas.core.series.Series
Convert categorical labels to numeric codes.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| to_numpy | bool | False | Output NumPy array if True; pandas Series otherwise. Defaults to False. |
Returns:
| Type | Description |
| --- | --- |
| tp.Array1d \| pd.Series | Array or Series containing the numeric encoding of categories. |
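Assuming the encoding follows standard pandas categorical behavior, a minimal sketch of the label-to-code mapping (the label names are hypothetical):

```python
import pandas as pd

# Hypothetical regime labels.
labels = pd.Series(["bull", "bear", "flat", "bull"], dtype="category")

# Codes are assigned by sorted category order: bear=0, bull=1, flat=2.
codes = labels.cat.codes
mapping = dict(enumerate(labels.cat.categories))
```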
get_inputs
get_inputs(
self,
downsample: str = '1d',
to_numpy: bool = False,
state_registry: List[vectorbtpro.utils.config.FrozenConfig] = None,
) ‑> pandas.core.frame.DataFrame | numpy.ndarray
Get state representation: Input (X).
To include data in a vbt.Data object, use the data.add_feature method as follows:

```python
data = sma.load_clean_data("1d")
rets = sma.get_returns(data.close)
data = data.add_feature("rets", rets, missing_index="drop")
```
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| downsample | str | '1d' | Resample data before state computation to speed up the process. If None, no resampling is performed. Defaults to '1d' (daily). |
| to_numpy | bool | False | Output NumPy array if True; pandas DataFrame otherwise. Defaults to False. |
| state_registry | tp.List[StateConfig] | None | State config to use. If None, defaults to StateRegistry. Defaults to None. |
Returns:
| Type | Description |
| --- | --- |
| pd.DataFrame \| tp.Array2d | Input (exogenous) variables. |
get_accuracy_score
get_accuracy_score(
self,
y_pred: pandas.core.series.Series,
normalize: bool = True,
) ‑> float
Get Accuracy classification score.
In multilabel classification, this function computes subset accuracy:
the set of labels predicted for a sample must exactly match the
corresponding set of labels in y_true.
See Also:
balanced_accuracy_score: Compute the balanced accuracy to deal with imbalanced datasets.
jaccard_score: Compute the Jaccard similarity coefficient score.
hamming_loss: Compute the average Hamming loss or Hamming distance between two sets of samples.
zero_one_loss: Compute the zero-one classification loss. By default, the function returns the percentage of imperfectly predicted subsets.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| y_pred | pd.Series | -- | Predicted labels, as returned by a classifier. |
| normalize | bool | True | If False, return the number of correctly classified samples. Otherwise, return the fraction of correctly classified samples. |
Returns:
| Type | Description |
| --- | --- |
| float or int | If normalize=True, returns the fraction of correctly classified samples (float); otherwise, returns the number of correctly classified samples (int). The best performance is 1 with normalize=True, and the number of samples with normalize=False. |
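The normalize flag behaves as in sklearn.metrics.accuracy_score, which this method presumably wraps (toy labels for illustration):

```python
import pandas as pd
from sklearn.metrics import accuracy_score

y_true = pd.Series([1, 0, 1, 1])
y_pred = pd.Series([1, 0, 0, 1])

frac = accuracy_score(y_true, y_pred)                    # fraction correct
count = accuracy_score(y_true, y_pred, normalize=False)  # raw count
```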
get_report
get_report(
self,
y_pred: pandas.core.series.Series,
) ‑> pandas.core.frame.DataFrame
Build a report showing the main classification metrics.
See Also:
precision_recall_fscore_support: Compute precision, recall, F-measure and support for each class.
confusion_matrix: Compute confusion matrix to evaluate the accuracy of a classification.
multilabel_confusion_matrix: Compute a confusion matrix for each class or sample.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| y_pred | pd.Series | -- | Estimated targets as returned by a classifier. |
Returns:
| Type | Description |
| --- | --- |
| pd.DataFrame | DataFrame summary of the precision, recall, and F1 score for each class. |
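A report of this shape can be built from sklearn.metrics.classification_report with output_dict=True. This is a sketch with toy labels, not the actual implementation:

```python
import pandas as pd
from sklearn.metrics import classification_report

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# One row per class plus accuracy / macro avg / weighted avg rows.
report = pd.DataFrame(
    classification_report(y_true, y_pred, output_dict=True)
).T
```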
get_confusion_matrix
get_confusion_matrix(
self,
y_pred: pandas.core.series.Series,
normalize: str = None,
) ‑> pandas.core.frame.DataFrame
Compute confusion matrix to evaluate the accuracy of a classification.
By definition, a confusion matrix C is such that C[i, j] is equal to the number of observations known to be in group i and predicted to be in group j.
Thus in binary classification, the count of true negatives is C[0, 0], false negatives is C[1, 0], true positives is C[1, 1], and false positives is C[0, 1].
See Also:
ConfusionMatrixDisplay.from_estimator: Plot the confusion matrix given an estimator, the data, and the labels.
ConfusionMatrixDisplay.from_predictions: Plot the confusion matrix given the true and predicted labels.
ConfusionMatrixDisplay: Confusion matrix visualization.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| y_pred | pd.Series | -- | Estimated targets as returned by a classifier. |
| normalize | str | None | Normalizes the confusion matrix over the true (rows) or predicted (columns) conditions, or over the whole population. If None, the confusion matrix will not be normalized. |
Returns:
| Type | Description |
| --- | --- |
| pd.DataFrame | Confusion matrix whose i-th row and j-th column entry indicates the number of samples with true label being the i-th class and predicted label being the j-th class. |
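The normalize options map directly onto sklearn.metrics.confusion_matrix (a sketch with toy binary labels):

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

# normalize="true" divides each row by the number of true samples in it.
cm = pd.DataFrame(confusion_matrix(y_true, y_pred, normalize="true"))
```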
get_feature_importance
get_feature_importance(
self,
X: pandas.core.frame.DataFrame,
tree_based_estimator: sklearn.ensemble._forest.RandomForestClassifier,
) ‑> pandas.core.frame.DataFrame
The impurity-based feature importances.
The higher, the more important the feature.
The importance of a feature is computed as the (normalized)
total reduction of the criterion brought by that feature. It is also
known as the Gini importance.
Impurity-based feature importances can be misleading for
high cardinality features (many unique values). See sklearn.inspection.permutation_importance as an alternative.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | pd.DataFrame | -- | State representation: input (X). |
| tree_based_estimator | RandomForestClassifier | -- | Tree-based estimator exposing the feature_importances_ attribute. |
Returns:
| Type | Description |
| --- | --- |
| pd.DataFrame | The values of this array sum to 1, unless all trees are single-node trees consisting of only the root node, in which case it will be an array of zeros. |
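A sketch of the underlying scikit-learn mechanics, using a hypothetical toy dataset in which only one feature is informative:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["f1", "f2", "f3"])
y = (X["f1"] > 0).astype(int)  # only f1 carries signal

est = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Gini importances sum to 1; the informative feature dominates.
importances = pd.DataFrame(
    {"importance": est.feature_importances_}, index=X.columns
).sort_values("importance", ascending=False)
```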
plot_rolling_metrics
plot_rolling_metrics(
self,
) ‑> vectorbtpro.utils.figure.FigureWidget
Visualize rolling metrics from features.
Returns:
| Type | Description |
| --- | --- |
| vbt.FigureWidget | A visualization of the features using vectorbtpro plotting. |
plot_target
plot_target(
self,
) ‑> vectorbtpro.utils.figure.FigureWidget
Visualize encoded labels.
Returns:
| Type | Description |
| --- | --- |
| vbt.FigureWidget | A visualization of the encoded labels using vectorbtpro plotting. |
plot_heatmap_overlay
plot_heatmap_overlay(
self,
y_test_or_pred: pandas.core.series.Series,
**layout_kwargs,
) ‑> vectorbtpro.utils.figure.FigureWidget
Plot a Series as a line and overlay it with a heatmap.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| y_test_or_pred | pd.Series | -- | Labels or estimated targets as returned by a classifier. |
| layout_kwargs | tp.Kwargs | -- | Additional Plotly keyword arguments. |
Returns:
| Type | Description |
| --- | --- |
| vbt.FigureWidget | The Series plotted as a line with a heatmap overlay. |