BaseMetaModel(
data: vectorbtpro.data.base.Data,
s1: str,
s2: str,
window_in_days: int,
minp: int,
runners: Dict[str, vectorbtpro.portfolio.base.Portfolio],
features: pandas.core.frame.DataFrame | numpy.ndarray,
add_features: Dict[str, Callable],
)
A model class for feature processing and category management.
This class provides utilities to work with feature data, extract categories,
and generate visualizations for model analysis.
Method generated by attrs for class BaseMetaModel.
Static methods
set_keys
set_keys(
with_id: bool,
reward_registry: list,
) ‑> List[str]
Set keys.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| with_id | bool, optional | -- | Output id columns if True; otherwise, model name columns. Setting with_id to True is necessary later in the process to retrieve model parameters. Defaults to True. |
| reward_registry | list, optional | -- | Config of models to use for metric calculation. If None, uses config. |
Returns:
| Type | Description |
| --- | --- |
| tp.List[str] | Keys. |
from_model_config
from_model_config(
window_in_days: int,
minp: int = None,
metrics: str | List[str] = 'sharpe_ratio',
with_id: bool = True,
reward_registry: List[vectorbtpro.utils.config.FrozenConfig] = None,
add_features: Dict[str, Callable] = None,
) ‑> systematica.models.meta_model.base.BaseMetaModel
Create a Model instance from rolling metrics.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| window_in_days | int | -- | Rolling window size in days. |
| minp | int | None | Minimum number of observations required. |
| metrics | str | 'sharpe_ratio' | Metric(s) to calculate from the data. |
| with_id | bool | True | Output id columns if True; otherwise, model name columns. Setting with_id to True is necessary later in the process to retrieve model parameters. Defaults to True. |
| reward_registry | List[vbt.FrozenConfig] | None | Config of models to use for metric calculation. If None, uses ModelRewardRegistry. |
| add_features | tp.Dict[str, tp.Callable] | None | Add features to the data object. If None, adds returns (rets) by default (see State.from_config). Defaults to None. |
Returns:
| Type | Description |
| --- | --- |
| BaseMetaModel | A new Model instance with features populated from calculated rolling metrics. |
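To illustrate what "features populated from calculated rolling metrics" can look like, here is a minimal pandas sketch of a rolling Sharpe-ratio feature. The column names, window sizes, and annualization factor are hypothetical and not part of the actual API:

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns for two model runners.
rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=500, freq="D")
rets = pd.DataFrame(
    rng.normal(0.0005, 0.01, size=(500, 2)),
    index=idx,
    columns=["model_a", "model_b"],
)

window, minp = 90, 30  # analogues of window_in_days and minp
roll = rets.rolling(window, min_periods=minp)
# Rolling Sharpe ratio, annualized assuming daily bars.
rolling_sharpe = roll.mean() / roll.std() * np.sqrt(365)

# Rows with fewer than minp observations are NaN and dropped.
features = rolling_sharpe.dropna()
```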
from_neptune_config
from_neptune_config(
window_in_days: int,
minp: int = None,
metrics: str | List[str] = 'sharpe_ratio',
with_id: bool = True,
reward_registry: List[vectorbtpro.utils.config.FrozenConfig] = None,
) ‑> systematica.models.meta_model.base.BaseMetaModel
Create a Model instance from rolling metrics, with model parameters fetched from Neptune.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| selector | tp.Dict[str, str \| int \| BaseTrialSelector] | -- | The ID(s) of the Neptune run to fetch, mapped to a trial selector. A trial selector is a custom parameter selection. If None, retrieves 'best/params' from Neptune. If int, retrieves that trial number. If BaseTrialSelector, retrieves params based on the algorithm. |
| window_in_days | int | -- | Rolling window size in days. |
| minp | int | None | Minimum number of observations required. |
| metrics | str | 'sharpe_ratio' | Metric(s) to calculate from the data. |
| with_id | bool | True | Output id columns if True; otherwise, model name columns. Setting with_id to True is necessary later in the process to retrieve model parameters. Defaults to True. |
| reward_registry | tp.List[vbt.FrozenConfig] | None | Config of models to use for metric calculation. If None, uses NEPTUNE_reward_registry. |
Returns:
| Type | Description |
| --- | --- |
| BaseMetaModel | A new Model instance with features populated from calculated rolling metrics. |
Instance variables
- add_features: Dict[str, Callable]
- categories: pandas.core.series.Series
- category_mapping: Dict[str, int]
- data: vectorbtpro.data.base.Data
- features: pandas.core.frame.DataFrame | numpy.ndarray
- label_mapping: Dict[str, int]
- minp: int
- model_mapping: Dict[str, systematica.portfolio.analyzer.PortfolioAnalyzer]
- runners: Dict[str, vectorbtpro.portfolio.base.Portfolio]
- s1: str
- s2: str
- window_in_days: int
Methods
run_clf
run_clf(
self,
splitter: str = 'from_custom_rolling',
custom_splitter: str = None,
custom_splitter_kwargs: Dict[str, Any] = None,
preprocessor: sklearn.base.BaseEstimator = None,
estimator: sklearn.base.BaseEstimator = None,
training_window: vectorbtpro.utils.params.Param | int = 365,
testing_window: vectorbtpro.utils.params.Param | int = 60,
n_steps: int = 1,
downsample: str = '1d',
state_registry: List[vectorbtpro.utils.config.FrozenConfig] = None,
to_numpy: bool = False,
raw_output: bool = False,
**split_kwargs,
) ‑> pandas.core.frame.DataFrame | numpy.ndarray
Meta classifier CV.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| splitter | str | 'from_custom_rolling' | The method for splitting the data into training and testing sets. Defaults to 'from_custom_rolling'. |
| custom_splitter | str | None | Custom splitter function to use. Defaults to None. |
| custom_splitter_kwargs | tp.Kwargs | None | Custom arguments for the splitter function. Defaults to None. |
| preprocessor | BaseEstimator | None | Standardizes features. If None, defaults to StandardScaler. Defaults to None. |
| estimator | BaseEstimator | None | Classifier model. If None, defaults to a Logistic Regression (aka logit, MaxEnt) classifier. Defaults to None. |
| training_window | int | 365 | The size of the training window for cross-validation. Defaults to 365. |
| testing_window | int | 60 | The size of the testing window for cross-validation. Defaults to 60. |
| n_steps | int | 1 | Number of periods to shift backward by n positions. This operation intentionally looks ahead to train the model! Must be positive. Defaults to 1. |
| downsample | str | '1d' | Resample data before state computation to speed up the process. If None, no resampling is performed. Defaults to '1d' (daily). |
| state_registry | tp.List[vbt.FrozenConfig] | None | State config to use. If None, defaults to STATE_CONFIG. Defaults to None. |
| to_numpy | bool | False | Whether to return the result as a NumPy array. Defaults to False. |
| raw_output | bool | False | Whether to return the raw output without any alignment. Defaults to False. |
| split_kwargs | tp.Kwargs | -- | Additional keyword arguments for the VectorBT PRO splitter. |
Returns:
| Type | Description |
| --- | --- |
| pd.DataFrame \| tp.Array2d | Trained model output. |
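The rolling train/test procedure can be approximated with plain scikit-learn. The loop below is a simplified stand-in for the VectorBT PRO splitter used internally; the synthetic data and window sizes are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for get_inputs() / get_target().
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)

train_w, test_w = 365, 60  # training_window / testing_window analogues
preds = np.full(len(y), -1)
for start in range(0, len(y) - train_w - test_w + 1, test_w):
    tr = slice(start, start + train_w)
    te = slice(start + train_w, start + train_w + test_w)
    # Default pipeline: StandardScaler preprocessor + logistic regression.
    clf = make_pipeline(StandardScaler(), LogisticRegression())
    clf.fit(X[tr], y[tr])
    preds[te] = clf.predict(X[te])
```

Each window trains on `train_w` bars and predicts the following `test_w` bars; positions never covered by a test window keep the -1 sentinel.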
get_target
get_target(
self,
to_numpy: bool = False,
) ‑> numpy.ndarray | pandas.core.series.Series
Convert categorical labels to numeric codes.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| to_numpy | bool | False | Output NumPy array if True; pandas Series otherwise. Defaults to False. |
Returns:
| Type | Description |
| --- | --- |
| tp.Array1d \| pd.Series | Array or Series containing the numeric encoding of categories. |
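Assuming the encoding follows standard pandas categorical behavior, a minimal sketch of the label-to-code mapping (the label names are hypothetical):

```python
import pandas as pd

# Hypothetical regime labels.
labels = pd.Series(["bull", "bear", "flat", "bull"], dtype="category")

# Codes are assigned by sorted category order: bear=0, bull=1, flat=2.
codes = labels.cat.codes
mapping = dict(enumerate(labels.cat.categories))
```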
get_inputs
get_inputs(
self,
downsample: str = '1d',
to_numpy: bool = False,
state_registry: List[vectorbtpro.utils.config.FrozenConfig] = None,
) ‑> pandas.core.frame.DataFrame | numpy.ndarray
Get state representation: Input (X).
To include data in a vbt.Data object, use the data.add_feature method as follows:

```python
data = sma.load_clean_data("1d")
rets = sma.get_returns(data.close)
data = data.add_feature("rets", rets, missing_index="drop")
```
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| downsample | str | '1d' | Resample data before state computation to speed up the process. If None, no resampling is performed. Defaults to '1d' (daily). |
| to_numpy | bool | False | Output NumPy array if True; pandas DataFrame otherwise. Defaults to False. |
| state_registry | tp.List[StateConfig] | None | State config to use. If None, defaults to StateRegistry. Defaults to None. |
Returns:
| Type | Description |
| --- | --- |
| pd.DataFrame \| tp.Array2d | Input (exogenous) variables. |
get_accuracy_score
get_accuracy_score(
self,
y_pred: pandas.core.series.Series,
normalize: bool = True,
) ‑> float
Get Accuracy classification score.
In multilabel classification, this function computes subset accuracy:
the set of labels predicted for a sample must exactly match the
corresponding set of labels in y_true.
See Also:
balanced_accuracy_score: Compute the balanced accuracy to deal with imbalanced datasets.
jaccard_score: Compute the Jaccard similarity coefficient score.
hamming_loss: Compute the average Hamming loss or Hamming distance between two sets of samples.
zero_one_loss: Compute the zero-one classification loss. By default, the function returns the percentage of imperfectly predicted subsets.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| y_pred | pd.Series | -- | Predicted labels, as returned by a classifier. |
| normalize | bool | True | If False, return the number of correctly classified samples. Otherwise, return the fraction of correctly classified samples. |
Returns:
| Type | Description |
| --- | --- |
| float or int | If normalize=True, returns the fraction of correctly classified samples (float); otherwise, returns the number of correctly classified samples (int). The best performance is 1 with normalize=True, and the number of samples with normalize=False. |
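The normalize flag behaves as in sklearn.metrics.accuracy_score, which this method presumably wraps (toy labels for illustration):

```python
import pandas as pd
from sklearn.metrics import accuracy_score

y_true = pd.Series([1, 0, 1, 1])
y_pred = pd.Series([1, 0, 0, 1])

frac = accuracy_score(y_true, y_pred)                    # fraction correct
count = accuracy_score(y_true, y_pred, normalize=False)  # raw count
```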
get_report
get_report(
self,
y_pred: pandas.core.series.Series,
) ‑> pandas.core.frame.DataFrame
Build a report showing the main classification metrics.
See Also:
precision_recall_fscore_support: Compute precision, recall, F-measure and support for each class.
confusion_matrix: Compute confusion matrix to evaluate the accuracy of a classification.
multilabel_confusion_matrix: Compute a confusion matrix for each class or sample.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| y_pred | pd.Series | -- | Estimated targets as returned by a classifier. |
Returns:
| Type | Description |
| --- | --- |
| pd.DataFrame | DataFrame summary of the precision, recall, and F1 score for each class. |
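A report of this shape can be built from sklearn.metrics.classification_report with output_dict=True. This is a sketch with toy labels, not the actual implementation:

```python
import pandas as pd
from sklearn.metrics import classification_report

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# One row per class plus accuracy / macro avg / weighted avg rows.
report = pd.DataFrame(
    classification_report(y_true, y_pred, output_dict=True)
).T
```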
get_confusion_matrix
get_confusion_matrix(
self,
y_pred: pandas.core.series.Series,
normalize: str = None,
) ‑> pandas.core.frame.DataFrame
Compute confusion matrix to evaluate the accuracy of a classification.
By definition, a confusion matrix C is such that C[i, j] is equal to the number of observations known to be in group i and predicted to be in group j.
Thus in binary classification, the count of true negatives is C[0, 0], false negatives is C[1, 0], true positives is C[1, 1], and false positives is C[0, 1].
See Also:
ConfusionMatrixDisplay.from_estimator: Plot the confusion matrix given an estimator, the data, and the labels.
ConfusionMatrixDisplay.from_predictions: Plot the confusion matrix given the true and predicted labels.
ConfusionMatrixDisplay: Confusion matrix visualization.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| y_pred | pd.Series | -- | Estimated targets as returned by a classifier. |
| normalize | str | None | Normalizes the confusion matrix over the true (rows) or predicted (columns) conditions, or over the whole population. If None, the confusion matrix will not be normalized. |
Returns:
| Type | Description |
| --- | --- |
| pd.DataFrame | Confusion matrix whose i-th row and j-th column entry indicates the number of samples with true label being the i-th class and predicted label being the j-th class. |
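The normalize options map directly onto sklearn.metrics.confusion_matrix (a sketch with toy binary labels):

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

# normalize="true" divides each row by the number of true samples in it.
cm = pd.DataFrame(confusion_matrix(y_true, y_pred, normalize="true"))
```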
get_feature_importance
get_feature_importance(
self,
X: pandas.core.frame.DataFrame,
tree_based_estimator: sklearn.ensemble._forest.RandomForestClassifier,
) ‑> pandas.core.frame.DataFrame
The impurity-based feature importances.
The higher, the more important the feature.
The importance of a feature is computed as the (normalized)
total reduction of the criterion brought by that feature. It is also
known as the Gini importance.
Impurity-based feature importances can be misleading for
high cardinality features (many unique values). See sklearn.inspection.permutation_importance as an alternative.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| X | pd.DataFrame | -- | State representation: input (X). |
| tree_based_estimator | RandomForestClassifier | -- | Tree-based estimator exposing the feature_importances_ attribute. |
Returns:
| Type | Description |
| --- | --- |
| pd.DataFrame | The values of this array sum to 1, unless all trees are single-node trees consisting of only the root node, in which case it will be an array of zeros. |
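A sketch of the underlying scikit-learn mechanics, using a hypothetical toy dataset in which only one feature is informative:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["f1", "f2", "f3"])
y = (X["f1"] > 0).astype(int)  # only f1 carries signal

est = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Gini importances sum to 1; the informative feature dominates.
importances = pd.DataFrame(
    {"importance": est.feature_importances_}, index=X.columns
).sort_values("importance", ascending=False)
```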
plot_rolling_metrics
plot_rolling_metrics(
self,
) ‑> vectorbtpro.utils.figure.FigureWidget
Visualize rolling metrics from features.
Returns:
| Type | Description |
| --- | --- |
| vbt.FigureWidget | A visualization of the features using vectorbtpro plotting. |
plot_target
plot_target(
self,
) ‑> vectorbtpro.utils.figure.FigureWidget
Visualize encoded labels.
Returns:
| Type | Description |
| --- | --- |
| vbt.FigureWidget | A visualization of the encoded labels using vectorbtpro plotting. |
plot_heatmap_overlay
plot_heatmap_overlay(
self,
y_test_or_pred: pandas.core.series.Series,
**layout_kwargs,
) ‑> vectorbtpro.utils.figure.FigureWidget
Plot a Series as a line and overlay it with a heatmap.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| y_test_or_pred | pd.Series | -- | Labels or estimated targets as returned by a classifier. |
| layout_kwargs | tp.Kwargs | -- | Additional Plotly keyword arguments. |
Returns:
| Type | Description |
| --- | --- |
| vbt.FigureWidget | The Series plotted as a line with a heatmap overlay. |