# Factors

> systematica.preprocessing.factors

## `get_factor_stats`

```python theme={null}
get_factor_stats(
    X_train: numpy.ndarray,
    pca: sklearn.base.BaseEstimator,
    estimator: sklearn.base.BaseEstimator,
) -> systematica.preprocessing.factors.FactorParams
```

Calculate the factor weights (n\_components, n\_assets) from the PCA model, and the
intercepts and betas from the regression estimator.

<Note>
  Apply this function to the training set only.
</Note>

**Parameters**:

| Name        | Type            | Default | Description                                                                                                                |
| ----------- | --------------- | ------- | -------------------------------------------------------------------------------------------------------------------------- |
| `X_train`   | `tp.Array2d`    | `--`    | Array of asset returns (n\_assets, n\_times).                                                                              |
| `pca`       | `BaseEstimator` | `--`    | PCA model that will be fitted to the standardized returns.                                                                 |
| `estimator` | `BaseEstimator` | `--`    | Regression model (e.g., `LinearRegression`) to fit the eigenportfolio returns and the asset returns to estimate residuals. |

**Returns**:

| Type           | Description                                                                                                                                                                       |
| -------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `FactorParams` | Factor weights (n\_components, n\_assets), intercept of the regression estimator and beta coefficient of the regression estimator. Parameters are calculated in the training set. |
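
To make the shapes above concrete, here is a minimal NumPy-only sketch of this kind of fit. It is an illustration under stated assumptions, not the library's implementation: `fit_factor_sketch` is a hypothetical helper, PCA is done through an SVD of the standardized returns instead of a scikit-learn `pca` object, the eigenvectors are risk-adjusted by dividing by each asset's volatility, and the regression is ordinary least squares instead of a scikit-learn `estimator`.

```python
import numpy as np

def fit_factor_sketch(X_train, n_components):
    """Hypothetical helper returning (weights, intercept, beta).

    X_train has shape (n_assets, n_times), as in the table above.
    """
    # Standardize each asset's return series (rows are assets).
    mu = X_train.mean(axis=1, keepdims=True)
    sigma = X_train.std(axis=1, keepdims=True)
    Z = (X_train - mu) / sigma

    # PCA via SVD: columns of U are eigenvectors of the asset correlation matrix.
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    # Assumption: risk-adjust the eigenvectors by each asset's volatility.
    weights = U[:, :n_components].T / sigma.T        # (n_components, n_assets)

    # Eigenportfolio returns over the training window.
    F = weights @ X_train                            # (n_components, n_times)

    # OLS of every asset on the eigenportfolio returns, with an intercept column.
    A = np.column_stack([np.ones(F.shape[1]), F.T])  # (n_times, 1 + n_components)
    coef, *_ = np.linalg.lstsq(A, X_train.T, rcond=None)
    intercept = coef[0]                              # (n_assets,)
    beta = coef[1:].T                                # (n_assets, n_components)
    return weights, intercept, beta

rng = np.random.default_rng(0)
X_train = rng.normal(scale=0.01, size=(8, 120))      # synthetic returns
weights, intercept, beta = fit_factor_sketch(X_train, n_components=3)
```

The three arrays stand in for the contents of `FactorParams`; how that object actually stores them is not specified on this page.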

## `get_eigenportfolio_residuals`

```python theme={null}
get_eigenportfolio_residuals(
    y_test: numpy.ndarray,
    params: systematica.preprocessing.factors.FactorParams,
) -> numpy.ndarray
```

Calculate residuals by regressing asset returns on eigenportfolio returns.

**Parameters**:

| Name     | Type           | Default | Description                                                                                                                                                                       |
| -------- | -------------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `y_test` | `tp.Array2d`   | `--`    | Array of test asset returns (n\_assets, n\_test\_times).                                                                                                                          |
| `params` | `FactorParams` | `--`    | Factor weights (n\_components, n\_assets), intercept of the regression estimator and beta coefficient of the regression estimator. Parameters are calculated in the training set. |

**Returns**:

| Type         | Description                                                       |
| ------------ | ----------------------------------------------------------------- |
| `tp.Array2d` | Residual returns (n\_assets, n\_test\_times) for the test period. |
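
As a rough sketch of the computation described above, the residual is the test return minus the part explained by the eigenportfolio returns. The helper name `residuals_sketch` and the choice to pass the three parameter arrays separately (rather than a `FactorParams` object) are assumptions made for illustration only.

```python
import numpy as np

def residuals_sketch(y_test, weights, intercept, beta):
    """Hypothetical helper: residuals of test returns given training-set parameters.

    y_test: (n_assets, n_test_times); weights: (n_components, n_assets);
    intercept: (n_assets,); beta: (n_assets, n_components).
    """
    factor_rets = weights @ y_test                     # (n_components, n_test_times)
    fitted = intercept[:, None] + beta @ factor_rets   # (n_assets, n_test_times)
    return y_test - fitted

# Toy parameters: one equal-weight factor over three assets, unit betas.
weights = np.full((1, 3), 1.0 / 3.0)
intercept = np.zeros(3)
beta = np.ones((3, 1))
y_test = np.array([[0.01, 0.02],
                   [0.03, 0.00],
                   [0.02, 0.01]])
resid = residuals_sketch(y_test, weights, intercept, beta)
```

With these toy values the single factor is the cross-sectional mean, so each residual is the asset's return in excess of that mean.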

## `fit_factor_model`

```python theme={null}
fit_factor_model(
    X_train: numpy.ndarray,
    pca: sklearn.base.BaseEstimator,
    estimator: sklearn.base.BaseEstimator,
) -> systematica.preprocessing.factors.FactorParams
```

Fit the factor model to asset returns: extract risk-adjusted eigenvectors with PCA,
apply them to the returns data, and fit the regression estimator.

<Note>
  Apply this function to the training set only.
</Note>

**Parameters**:

| Name        | Type            | Default | Description                                                                                                                |
| ----------- | --------------- | ------- | -------------------------------------------------------------------------------------------------------------------------- |
| `X_train`   | `tp.Array2d`    | `--`    | Array of asset returns (n\_assets, n\_times).                                                                              |
| `pca`       | `BaseEstimator` | `--`    | PCA model that will be fitted to the returns data.                                                                         |
| `estimator` | `BaseEstimator` | `--`    | Regression model (e.g., `LinearRegression`) to fit the eigenportfolio returns and the asset returns to estimate residuals. |

**Returns**:

| Type           | Description                                                                                                                                                                       |
| -------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `FactorParams` | Factor weights (n\_components, n\_assets), intercept of the regression estimator and beta coefficient of the regression estimator. Parameters are calculated in the training set. |

## `predict_factor_model`

```python theme={null}
predict_factor_model(
    y_test: numpy.ndarray,
    params: systematica.preprocessing.factors.FactorParams,
) -> numpy.ndarray
```

Predict residuals for the test set by applying the parameters fitted on the
training set (factor weights, intercepts and betas) to the test returns.

**Parameters**:

| Name     | Type           | Default | Description                                                                                                                                                                       |
| -------- | -------------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `y_test` | `tp.Array2d`   | `--`    | Array of asset returns (n\_assets, n\_times) for the test set.                                                                                                                    |
| `params` | `FactorParams` | `--`    | Factor weights (n\_components, n\_assets), intercept of the regression estimator and beta coefficient of the regression estimator. Parameters are calculated in the training set. |

**Returns**:

| Type         | Description                                                 |
| ------------ | ----------------------------------------------------------- |
| `tp.Array2d` | Predicted residuals (n\_assets, n\_times) for the test set. |

## `residuals`

```python theme={null}
residuals(
    rets: numpy.ndarray,
    lookback: int,
    pca: sklearn.base.BaseEstimator,
    estimator: sklearn.base.BaseEstimator,
) -> numpy.ndarray
```

Compute residual returns using PCA and regression, separating training and test data.

**Parameters**:

| Name        | Type            | Default | Description                                                                                                                |
| ----------- | --------------- | ------- | -------------------------------------------------------------------------------------------------------------------------- |
| `rets`      | `tp.Array2d`    | `--`    | Array of asset returns (n\_assets, n\_times).                                                                              |
| `lookback`  | `int`           | `--`    | Number of periods to look back for training.                                                                               |
| `pca`       | `BaseEstimator` | `--`    | PCA model for factor weights.                                                                                              |
| `estimator` | `BaseEstimator` | `--`    | Regression model (e.g., `LinearRegression`) to fit the eigenportfolio returns and the asset returns to estimate residuals. |

**Returns**:

| Type         | Description                                                 |
| ------------ | ----------------------------------------------------------- |
| `tp.Array2d` | Residual returns (n\_assets, n\_times) for the test period. |
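
This page does not spell out how `lookback` divides the data. The sketch below assumes one plausible convention, in which the first `lookback` periods form the training set and the remaining periods form the test set. It only illustrates the train/test separation and is not the library's code.

```python
import numpy as np

rng = np.random.default_rng(1)
rets = rng.normal(scale=0.01, size=(5, 100))  # synthetic (n_assets, n_times)
lookback = 60

# Assumed convention (not confirmed by this page): train on the first
# `lookback` periods, test on everything after them.
X_train = rets[:, :lookback]
y_test = rets[:, lookback:]

# Any statistic applied to the test set is estimated on the training set only,
# so no test-period information leaks into the fit.
train_mean = X_train.mean(axis=1, keepdims=True)
y_test_centered = y_test - train_mean
```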

## `residuals_cv`

```python theme={null}
residuals_cv(
    rets: numpy.ndarray,
    index: numpy.ndarray,
    columns: Sequence[Hashable] = None,
    splitter: str = 'from_rolling',
    custom_splitter: str = None,
    custom_splitter_kwargs: Dict[str, Any] = None,
    training_window: int = 252,
    testing_window: int = 60,
    n_components: int | float = 15,
    estimator: sklearn.base.BaseEstimator = None,
    autotune: bool = False,
    to_numpy: bool = False,
    **split_kwargs,
) -> pandas.core.frame.DataFrame | numpy.ndarray
```

Cross-validation wrapper that applies the residual decomposition to each train/test split.

**Parameters**:

| Name                     | Type            | Default        | Description                                                                      |
| ------------------------ | --------------- | -------------- | -------------------------------------------------------------------------------- |
| `rets`                   | `tp.Array2d`    | `--`           | Array of asset returns (n\_assets, n\_times).                                    |
| `index`                  | `tp.Array1d`    | `--`           | Datetime index for the results.                                                  |
| `columns`                | `tp.Labels`     | `None`         | Column labels for the results DataFrame.                                         |
| `splitter`               | `str`           | `from_rolling` | Method for splitting the data. Defaults to `from_rolling`.                       |
| `custom_splitter`        | `str`           | `None`         | Custom splitter function, if provided.                                           |
| `custom_splitter_kwargs` | `tp.Kwargs`     | `None`         | Additional arguments for the custom splitter.                                    |
| `training_window`        | `int`           | `252`          | Number of periods for the training window. Defaults to `252`.                    |
| `testing_window`         | `int`           | `60`           | Number of periods for the testing window. Defaults to `60`.                      |
| `n_components`           | `int \| float`  | `15`           | Number of PCA components to retain. Defaults to `15`.                            |
| `estimator`              | `BaseEstimator` | `None`         | Regression model to use for residual extraction. Defaults to `LinearRegression`. |
| `autotune`               | `bool`          | `False`        | Flag to enable autotuning. Defaults to `False`.                                  |
| `to_numpy`               | `bool`          | `False`        | If `True`, return the result as a NumPy array. Defaults to `False`.              |
| `split_kwargs`           | `tp.Kwargs`     | `--`           | Additional keyword arguments for the splitter function.                          |

**Returns**:

| Type                         | Description                       |
| ---------------------------- | --------------------------------- |
| `pd.DataFrame or tp.Array2d` | Residuals after cross-validation. |
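
To illustrate what split-wise application with a training and a testing window can look like, the sketch below generates rolling index pairs over the time axis. `rolling_splits` is a hypothetical helper, and stepping forward by one testing window is an assumption. How the `from_rolling` splitter actually advances its windows is not documented on this page.

```python
import numpy as np

def rolling_splits(n_times, training_window, testing_window):
    """Hypothetical generator of (train_idx, test_idx) pairs over the time axis."""
    start = 0
    while start + training_window + testing_window <= n_times:
        train_end = start + training_window
        yield (np.arange(start, train_end),
               np.arange(train_end, train_end + testing_window))
        # Assumption: advance by one testing window so test windows never overlap.
        start += testing_window

splits = list(rolling_splits(n_times=500, training_window=252, testing_window=60))
```

Each pair could then drive one fit on `rets[:, train_idx]` and one residual computation on `rets[:, test_idx]`, with the per-split residuals concatenated along the time axis.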
