# OU Process

> systematica.api.models.ou_process

## Overview

Statistical arbitrage models in Systematica are designed to identify and exploit temporary price divergences between related financial instruments.
The framework provides several model categories, each implementing different statistical approaches to identify trading opportunities.

## Ornstein-Uhlenbeck Process Models

The Ornstein-Uhlenbeck (OU) process models implement mean-reverting statistical arbitrage based on the Ornstein-Uhlenbeck stochastic process.

* **Mean Reversion Speed**: The autocorrelation $b$ relates to $\theta$. A smaller $b$ (closer to 0) implies faster reversion (larger $\theta$).
* **Trading Signals**: Deviations from $\mu$ beyond $\sigma_{\text{eq}}$ suggest trading opportunities, as the series is expected to revert.
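The source only states that $b$ relates to $\theta$. Assuming the standard exact discretization of the OU process over a sampling step $\Delta t$ (an assumption, not something confirmed for this library), the link can be written as:

$$
b = e^{-\theta \Delta t} \quad\Longleftrightarrow\quad \theta = -\frac{\ln b}{\Delta t}
$$

Under that assumption, $b \to 0$ corresponds to a large $\theta$ (fast reversion) and $b \to 1$ to $\theta \to 0$ (slow reversion).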

The OU process models estimate mean reversion parameters through a multi-step process:

* **Parameter Estimation**: Calculates long-term mean and equilibrium standard deviation
* **S-Score Calculation**: Generates standardized reversion signals
* **Residual Integration**: Uses PCA-decomposed residuals as input data

Key parameters include:

* `reversion`: Threshold on the speed of mean reversion (auto-calculated if `None`)
* `ann_factor`: Annualization factor (inferred from data frequency)
* `training_window`/`testing_window`: Cross-validation windows
* `n_components`: PCA variance explained (default `0.5`)

The OU process describes mean-reverting behavior in financial time series through the stochastic differential equation:

$$
dx_t = \theta (\mu - x_t) dt + \sigma dW_t 
$$

Where:

* $x_t$: Price or spread at time $t$
* $\theta$: Speed of mean reversion
* $\mu$: Long-term mean (equilibrium level)
* $\sigma$: Volatility of the process
* $W_t$: Wiener process (random noise)

These calculations enable quantification of mean-reverting behavior and identification of mispricings for arbitrage strategies.
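To make the roles of $\theta$, $\mu$ and $\sigma$ concrete, the following is a minimal simulation sketch. It is not part of Systematica: `simulate_ou` is a hypothetical helper, the parameter values are illustrative, and the exact one-step transition of the OU process is assumed as the discretization scheme.

```python
import numpy as np


def simulate_ou(theta, mu, sigma, x0, n_steps, dt, seed=0):
    """Simulate an OU path using the exact one-step transition (assumed scheme)."""
    rng = np.random.default_rng(seed)
    b = np.exp(-theta * dt)  # one-step autocorrelation implied by theta
    # Standard deviation of the Gaussian shock added at each step.
    step_std = sigma * np.sqrt((1.0 - b**2) / (2.0 * theta))
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        # Decay the gap to mu by the factor b, then add noise.
        x[t + 1] = mu + b * (x[t] - mu) + step_std * rng.standard_normal()
    return x


# One year of daily observations starting above the long-term mean.
path = simulate_ou(theta=8.4, mu=0.0, sigma=0.3, x0=0.5, n_steps=252, dt=1 / 252)
```

A larger `theta` shrinks `b`, so the gap between `x[t]` and `mu` closes faster at each step.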

### Key Calculations

The OU parameters are estimated for a time series $x$ as follows:

1. **Autocorrelation Coefficient ($b$)**:

* Computed as the correlation between consecutive values: $b=\text{corr}(x_{t}, x_{t+1})$
* Measures how strongly $x_t$ predicts $x_{t+1}$. A value close to 1 indicates slow mean reversion.

2. **Long-Term Mean ($\mu$)**:

* Derived from the regression $x_{t+1} = b x_t + A$, where: $A = \text{mean}(x_{t+1} - b \cdot x_t)$ and $\mu = \frac{A}{1 - b}$
* Represents the equilibrium level the series reverts to.

3. **Equilibrium Standard Deviation ($\sigma_{\text{eq}}$)**:

* Calculated from residuals $Z = x_{t+1} - b \cdot x_t$: $\sigma_{\text{eq}} = \sqrt{\frac{\text{var}(Z)}{1 - b^2}}$
* Measures volatility around the mean, adjusted for autocorrelation.
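The three steps above can be sketched in plain NumPy. This is an illustration under stated assumptions rather than the library's implementation: `estimate_ou` is a hypothetical helper, the synthetic AR(1) series and its parameters are made up for the example, and the final line assumes the common s-score definition $(x_t - \mu) / \sigma_{\text{eq}}$.

```python
import numpy as np


def estimate_ou(x):
    """Estimate b, mu and sigma_eq from a 1-D series following the steps above."""
    x = np.asarray(x, dtype=float)
    x_t, x_next = x[:-1], x[1:]
    b = np.corrcoef(x_t, x_next)[0, 1]  # 1. lag-1 autocorrelation
    a = np.mean(x_next - b * x_t)       # 2. intercept A
    mu = a / (1.0 - b)                  #    long-term mean
    z = x_next - b * x_t                # 3. residuals Z
    sigma_eq = np.sqrt(np.var(z) / (1.0 - b**2))
    return b, mu, sigma_eq


# Synthetic AR(1) series with known parameters (illustrative values only).
rng = np.random.default_rng(42)
true_b, true_mu = 0.9, 2.0
noise_std = np.sqrt(1.0 - true_b**2)    # chosen so that the true sigma_eq is 1
x = np.empty(20_000)
x[0] = true_mu
for t in range(1, x.size):
    x[t] = true_mu + true_b * (x[t - 1] - true_mu) + noise_std * rng.standard_normal()

b, mu, sigma_eq = estimate_ou(x)
s_score = (x[-1] - mu) / sigma_eq       # assumed definition: standardized deviation
```

On a long enough sample the estimates should land close to the values used to generate the series, which is a quick sanity check before applying the same steps to real residuals.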

## `RollingOUProcess` vs `OUProcessCV`

Two implementations are available: rolling window and cross-validation approaches.

* **RollingOUProcess**: Uses a sliding window for real-time parameter estimation, ideal for streaming data.
* **OUProcessCV**: Employs cross-validation splits for robust backtesting, optimizing parameters over training/testing periods.

| Feature     | `RollingOUProcess`      | `OUProcessCV`                       |
| ----------- | ----------------------- | ----------------------------------- |
| Window Type | Rolling/sliding window  | Cross-validation splits             |
| Parameters  | `window`, `minp`        | `training_window`, `testing_window` |
| Use Case    | Real-time/streaming     | Backtesting/optimization            |
| Function    | `rolling_ou_process_nb` | `ou_process_cv`                     |
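To illustrate how `training_window` and `testing_window` differ from a single sliding `window`, here is a generic walk-forward split sketch. The behavior of Systematica's own splitters is not documented in this section, so `rolling_splits` is a hypothetical helper and the step size (one test window per roll) is an assumption.

```python
def rolling_splits(n_obs, training_window=365, testing_window=60):
    """Yield (train, test) index ranges that roll forward by one test window."""
    start = 0
    while start + training_window + testing_window <= n_obs:
        train_end = start + training_window
        train = range(start, train_end)
        test = range(train_end, train_end + testing_window)
        yield train, test
        start += testing_window  # assumed step: advance by the test length


splits = list(rolling_splits(n_obs=500))
for train, test in splits:
    print(f"train {train.start}-{train.stop - 1}, test {test.start}-{test.stop - 1}")
```

Parameters would be fitted on each `train` range and evaluated on the following `test` range, whereas a rolling model re-estimates at every step from the trailing `window` observations.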

## `RollingOUProcess`

```python theme={null}
RollingOUProcess(
    window: int = 365,
    minp: int = None,
    reversion: float = None,
    ann_factor: str | int = 'auto',
    splitter: str = 'from_custom_rolling',
    custom_splitter: str | None = None,
    custom_splitter_kwargs: dict | None = None,
    training_window: int = 365,
    testing_window: int = 60,
    n_components: str | int | float = 0.5,
    estimator: sklearn.base.BaseEstimator = None,
    autotune: bool = False,
)
```

Ornstein-Uhlenbeck (OU) process rolling mean reversion model:

* **Dimensionality Reduction**: Principal Component Analysis is used to
  model returns of assets by decomposing them into systematic (market)
  components and idiosyncratic (residual) components. Assets are modeled
  using a multi-factor approach, with the residual returns assumed to be the
  source of alpha.
* **Market-Neutral Portfolio**: Ensuring that the portfolio's factor
  exposures sum to zero.
* **Eigenportfolio Construction**: Assets are allocated to eigenportfolios
  based on the eigenvectors of the correlation matrix, with investments
  scaled by the volatility of each asset.
* **Extracting the Residuals**: Residuals -- the deviations from a
  statistical relationship -- are derived from models that predict or explain
  the relationship between asset prices.
* **Mean Reversion**: Residual-based strategies typically assume that
  residuals will revert to their mean over time, creating opportunities for
  profit.

**Workflow**:

1. **Calculate Returns**:
   Use closing prices to calculate returns.

2. **Standardize Data**:
   PCA seeks to maximize the variance of each component. Standardization will
   scale variance to a common measure.

3. **Reduce Dimensionality**:
   Decompose asset returns using PCA, keeping enough components to explain roughly 50% of the variance.

4. **Eigenportfolio Returns**:
   Estimate the market-neutral eigenportfolio returns for each asset in the
   universe.

5. **Systematic and Residual Components**:
   Estimate the residual components for each asset.
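The five workflow steps can be sketched with NumPy alone. This is a simplified illustration, not the code behind `get_residuals`: `pca_residuals` is a hypothetical helper, the price data is synthetic, and ordinary least squares stands in for whatever `estimator` is configured.

```python
import numpy as np


def pca_residuals(prices, variance_target=0.5):
    """Return residual returns after removing the leading PCA factors."""
    prices = np.asarray(prices, dtype=float)
    rets = prices[1:] / prices[:-1] - 1.0                 # 1. simple returns
    vol = rets.std(axis=0)
    std_rets = (rets - rets.mean(axis=0)) / vol           # 2. standardize
    corr = np.corrcoef(std_rets, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)               # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(explained, variance_target) + 1)  # 3. reach the target
    weights = eigvecs[:, :k] / vol[:, None]               # 4. volatility-scaled weights
    factors = rets @ weights                              #    eigenportfolio returns
    design = np.column_stack([np.ones(len(factors)), factors])
    beta, *_ = np.linalg.lstsq(design, rets, rcond=None)  # 5. regress on the factors
    return rets - design @ beta, k


# Synthetic prices driven by one shared factor plus idiosyncratic noise.
rng = np.random.default_rng(0)
market = 0.01 * rng.standard_normal((500, 1))
noise = 0.005 * rng.standard_normal((500, 6))
prices = 100.0 * np.cumprod(1.0 + market + noise, axis=0)
residuals, k = pca_residuals(prices)
```

In the referenced paper, the cumulative sum of each asset's residual returns is the series to which the OU parameters are then fitted.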

**Reference**:

* [Statistical Arbitrage in the U.S. Equities Market](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1153505)

Method generated by attrs for class RollingOUProcess.

### Ancestors

* `systematica.models.base.BaseStatArb`
* `abc.ABC`

### Descendants

* `systematica.api.models.ou_process.OUProcessCV`

### Instance variables

* `ann_factor: str | int`: Annualization coefficient. If `'auto'`, the data frequency and the annualization factor are inferred automatically. Defaults to `'auto'`.

* `autotune: bool`: Tune estimator hyperparameter automatically. Defaults to `False`, which means no autotuning is performed.

* `custom_splitter: str | None`: Custom splitter to use for data partitioning. Defaults to `None`, which means no custom splitter is used. If set, it should be a string that matches a custom splitter function.

* `custom_splitter_kwargs: dict | None`: Additional keyword arguments for the custom splitter. Defaults to `None`, which means no additional arguments are passed. If `custom_splitter` is set, this should contain any necessary parameters for the custom splitter function.

* `estimator: sklearn.base.BaseEstimator`: scikit-learn estimator to use. Defaults to `None`, in which case `LinearRegression` from `sklearn` is used.

* `minp: int`: Minimum period. Defaults to `None`, which means no minimum period is applied.

* `n_components: str | int | float`: Number of components to keep. If `n_components == 'mle'`, Minka's MLE is used to guess the dimension. If `0 < n_components < 1`, the number of components is selected so that the explained variance exceeds the fraction specified by `n_components`. Defaults to `0.5`, which means half of the variance is captured.

* `reversion: float`: Speed of reversion. If `None`, the model estimates the reversion speed coefficient automatically. The implied reversion time should be less than half the size of the window used for residual estimation. For example, with a 60-day window, half is 30 days, so $\text{reversion} = 252 / 30 = 8.4$, assuming 252 trading days in a year. Defaults to `None`.

* `splitter: str`: Default splitter to be used if `custom_splitter` is not passed. Choices are `from_rolling`, `from_expanding`, `from_custom_rolling`, `from_custom_expanding`. Defaults to `from_custom_rolling`.

* `testing_window: int`: The size of the testing window. Defaults to `60`. This is the window size used for testing the model.

* `training_window: int`: The size of the training window. Defaults to `365`. This is the window size used for training the model.

* `window: int`: The size of the rolling window. Defaults to `365`.
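The `reversion` guideline above (a 60-day window giving $252 / 30 = 8.4$) can be checked with a few lines of arithmetic. Both helpers are hypothetical, and the conversion from autocorrelation to annualized speed assumes $b = e^{-\theta / \text{ann\_factor}}$, which this page does not confirm for the library.

```python
import math


def min_reversion(window, ann_factor=252):
    """Speed threshold implied by a reversion time of half the estimation window."""
    return ann_factor / (window / 2)


def annualized_reversion_speed(b, ann_factor=252):
    """Convert a lag-1 autocorrelation b into an annualized speed (assumed relation)."""
    return -math.log(b) * ann_factor


threshold = min_reversion(60)            # 252 / 30
fast = annualized_reversion_speed(0.95)  # roughly 12.9 per year
slow = annualized_reversion_speed(0.98)  # roughly 5.1 per year
```

Under these assumptions, a series with `b = 0.95` reverts fast enough to clear the threshold for a 60-day window, while one with `b = 0.98` does not.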

### Methods

#### `get_residuals`

```python theme={null}
get_residuals(
    self,
    rets: numpy.ndarray,
    index: numpy.ndarray,
    columns: numpy.ndarray,
) -> pandas.core.frame.DataFrame
```

Calculate residuals from the input returns.

**Parameters**:

| Name      | Type         | Default | Description                        |
| --------- | ------------ | ------- | ---------------------------------- |
| `rets`    | `tp.Array2d` | `--`    | A NumPy array of returns.          |
| `index`   | `tp.Array1d` | `--`    | A NumPy array of datetime indices. |
| `columns` | `tp.Array1d` | `--`    | A NumPy array of column indices.   |

**Returns**:

| Type         | Description                 |
| ------------ | --------------------------- |
| `tp.Array2d` | A NumPy array of residuals. |

## `OUProcessCV`

```python theme={null}
OUProcessCV(
    window: int = 365,
    minp: int = None,
    reversion: float = None,
    ann_factor: str | int = 'auto',
    splitter: str = 'from_custom_rolling',
    custom_splitter: str | None = None,
    custom_splitter_kwargs: dict | None = None,
    training_window: int = 365,
    testing_window: int = 60,
    n_components: str | int | float = 0.5,
    estimator: sklearn.base.BaseEstimator = None,
    autotune: bool = False,
)
```

Ornstein-Uhlenbeck (OU) process mean reversion model with cross-validation.

See `RollingOUProcess`.

Method generated by attrs for class OUProcessCV.

### Ancestors

* `systematica.api.models.ou_process.RollingOUProcess`
* `systematica.models.base.BaseStatArb`
* `abc.ABC`
