OU Process - Systematica - Blockforce Capital

Overview

Statistical arbitrage models in Systematica are designed to identify and exploit temporary price divergences between related financial instruments. The framework provides several model categories, each implementing different statistical approaches to identify trading opportunities.

Ornstein-Uhlenbeck Process Models

The Ornstein-Uhlenbeck (OU) process models implement mean-reverting statistical arbitrage based on the Ornstein-Uhlenbeck stochastic process.

Mean Reversion Speed: The lag-1 autocorrelation $b$ decreases as $\theta$ increases. A smaller $b$ (closer to 0) implies faster reversion (larger $\theta$).
Trading Signals: Deviations from $\mu$ beyond $\sigma_{\text{eq}}$ suggest trading opportunities, as the series is expected to revert.
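The link between $b$ and $\theta$ can be made explicit. For an OU process sampled at a fixed interval $\Delta t$, the lag-1 autocorrelation satisfies $b = e^{-\theta \Delta t}$, so $\theta = -\ln(b) / \Delta t$. The sketch below is illustrative only: the function names and the default of 252 observations per year are assumptions, not part of the Systematica API.

```python
import math


def reversion_speed(b: float, dt: float = 1.0 / 252.0) -> float:
    """Annualized reversion speed theta implied by the lag-1 autocorrelation b.

    Assumes observations are spaced dt years apart (hypothetical default: daily).
    """
    if not 0.0 < b < 1.0:
        raise ValueError("b must lie strictly between 0 and 1 for a mean-reverting series")
    return -math.log(b) / dt


def half_life(b: float) -> float:
    """Number of observations needed for a deviation from the mean to halve."""
    if not 0.0 < b < 1.0:
        raise ValueError("b must lie strictly between 0 and 1 for a mean-reverting series")
    return math.log(2.0) / -math.log(b)


# A smaller b gives a larger theta and a shorter half-life.
fast = reversion_speed(0.90)
slow = reversion_speed(0.99)
```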

The OU process models estimate mean reversion parameters through a multi-step process:

Parameter Estimation: Calculates long-term mean and equilibrium standard deviation
S-Score Calculation: Generates standardized reversion signals
Residual Integration: Uses PCA-decomposed residuals as input data
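As an illustration of the S-score step, the sketch below assumes the score is the deviation from the long-term mean measured in units of the equilibrium standard deviation, $s_t = (x_t - \mu) / \sigma_{\text{eq}}$. The function names and the entry threshold of one $\sigma_{\text{eq}}$ are hypothetical and may differ from what Systematica computes internally.

```python
import numpy as np


def s_score(x, mu: float, sigma_eq: float) -> np.ndarray:
    """Standardized deviation of the series from its long-term mean."""
    return (np.asarray(x, dtype=float) - mu) / sigma_eq


def signals(s: np.ndarray, entry: float = 1.0) -> np.ndarray:
    """Map S-scores to positions: +1 long, -1 short, 0 flat.

    A series far below its mean is expected to rise (long);
    a series far above its mean is expected to fall (short).
    """
    return np.where(s < -entry, 1, np.where(s > entry, -1, 0))


scores = s_score([1.0, 2.0, 3.0], mu=2.0, sigma_eq=0.5)
positions = signals(scores)
```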

Key parameters include:

reversion: Speed of mean reversion threshold (auto-calculated if None)
ann_factor: Annualization factor (inferred from data frequency)
training_window/testing_window: Cross-validation windows
n_components: PCA variance explained (default 0.5)

The OU process models mean-reverting behavior in financial time series using the stochastic differential equation:

$$dx_t = \theta (\mu - x_t)\, dt + \sigma\, dW_t$$

Where:

$x_t$ : Price or spread at time $t$
$\theta$ : Speed of mean reversion
$\mu$ : Long-term mean (equilibrium level)
$\sigma$ : Volatility of the process
$W_t$ : Wiener process (random noise)

These calculations enable quantification of mean-reverting behavior and identification of mispricings for arbitrage strategies.
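To build intuition for the equation, the following sketch simulates a path using the exact discretization of the OU process, $x_{t+1} = \mu + e^{-\theta \Delta t}(x_t - \mu) + \sigma \sqrt{(1 - e^{-2\theta \Delta t}) / (2\theta)}\, \varepsilon_t$ with $\varepsilon_t$ standard normal. The function name, the daily step size, and the seed are assumptions made for illustration.

```python
import numpy as np


def simulate_ou(theta, mu, sigma, x0, n_steps, dt=1.0 / 252.0, seed=0):
    """Simulate one OU path with the exact one-step transition."""
    rng = np.random.default_rng(seed)
    b = np.exp(-theta * dt)                                # one-step decay factor
    step_std = sigma * np.sqrt((1.0 - b**2) / (2.0 * theta))
    noise = rng.standard_normal(n_steps)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        # Pull toward mu, then add the random shock
        x[t + 1] = mu + b * (x[t] - mu) + step_std * noise[t]
    return x


# With sigma = 0 the path decays deterministically toward mu.
path = simulate_ou(theta=10.0, mu=1.0, sigma=0.0, x0=0.0, n_steps=2520)
```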

Key Calculations

The OU parameters are estimated for a time series $x$ as follows:

Autocorrelation Coefficient ( $b$ ):

Computed as the correlation between consecutive values: $b=\text{corr}(x_{t}, x_{t+1})$
Measures how strongly $x_t$ predicts $x_{t+1}$ . A value close to 1 indicates slow mean reversion.

Long-Term Mean ( $\mu$ ):

Derived from the regression $x_{t+1} = b x_t + A$ , where: $A = \text{mean}(x_{t+1} - b \cdot x_t)$ and $\mu = \frac{A}{1 - b}$
Represents the equilibrium level the series reverts to.

Equilibrium Standard Deviation ( $\sigma_{\text{eq}}$ ):

Calculated from residuals $Z = x_{t+1} - b \cdot x_t$ : $\sigma_{\text{eq}} = \sqrt{\frac{\text{var}(Z)}{1 - b^2}}$
Measures volatility around the mean, adjusted for autocorrelation.
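The three calculations above can be sketched in a few lines of NumPy. This is a minimal illustration of the formulas as stated, not the library's implementation; the function name `estimate_ou` and the use of the population variance are assumptions.

```python
import numpy as np


def estimate_ou(x):
    """Estimate (b, mu, sigma_eq) from a one-dimensional series."""
    x = np.asarray(x, dtype=float)
    x_t, x_next = x[:-1], x[1:]

    # Autocorrelation coefficient: correlation between consecutive values
    b = np.corrcoef(x_t, x_next)[0, 1]

    # Residuals Z = x_{t+1} - b * x_t; their mean is the intercept A
    z = x_next - b * x_t
    a = z.mean()

    mu = a / (1.0 - b)                          # long-term mean
    sigma_eq = np.sqrt(z.var() / (1.0 - b**2))  # equilibrium standard deviation
    return b, mu, sigma_eq
```

Note that $\mu$ and $\sigma_{\text{eq}}$ are only meaningful when $|b| < 1$. As $b$ approaches 1 both denominators approach zero and the estimates become unstable.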

`RollingOUProcess` vs `OUProcessCV`

Two implementations are available: rolling window and cross-validation approaches.

RollingOUProcess: Uses a sliding window for real-time parameter estimation, ideal for streaming data.
OUProcessCV: Employs cross-validation splits for robust backtesting, optimizing parameters over training/testing periods.

Feature	`RollingOUProcess`	`OUProcessCV`
Window Type	Rolling/sliding window	Cross-validation splits
Parameters	`window`, `minp`	`training_window`, `testing_window`
Use Case	Real-time/streaming	Backtesting/optimization
Function	`rolling_ou_process_nb`	`ou_process_cv`
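The difference between the two window types can be illustrated by the index ranges each one produces. The helpers below are hypothetical and do not reproduce `rolling_ou_process_nb` or `ou_process_cv`; in particular, stepping the cross-validation splits forward by one testing window is an assumption.

```python
def rolling_windows(n_obs: int, window: int):
    """(start, stop) pairs for a window that slides one observation at a time."""
    return [(stop - window, stop) for stop in range(window, n_obs + 1)]


def train_test_splits(n_obs: int, training_window: int, testing_window: int):
    """Walk-forward splits: fit on a training block, apply to the block that follows."""
    splits = []
    start = 0
    while start + training_window + testing_window <= n_obs:
        train = (start, start + training_window)
        test = (train[1], train[1] + testing_window)
        splits.append((train, test))
        start += testing_window  # assumed step: advance by one testing window
    return splits


rolling = rolling_windows(n_obs=5, window=3)
splits = train_test_splits(n_obs=10, training_window=4, testing_window=2)
```

In the rolling case parameters are re-estimated at every observation, whereas in the split case they are estimated once per training block and held fixed over the testing block.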

`RollingOUProcess`

RollingOUProcess(
    window: int = 365,
    minp: int = None,
    reversion: float = None,
    ann_factor: str | int = 'auto',
    splitter: str = 'from_custom_rolling',
    custom_splitter: str | None = None,
    custom_splitter_kwargs: dict | None = None,
    training_window: int = 365,
    testing_window: int = 60,
    n_components: str | int | float = 0.5,
    estimator: sklearn.base.BaseEstimator = None,
    autotune: bool = False,
)

Ornstein-Uhlenbeck (OU) process rolling mean reversion model:

Dimensionality Reduction: Principal Component Analysis is used to model returns of assets by decomposing them into systematic (market) components and idiosyncratic (residual) components. Assets are modeled using a multi-factor approach, with the residual returns assumed to be the source of alpha.
Market-Neutral Portfolio: The portfolio is constructed so that its factor exposures sum to zero.
Eigenportfolio Construction: Assets are allocated to eigenportfolios based on the eigenvectors of the correlation matrix, with investments scaled by the volatility of each asset.
Extracting the Residuals: Residuals — the deviations from a statistical relationship — are derived from models that predict or explain the relationship between asset prices.
Mean Reversion: Residual-based strategies typically assume that residuals will revert to their mean over time, creating opportunities for profit.

Workflow:

Calculate Returns: Use closing prices to calculate returns.
Standardize Data: PCA seeks to maximize the variance of each component. Standardization will scale variance to a common measure.
Reduce Dimensionality: Decompose asset returns using PCA, keeping enough components to explain ~$50$% of the variance.
Eigenportfolio Returns: Estimate the market-neutral eigenportfolio returns for each asset in the universe.
Systematic and Residual Components: Estimate the residual components for each asset.
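The workflow can be sketched end to end with NumPy. This is a simplified illustration under stated assumptions, not the library's code: PCA is done by eigendecomposition of the correlation matrix, eigenportfolio weights are taken as eigenvector entries divided by each asset's volatility, and residuals come from an ordinary least-squares regression of each asset on the eigenportfolio returns. The function name `pca_residuals` is hypothetical.

```python
import numpy as np


def pca_residuals(prices, explained: float = 0.5):
    """Return (residuals, k) for a (T, N) array of closing prices."""
    prices = np.asarray(prices, dtype=float)

    # 1. Simple returns from closing prices
    rets = prices[1:] / prices[:-1] - 1.0

    # 2. Standardize each asset's returns
    std = rets.std(axis=0)
    z = (rets - rets.mean(axis=0)) / std

    # 3. PCA on the correlation matrix; keep the smallest number of
    #    components whose cumulative variance share reaches `explained`
    corr = (z.T @ z) / len(z)
    eigval, eigvec = np.linalg.eigh(corr)
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    cum = np.cumsum(eigval) / eigval.sum()
    k = min(int(np.searchsorted(cum, explained)) + 1, len(eigval))

    # 4. Eigenportfolio returns, with weights scaled by asset volatility
    weights = eigvec[:, :k] / std[:, None]
    factors = rets @ weights

    # 5. Regress every asset on the factors; what is left is the residual
    design = np.column_stack([np.ones(len(factors)), factors])
    beta, *_ = np.linalg.lstsq(design, rets, rcond=None)
    resid = rets - design @ beta
    return resid, k
```

The cumulative sum of each residual column is the kind of series to which the OU parameter estimation would then be applied.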

Reference:

Statistical Arbitrage in the U.S. Equities Market

Method generated by attrs for class RollingOUProcess.

Ancestors

systematica.models.base.BaseStatArb
abc.ABC

Descendants

systematica.api.models.ou_process.OUProcessCV

Instance variables

ann_factor: str | int: Annualization coefficient. If 'auto', the data frequency and annualization factor are inferred automatically. Defaults to 'auto'.
autotune: bool: Tune estimator hyperparameter automatically. Defaults to False, which means no autotuning is performed.
custom_splitter: str | None: Custom splitter to use for data partitioning. Defaults to None, which means no custom splitter is used. If set, it should be a string that matches a custom splitter function.
custom_splitter_kwargs: dict | None: Additional keyword arguments for the custom splitter. Defaults to None, which means no additional arguments are passed. If custom_splitter is set, this should contain any necessary parameters for the custom splitter function.
estimator: sklearn.base.BaseEstimator: scikit-learn estimator to use. Defaults to None, in which case LinearRegression from sklearn is used.
minp: int: Minimum period. Defaults to None, which means no minimum period is applied.
n_components: str | int | float: Number of components to keep. If n_components == 'mle', Minka’s MLE is used to guess the dimension. If 0 < n_components < 1, select the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by n_components. Defaults to 0.5, which means half of the variance is captured.
reversion: float: Speed of reversion. If None, the model evaluates the reversion speed coefficient automatically. The characteristic reversion time should be less than half the size of the window used for residual estimation. For example, with a 60-day window, half is 30 days, so assuming 252 trading days in a year: $Reversion = 252 / 30 = 8.4$. Defaults to None.
splitter: str: Default splitter to be used if custom_splitter is not passed. Choices are from_rolling, from_expanding, from_custom_rolling, from_custom_expanding. Defaults to from_custom_rolling.
testing_window: int: The size of the testing window. Defaults to 60. This is the window size used for testing the model.
training_window: int: The size of the training window. Defaults to 365. This is the window size used for training the model.
window: int: The size of the rolling window. Defaults to 365.
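The arithmetic behind the `reversion` example can be written as a small helper. This is a worked illustration of the $252 / 30 = 8.4$ calculation only; the function names are hypothetical, and reading the value as a lower bound on the annualized reversion speed is an assumption.

```python
def min_reversion_speed(window: int, ann_factor: float = 252.0) -> float:
    """Reversion speed whose characteristic time equals half the window."""
    half_window = window / 2.0
    return ann_factor / half_window


def passes_reversion_filter(theta: float, window: int, ann_factor: float = 252.0) -> bool:
    """True when the series reverts faster than the half-window threshold."""
    return theta > min_reversion_speed(window, ann_factor)


threshold = min_reversion_speed(60)  # 252 / 30
```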

Methods

`get_residuals`

get_residuals(
    self,
    rets: numpy.ndarray,
    index: numpy.ndarray,
    columns: numpy.ndarray,
) ‑> pandas.core.frame.DataFrame

Calculate residuals from the input returns. Parameters:

Name	Type	Default	Description
`rets`	`tp.Array2d`	`--`	A NumPy array of returns.
`index`	`tp.Array1d`	`--`	A NumPy array of datetime indices.
`columns`	`tp.Array1d`	`--`	A NumPy array of column indices.

Returns:

Type	Description
`tp.Array2d`	A NumPy array of residuals.

`OUProcessCV`

OUProcessCV(
    window: int = 365,
    minp: int = None,
    reversion: float = None,
    ann_factor: str | int = 'auto',
    splitter: str = 'from_custom_rolling',
    custom_splitter: str | None = None,
    custom_splitter_kwargs: dict | None = None,
    training_window: int = 365,
    testing_window: int = 60,
    n_components: str | int | float = 0.5,
    estimator: sklearn.base.BaseEstimator = None,
    autotune: bool = False,
)

Ornstein-Uhlenbeck (OU) process mean reversion model with cross-validation. See RollingOUProcess. Method generated by attrs for class OUProcessCV.

Ancestors

systematica.api.models.ou_process.RollingOUProcess
systematica.models.base.BaseStatArb
abc.ABC