Overview

Statistical arbitrage models in Systematica are designed to identify and exploit temporary price divergences between related financial instruments. The framework provides several model categories, each implementing different statistical approaches to identify trading opportunities.

Ornstein-Uhlenbeck Process Models

The Ornstein-Uhlenbeck (OU) process models implement mean-reverting statistical arbitrage based on the Ornstein-Uhlenbeck stochastic process.
  • Mean Reversion Speed: The autocorrelation $b$ relates to $\theta$. A smaller $b$ (closer to 0) implies faster reversion (larger $\theta$).
  • Trading Signals: Deviations from $\mu$ beyond $\sigma_{\text{eq}}$ suggest trading opportunities, as the series is expected to revert.
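The link between $b$ and $\theta$ can be made concrete with the exact discretization of an OU process, under which $b = e^{-\theta \Delta t}$. The sketch below assumes that relation and a 252-period year; the helper name `theta_from_b` is hypothetical and is not part of Systematica.

```python
import math


def theta_from_b(b: float, periods_per_year: int = 252) -> float:
    """Annualized reversion speed implied by the lag-1 coefficient b.

    Assumes b = exp(-theta * dt) with dt = 1 / periods_per_year.
    """
    if not 0.0 < b < 1.0:
        raise ValueError("b must lie strictly between 0 and 1")
    return -math.log(b) * periods_per_year


# A smaller b maps to a larger theta, i.e. faster reversion.
fast = theta_from_b(0.90)
slow = theta_from_b(0.99)
```

Under this assumption, a series with `b = 0.90` reverts roughly ten times faster than one with `b = 0.99`.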
The OU process models estimate mean reversion parameters through a multi-step process:
  • Parameter Estimation: Calculates long-term mean and equilibrium standard deviation
  • S-Score Calculation: Generates standardized reversion signals
  • Residual Integration: Uses PCA-decomposed residuals as input data
Key parameters include:
  • reversion: Speed of mean reversion threshold (auto-calculated if None)
  • ann_factor: Annualization factor (inferred from data frequency)
  • training_window/testing_window: Cross-validation windows
  • n_components: PCA variance explained (default 0.5)
The OU process models mean-reverting behavior in financial time series using the stochastic differential equation:
$$dx_t = \theta (\mu - x_t)\,dt + \sigma\,dW_t$$
where:
  • $x_t$: Price or spread at time $t$
  • $\theta$: Speed of mean reversion
  • $\mu$: Long-term mean (equilibrium level)
  • $\sigma$: Volatility of the process
  • $W_t$: Wiener process (random noise)
These calculations enable quantification of mean-reverting behavior and identification of mispricings for arbitrage strategies.
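To get a feel for the dynamics described by the stochastic differential equation, a path can be simulated with a simple Euler-Maruyama scheme. This is a minimal sketch using NumPy only; `simulate_ou` and its arguments are illustrative names, not a Systematica function, and the daily step `dt = 1/252` is an assumption.

```python
import numpy as np


def simulate_ou(theta, mu, sigma, x0, n_steps, dt=1.0 / 252, seed=0):
    """Euler-Maruyama path of dx = theta * (mu - x) dt + sigma dW."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        drift = theta * (mu - x[t]) * dt                     # pull toward mu
        shock = sigma * np.sqrt(dt) * rng.standard_normal()  # Wiener increment
        x[t + 1] = x[t] + drift + shock
    return x


# Two years of daily steps, starting one unit above the long-term mean.
path = simulate_ou(theta=8.4, mu=0.0, sigma=0.3, x0=1.0, n_steps=504)
# With sigma = 0 only the drift remains and the path decays toward mu.
noiseless = simulate_ou(theta=8.4, mu=0.0, sigma=0.0, x0=1.0, n_steps=504)
```

Setting `sigma=0.0` isolates the drift term, which makes the role of `theta` as a pull toward `mu` easy to see.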

Key Calculations

The OU parameters are estimated for a time series $x$ as follows:
  1. Autocorrelation Coefficient ($b$):
  • Computed as the correlation between consecutive values: $b = \text{corr}(x_t, x_{t+1})$
  • Measures how strongly $x_t$ predicts $x_{t+1}$. A value close to 1 indicates slow mean reversion.
  2. Long-Term Mean ($\mu$):
  • Derived from the regression $x_{t+1} = b x_t + A$, where $A = \text{mean}(x_{t+1} - b \cdot x_t)$ and $\mu = \frac{A}{1 - b}$
  • Represents the equilibrium level the series reverts to.
  3. Equilibrium Standard Deviation ($\sigma_{\text{eq}}$):
  • Calculated from residuals $Z = x_{t+1} - b \cdot x_t$: $\sigma_{\text{eq}} = \sqrt{\frac{\text{var}(Z)}{1 - b^2}}$
  • Measures volatility around the mean, adjusted for autocorrelation.
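The three calculations above translate almost line by line into NumPy. The sketch below is not the library's implementation; `estimate_ou` and `s_score` are hypothetical helpers, and the s-score definition $(x - \mu) / \sigma_{\text{eq}}$ is an assumption based on the usual standardized-deviation form. A synthetic AR(1) series with known parameters serves as a sanity check.

```python
import numpy as np


def estimate_ou(x):
    """Return (b, mu, sigma_eq) for a 1-D series, following the steps above."""
    x = np.asarray(x, dtype=float)
    x_t, x_next = x[:-1], x[1:]
    b = np.corrcoef(x_t, x_next)[0, 1]            # lag-1 autocorrelation
    a = np.mean(x_next - b * x_t)                 # regression intercept A
    mu = a / (1.0 - b)                            # long-term mean
    z = x_next - b * x_t                          # residuals Z
    sigma_eq = np.sqrt(np.var(z) / (1.0 - b**2))  # equilibrium std
    return b, mu, sigma_eq


def s_score(x_last, mu, sigma_eq):
    """Standardized distance from the mean (assumed definition)."""
    return (x_last - mu) / sigma_eq


# Synthetic AR(1) data with b = 0.9 and A = 0.5, so mu = 0.5 / 0.1 = 5.
rng = np.random.default_rng(42)
true_b, true_a, noise_std = 0.9, 0.5, 0.2
x = np.empty(20_000)
x[0] = true_a / (1.0 - true_b)
for t in range(1, x.size):
    x[t] = true_a + true_b * x[t - 1] + noise_std * rng.standard_normal()

b, mu, sigma_eq = estimate_ou(x)
score = s_score(x[-1], mu, sigma_eq)
```

On a long enough sample the estimates should land close to the values used to generate the data, and `score` expresses the latest observation in units of $\sigma_{\text{eq}}$.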

RollingOUProcess vs OUProcessCV

Two implementations are available: rolling window and cross-validation approaches.
  • RollingOUProcess: Uses a sliding window for real-time parameter estimation, ideal for streaming data.
  • OUProcessCV: Employs cross-validation splits for robust backtesting, optimizing parameters over training/testing periods.
| Feature | RollingOUProcess | OUProcessCV |
| --- | --- | --- |
| Window Type | Rolling/sliding window | Cross-validation splits |
| Parameters | window, minp | training_window, testing_window |
| Use Case | Real-time/streaming | Backtesting/optimization |
| Function | rolling_ou_process_nb | ou_process_cv |
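The difference between the two windowing schemes is easiest to see on plain index ranges. The helpers below are hypothetical and only illustrate the general idea; the splits actually produced by Systematica's splitters (for example `from_custom_rolling`) may be laid out differently.

```python
def rolling_windows(n_obs, window):
    """(start, stop) pairs for a window that slides one observation at a time."""
    return [(stop - window, stop) for stop in range(window, n_obs + 1)]


def train_test_splits(n_obs, training_window, testing_window):
    """Consecutive (train, test) index ranges.

    Each test block directly follows its training block, and the whole
    split advances by the length of the test block (an assumption).
    """
    splits = []
    start = 0
    while start + training_window + testing_window <= n_obs:
        train = (start, start + training_window)
        test = (train[1], train[1] + testing_window)
        splits.append((train, test))
        start += testing_window
    return splits


windows = rolling_windows(n_obs=10, window=4)
splits = train_test_splits(n_obs=1000, training_window=365, testing_window=60)
```

In the rolling scheme every new observation yields a fresh estimate, while the split scheme fits on one block and evaluates on the block that follows it.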

RollingOUProcess

RollingOUProcess(
    window: int = 365,
    minp: int = None,
    reversion: float = None,
    ann_factor: str | int = 'auto',
    splitter: str = 'from_custom_rolling',
    custom_splitter: str | None = None,
    custom_splitter_kwargs: dict | None = None,
    training_window: int = 365,
    testing_window: int = 60,
    n_components: str | int | float = 0.5,
    estimator: sklearn.base.BaseEstimator = None,
    autotune: bool = False,
)
Ornstein-Uhlenbeck (OU) process rolling mean reversion model:
  • Dimensionality Reduction: Principal Component Analysis is used to model returns of assets by decomposing them into systematic (market) components and idiosyncratic (residual) components. Assets are modeled using a multi-factor approach, with the residual returns assumed to be the source of alpha.
  • Market-Neutral Portfolio: The portfolio is constructed so that its factor exposures sum to zero.
  • Eigenportfolio Construction: Assets are allocated to eigenportfolios based on the eigenvectors of the correlation matrix, with investments scaled by the volatility of each asset.
  • Extracting the Residuals: Residuals — the deviations from a statistical relationship — are derived from models that predict or explain the relationship between asset prices.
  • Mean-Reverting?: Residual-based strategies typically assume that residuals will revert to their mean over time, creating opportunities for profit.
Workflow:
  1. Calculate Returns: Use closing prices to calculate returns.
  2. Standardize Data: PCA seeks to maximize the variance of each component. Standardization will scale variance to a common measure.
  3. Reduce Dimensionality: Decompose asset returns using PCA, keeping enough components to explain ~50% of the variance.
  4. Eigenportfolio Returns: Estimate the market-neutral eigenportfolio returns for each asset in the universe.
  5. Systematic and Residual Components: Estimate the residual components for each asset.
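The workflow above can be sketched end to end with NumPy. This is an illustration under stated assumptions, not the code behind `RollingOUProcess`: the function name `pca_residuals` is hypothetical, PCA is done through an eigendecomposition of the correlation matrix, and the residuals come from an ordinary least-squares regression of each asset on the eigenportfolio returns.

```python
import numpy as np


def pca_residuals(prices, variance_target=0.5):
    """Residual returns after removing the leading PCA factors."""
    prices = np.asarray(prices, dtype=float)
    # 1. Simple returns from closing prices.
    rets = prices[1:] / prices[:-1] - 1.0
    # 2. Standardize each asset's returns.
    vol = rets.std(axis=0)
    z = (rets - rets.mean(axis=0)) / vol
    # 3. PCA via the correlation matrix, largest eigenvalues first.
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(explained, variance_target)) + 1
    k = min(k, rets.shape[1])
    # 4. Eigenportfolio returns: eigenvector weights scaled by 1 / volatility.
    weights = eigvecs[:, :k] / vol[:, None]
    factors = rets @ weights
    # 5. Regress every asset on the factors; what is left is the residual.
    design = np.column_stack([np.ones(len(factors)), factors])
    beta, *_ = np.linalg.lstsq(design, rets, rcond=None)
    return rets - design @ beta, k


# Synthetic universe: six assets driven by one common market factor.
rng = np.random.default_rng(7)
market = 0.01 * rng.standard_normal((500, 1))
idio = 0.005 * rng.standard_normal((500, 6))
prices = 100.0 * np.cumprod(1.0 + market + idio, axis=0)

residuals, n_factors = pca_residuals(prices, variance_target=0.5)
```

The residual series returned here are the kind of input that the OU parameter estimation would then be applied to, typically after cumulating them into a level series.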
Reference: Method generated by attrs for class RollingOUProcess.

Ancestors

  • systematica.models.base.BaseStatArb
  • abc.ABC

Descendants

  • systematica.api.models.ou_process.OUProcessCV

Instance variables

  • ann_factor: str | int: Annualization coefficient. If ‘auto’, the data frequency and the annualization factor are inferred automatically. Defaults to ‘auto’.
  • autotune: bool: Tune estimator hyperparameter automatically. Defaults to False, which means no autotuning is performed.
  • custom_splitter: str | None: Custom splitter to use for data partitioning. Defaults to None, which means no custom splitter is used. If set, it should be a string that matches a custom splitter function.
  • custom_splitter_kwargs: dict | None: Additional keyword arguments for the custom splitter. Defaults to None, which means no additional arguments are passed. If custom_splitter is set, this should contain any necessary parameters for the custom splitter function.
  • estimator: sklearn.base.BaseEstimator: scikit-learn estimator to use. Defaults to None, in which case LinearRegression from sklearn is used.
  • minp: int: Minimum period. Defaults to None, which means no minimum period is applied.
  • n_components: str | int | float: Number of components to keep. If n_components == 'mle', Minka’s MLE is used to guess the dimension. If 0 < n_components < 1, select the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by n_components. Defaults to 0.5, which means half of the variance is captured.
  • reversion: float: Speed of reversion. If None, the model automatically evaluates the reversion speed coefficient. reversion should be less than half the size of the window used for residual estimation. For example, with a 60-day window, half is 30 days. Therefore: $\text{Reversion} = 252 / 30 = 8.4$, assuming 252 trading days in a year. Defaults to None.
  • splitter: str: Default splitter to be used if custom_splitter is not passed. Choices are from_rolling, from_expanding, from_custom_rolling, from_custom_expanding. Defaults to from_custom_rolling.
  • testing_window: int: The size of the testing window. Defaults to 60. This is the window size used for testing the model.
  • training_window: int: The size of the training window. Defaults to 365. This is the window size used for training the model.
  • window: int: The size of the rolling window. Defaults to 365.
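The arithmetic in the reversion entry above generalizes to any window length. The helper below is a hypothetical illustration of that calculation only; it assumes 252 trading days per year, as in the example, and says nothing about how the library computes the value when reversion is None.

```python
def reversion_threshold(window: int, trading_days: int = 252) -> float:
    """Annualized reversion speed implied by half of the estimation window."""
    half_window = window / 2
    return trading_days / half_window


# Matches the 60-day example: 252 / 30.
threshold_60 = reversion_threshold(60)
# A longer window gives a lower threshold.
threshold_365 = reversion_threshold(365)
```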

Methods

get_residuals

get_residuals(
    self,
    rets: numpy.ndarray,
    index: numpy.ndarray,
    columns: numpy.ndarray,
) ‑> pandas.core.frame.DataFrame
Calculate residuals from the input returns.
Parameters:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| rets | tp.Array2d | -- | A NumPy array of returns. |
| index | tp.Array1d | -- | A NumPy array of datetime indices. |
| columns | tp.Array1d | -- | A NumPy array of column indices. |
Returns:
| Type | Description |
| --- | --- |
| tp.Array2d | A NumPy array of residuals. |

OUProcessCV

OUProcessCV(
    window: int = 365,
    minp: int = None,
    reversion: float = None,
    ann_factor: str | int = 'auto',
    splitter: str = 'from_custom_rolling',
    custom_splitter: str | None = None,
    custom_splitter_kwargs: dict | None = None,
    training_window: int = 365,
    testing_window: int = 60,
    n_components: str | int | float = 0.5,
    estimator: sklearn.base.BaseEstimator = None,
    autotune: bool = False,
)
Ornstein-Uhlenbeck (OU) process mean reversion model with cross-validation. See RollingOUProcess. Method generated by attrs for class OUProcessCV.

Ancestors

  • systematica.api.models.ou_process.RollingOUProcess
  • systematica.models.base.BaseStatArb
  • abc.ABC