# Scaling

> systematica.preprocessing.scaling

## `clip_nb`

```python theme={null}
clip_nb(
    number: float,
    upper: float,
    lower: float,
) -> float
```

Clip a number to be within the given bounds.

**Parameters**:

| Name     | Type    | Default | Description         |
| -------- | ------- | ------- | ------------------- |
| `number` | `float` | `--`    | The number to clip. |
| `upper`  | `float` | `--`    | The upper bound.    |
| `lower`  | `float` | `--`    | The lower bound.    |

**Returns**:

| Type    | Description         |
| ------- | ------------------- |
| `float` | The clipped number. |
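Here is a minimal pure-Python sketch of the documented behavior. It assumes `clip_nb` simply bounds `number` to the closed interval `[lower, upper]`. The name `clip_sketch` is hypothetical. As in the signature above, `upper` comes before `lower`.

```python theme={null}
def clip_sketch(number: float, upper: float, lower: float) -> float:
    """Hypothetical stand-in for clip_nb: bound `number` to [lower, upper]."""
    # min() caps the value at `upper`, max() lifts it to at least `lower`.
    return max(lower, min(number, upper))


print(clip_sketch(1.7, upper=1.0, lower=-1.0))   # 1.0
print(clip_sketch(-3.0, upper=1.0, lower=-1.0))  # -1.0
print(clip_sketch(0.25, upper=1.0, lower=-1.0))  # 0.25
```

Passing the bounds by keyword avoids mixing up the unusual `upper, lower` order.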

## `get_log_nb`

```python theme={null}
get_log_nb(
    number: float,
    epsilon: float = 1e-10,
) -> float
```

Compute the logarithm of a number, avoiding NaN and infinity.

If the input is NaN or infinity, returns a small negative value.

**Parameters**:

| Name      | Type    | Default | Description                                                  |
| --------- | ------- | ------- | ------------------------------------------------------------ |
| `number`  | `float` | `--`    | The input number.                                            |
| `epsilon` | `float` | `1e-10` | A small positive constant to avoid log(0), default is 1e-10. |

**Returns**:

| Type    | Description                                            |
| ------- | ------------------------------------------------------ |
| `float` | The logarithm of the number or a fallback small value. |
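This sketch shows one way to guard a logarithm in this spirit. The exact fallback is not documented, and the text above only says "a small negative value". The sketch therefore makes two assumptions: non-finite inputs return `-epsilon`, and inputs below `epsilon` are floored at `epsilon` so that `log(0)` never yields `-inf`. The name `safe_log` is hypothetical.

```python theme={null}
import math


def safe_log(number: float, epsilon: float = 1e-10) -> float:
    """Hypothetical stand-in for get_log_nb.

    Assumptions (not confirmed by the docs): NaN/inf inputs return -epsilon,
    and inputs below epsilon are floored at epsilon to avoid log(0).
    """
    if not math.isfinite(number):
        return -epsilon  # assumed "small negative value" fallback
    return math.log(max(number, epsilon))


print(safe_log(math.e))        # 1.0
print(safe_log(0.0))           # log(1e-10), about -23.03
print(safe_log(float("nan")))  # -1e-10
```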

## `get_log_diff_nb`

```python theme={null}
get_log_diff_nb(
    arr: numpy.ndarray,
) -> numpy.ndarray
```

Compute log difference for 2-dimensional arrays.

**Parameters**:

| Name  | Type         | Default | Description          |
| ----- | ------------ | ------- | -------------------- |
| `arr` | `tp.Array2d` | `--`    | 2-dimensional array. |

**Returns**:

| Type         | Description               |
| ------------ | ------------------------- |
| `tp.Array1d` | Array of log differences. |

## `get_diff_nb`

```python theme={null}
get_diff_nb(
    arr: numpy.ndarray,
) -> numpy.ndarray
```

Compute difference for 2-dimensional arrays.

**Parameters**:

| Name  | Type         | Default | Description          |
| ----- | ------------ | ------- | -------------------- |
| `arr` | `tp.Array2d` | `--`    | 2-dimensional array. |

**Returns**:

| Type         | Description               |
| ------------ | ------------------------- |
| `tp.Array1d` | Array of differences.     |
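`get_diff_nb` and `get_log_diff_nb` both operate on 2D arrays. This sketch assumes differences are taken row over row down each column, with the first row left as NaN. That is a common convention but is not confirmed here. The log difference is then just the difference of logs, i.e. the log return. `diff_2d` and `log_diff_2d` are hypothetical names.

```python theme={null}
import numpy as np


def diff_2d(arr: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for get_diff_nb: row-over-row difference per column."""
    out = np.full(arr.shape, np.nan)
    out[1:] = arr[1:] - arr[:-1]  # the first row has no predecessor and stays NaN
    return out


def log_diff_2d(arr: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for get_log_diff_nb: log(x_t) - log(x_{t-1})."""
    return diff_2d(np.log(arr))


prices = np.array([[100.0, 10.0],
                   [110.0, 12.0],
                   [99.0, 12.0]])
print(diff_2d(prices))
print(log_diff_2d(prices))
```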

## `zscore_nb`

```python theme={null}
zscore_nb(
    arr: numpy.ndarray,
    ddof: int = 1,
) -> float
```

Compute the z-score of the last element relative to the entire array.

**Parameters**:

| Name   | Type         | Default | Description                                                                  |
| ------ | ------------ | ------- | ---------------------------------------------------------------------------- |
| `arr`  | `tp.Array1d` | `--`    | A 1D NumPy array.                                                            |
| `ddof` | `int`        | `1`     | Delta degrees of freedom for standard deviation calculation, default is `1`. |

**Returns**:

| Type    | Description                      |
| ------- | -------------------------------- |
| `float` | The z-score of the last element. |
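The computation is $z = (x_n - \bar{x}) / s$, where $\bar{x}$ is the mean and $s$ the standard deviation of the whole array, computed with `ddof` delta degrees of freedom. Below is a NumPy sketch under a hypothetical name, `last_zscore`. It does not cover NaN handling or zero variance.

```python theme={null}
import numpy as np


def last_zscore(arr: np.ndarray, ddof: int = 1) -> float:
    """Hypothetical stand-in for zscore_nb: z-score of arr[-1] within arr."""
    mean = np.mean(arr)
    std = np.std(arr, ddof=ddof)  # ddof=1 gives the sample standard deviation
    return float((arr[-1] - mean) / std)


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(last_zscore(x))  # (5 - 3) / sqrt(2.5), about 1.2649
```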

## `get_rolling_zscore_nb`

```python theme={null}
get_rolling_zscore_nb(
    arr: numpy.ndarray,
    window: int,
    minp: int = None,
    ddof: int = 1,
) -> numpy.ndarray
```

Compute rolling z-scores over a given window.

**Parameters**:

| Name     | Type         | Default | Description                                                                |
| -------- | ------------ | ------- | -------------------------------------------------------------------------- |
| `arr`    | `tp.Array2d` | `--`    | A 2D NumPy array.                                                          |
| `window` | `int`        | `--`    | The rolling window size.                                                   |
| `minp`   | `int`        | `None`  | Minimum number of observations required.                                   |
| `ddof`   | `int`        | `1`     | Delta degrees of freedom for standard deviation calculation, default is 1. |

**Returns**:

| Type         | Description                     |
| ------------ | ------------------------------- |
| `tp.Array2d` | A 2D array of rolling z-scores. |
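This sketch assumes each output value is the `zscore_nb`-style z-score of the newest value within its trailing window. Under that assumption the result equals `(x - rolling mean) / rolling std`, computed column by column. The name `rolling_zscore` is hypothetical. Treating `minp=None` as `minp=window` mirrors the pandas convention rather than confirmed behavior.

```python theme={null}
from typing import Optional

import numpy as np


def rolling_zscore(arr: np.ndarray, window: int, minp: Optional[int] = None, ddof: int = 1) -> np.ndarray:
    """Hypothetical stand-in for get_rolling_zscore_nb (2D input, per column)."""
    if minp is None:
        minp = window  # assumption: pandas-style default
    out = np.full(arr.shape, np.nan)
    n_rows, n_cols = arr.shape
    for col in range(n_cols):
        for i in range(n_rows):
            win = arr[max(0, i - window + 1): i + 1, col]
            win = win[~np.isnan(win)]  # skip NaNs inside the window
            if win.size < max(minp, ddof + 1):
                continue  # not enough observations: leave NaN
            std = np.std(win, ddof=ddof)
            if std > 0:
                out[i, col] = (arr[i, col] - win.mean()) / std
    return out


data = np.arange(10, dtype=float).reshape(-1, 1)
print(rolling_zscore(data, window=3).ravel())
```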

## `get_rolling_mean_1d_nb`

```python theme={null}
get_rolling_mean_1d_nb(
    arr: numpy.ndarray,
    window: int,
    minp: int = None,
) -> numpy.ndarray
```

Compute the rolling mean of a one-dimensional array.

Uses `rolling_mean_acc_nb` at each iteration.

Numba equivalent to `pd.Series(arr).rolling(window, min_periods=minp).mean()`.

**Parameters**:

| Name     | Type         | Default | Description                              |
| -------- | ------------ | ------- | ---------------------------------------- |
| `arr`    | `tp.Array1d` | `--`    | One-dimensional array of numeric data.   |
| `window` | `int`        | `--`    | Window size.                             |
| `minp`   | `int`        | `None`  | Minimum number of observations required. |

**Returns**:

| Type         | Description                         |
| ------------ | ----------------------------------- |
| `tp.Array1d` | Array containing the rolling means. |
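Because the function is documented as the Numba equivalent of `pd.Series(arr).rolling(window, min_periods=minp).mean()`, a plain NumPy reference is useful for checking results. This sketch, named `rolling_mean_1d` (hypothetical), re-scans each window, so it runs in O(n·window). The accumulator-based original is presumably faster. Treating `minp=None` as `window` follows the pandas convention.

```python theme={null}
from typing import Optional

import numpy as np


def rolling_mean_1d(arr: np.ndarray, window: int, minp: Optional[int] = None) -> np.ndarray:
    """Reference rolling mean following pandas' rolling(...).mean() NaN semantics."""
    if minp is None:
        minp = window  # pandas default for an integer window
    out = np.full(arr.shape[0], np.nan)
    for i in range(arr.shape[0]):
        win = arr[max(0, i - window + 1): i + 1]
        valid = win[~np.isnan(win)]  # NaNs are skipped, not propagated
        if valid.size >= max(minp, 1):  # an all-NaN window stays NaN
            out[i] = valid.mean()
    return out


x = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
print(rolling_mean_1d(x, window=3, minp=2))  # expected values: nan, 1.5, 1.5, 3.0, 4.5
```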

## `get_rolling_sum_1d_nb`

```python theme={null}
get_rolling_sum_1d_nb(
    arr: numpy.ndarray,
    window: int,
    minp: int = None,
) -> numpy.ndarray
```

Compute rolling sum for a one-dimensional array.

Uses `rolling_sum_acc_nb` to update the accumulation state for each iteration,
emulating the behavior of `pd.Series(arr).rolling(window, min_periods=minp).sum()`.

**Parameters**:

| Name     | Type         | Default | Description                              |
| -------- | ------------ | ------- | ---------------------------------------- |
| `arr`    | `tp.Array1d` | `--`    | One-dimensional array of numeric data.   |
| `window` | `int`        | `--`    | Window size.                             |
| `minp`   | `int`        | `None`  | Minimum number of observations required. |

**Returns**:

| Type         | Description                        |
| ------------ | ---------------------------------- |
| `tp.Array1d` | Array containing the rolling sums. |
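The accumulation-state idea can be sketched as a single pass. Each step adds the entering value, subtracts the value leaving the window, and tracks how many non-NaN values the window holds. This illustrates the technique under pandas-style semantics. It is not the internals of `rolling_sum_acc_nb`, and `rolling_sum_1d` is a hypothetical name.

```python theme={null}
from typing import Optional

import numpy as np


def rolling_sum_1d(arr: np.ndarray, window: int, minp: Optional[int] = None) -> np.ndarray:
    """O(n) rolling sum with a running accumulator (pandas-style NaN skipping)."""
    if minp is None:
        minp = window
    out = np.full(arr.shape[0], np.nan)
    running_sum = 0.0
    count = 0  # number of non-NaN values currently inside the window
    for i in range(arr.shape[0]):
        entering = arr[i]
        if not np.isnan(entering):
            running_sum += entering
            count += 1
        if i >= window:
            leaving = arr[i - window]
            if not np.isnan(leaving):
                running_sum -= leaving
                count -= 1
        if count >= max(minp, 1):  # assumption: an all-NaN window stays NaN
            out[i] = running_sum
    return out


print(rolling_sum_1d(np.array([1.0, 2.0, np.nan, 4.0, 5.0]), window=3, minp=2))
```

Running add/subtract updates can accumulate floating-point rounding error over very long series. Re-scanning each window, or periodically recomputing the sum, trades speed for accuracy.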

## `get_rolling_ols_zscore_nb`

```python theme={null}
get_rolling_ols_zscore_nb(
    arr: numpy.ndarray,
    window: str | float | int | complex | bool | object | numpy.generic | numpy.ndarray = 14,
    norm_window: str | float | int | complex | bool | object | numpy.generic | numpy.ndarray = None,
    minp: int = None,
    ddof: int = 1,
) -> numpy.ndarray
```

Compute rolling ordinary least squares (OLS) regression z-scores for 2-dimensional arrays.

This function applies a 1-dimensional OLS regression on each column.

**Parameters**:

| Name          | Type              | Default | Description                                                                                            |
| ------------- | ----------------- | ------- | ------------------------------------------------------------------------------------------------------ |
| `arr`         | `tp.Array2d`      | `--`    | 2-dimensional array.                                                                                   |
| `window`      | `FlexArray1dLike` | `14`    | Window size. Provided as a scalar or per column.                                                       |
| `norm_window` | `FlexArray1dLike` | `None`  | Window size for error normalization. Provided as a scalar or per column. Defaults to `window` if None. |
| `minp`        | `int`             | `None`  | Minimum number of observations required.                                                               |
| `ddof`        | `int`             | `1`     | Delta degrees of freedom.                                                                              |

**Returns**:

| Type         | Description                         |
| ------------ | ----------------------------------- |
| `tp.Array1d` | Arrays of z-scores for each column. |

## `get_rolling_ols_pred_nb`

```python theme={null}
get_rolling_ols_pred_nb(
    arr: numpy.ndarray,
    window: str | float | int | complex | bool | object | numpy.generic | numpy.ndarray = 14,
    norm_window: str | float | int | complex | bool | object | numpy.generic | numpy.ndarray | None = None,
    minp: int | None = None,
    ddof: int = 1,
) -> numpy.ndarray
```

Compute rolling ordinary least squares (OLS) regression predictions for 2-dimensional arrays.

The computation is applied column-wise.

**Parameters**:

| Name          | Type              | Default | Description                                                                                            |
| ------------- | ----------------- | ------- | ------------------------------------------------------------------------------------------------------ |
| `arr`         | `tp.Array2d`      | `--`    | 2-dimensional array.                                                                                   |
| `window`      | `FlexArray1dLike` | `14`    | Window size. Provided as a scalar or per column.                                                       |
| `norm_window` | `FlexArray1dLike` | `None`  | Window size for error normalization. Provided as a scalar or per column. Defaults to `window` if None. |
| `minp`        | `int`             | `None`  | Minimum number of observations required.                                                               |
| `ddof`        | `int`             | `1`     | Delta degrees of freedom.                                                                              |

**Returns**:

| Type         | Description                         |
| ------------ | ----------------------------------- |
| `tp.Array1d` | Arrays of OLS predictions for each column. |

## `get_rolling_ols_residual_nb`

```python theme={null}
get_rolling_ols_residual_nb(
    arr: numpy.ndarray,
    window: str | float | int | complex | bool | object | numpy.generic | numpy.ndarray = 14,
    norm_window: str | float | int | complex | bool | object | numpy.generic | numpy.ndarray | None = None,
    minp: int | None = None,
    ddof: int = 1,
) -> numpy.ndarray
```

Compute rolling ordinary least squares (OLS) regression residuals (errors) for 2-dimensional arrays.

The computation is applied column-wise.

**Parameters**:

| Name          | Type              | Default | Description                                                                                            |
| ------------- | ----------------- | ------- | ------------------------------------------------------------------------------------------------------ |
| `arr`         | `tp.Array2d`      | `--`    | 2-dimensional array.                                                                                   |
| `window`      | `FlexArray1dLike` | `14`    | Window size. Provided as a scalar or per column.                                                       |
| `norm_window` | `FlexArray1dLike` | `None`  | Window size for error normalization. Provided as a scalar or per column. Defaults to `window` if None. |
| `minp`        | `int`             | `None`  | Minimum number of observations required.                                                               |
| `ddof`        | `int`             | `1`     | Delta degrees of freedom.                                                                              |

**Returns**:

| Type         | Description                         |
| ------------ | ----------------------------------- |
| `tp.Array1d` | Arrays of OLS residuals for each column. |
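The three `get_rolling_ols_*_nb` functions share a signature, but this page does not spell out the regression. The sketch below therefore assumes a common construction:

- Within each trailing window of length `window`, regress the column on the time index `0..window-1`.
- Take the fitted value at the newest point as the prediction.
- Define the residual as actual minus prediction.
- Divide each residual by the rolling standard deviation of residuals over `norm_window` to get the z-score.

All of this is an assumption, and `rolling_ols_1d` is a hypothetical name. The sketch handles a single column with scalar windows and no NaNs in the input.

```python theme={null}
from typing import Optional

import numpy as np


def rolling_ols_1d(y: np.ndarray, window: int = 14, norm_window: Optional[int] = None, ddof: int = 1):
    """Hypothetical rolling OLS on a time index; returns (pred, residual, zscore)."""
    if norm_window is None:
        norm_window = window  # mirrors the documented default
    n = y.shape[0]
    pred = np.full(n, np.nan)
    x = np.arange(window, dtype=float)  # regressor: position inside the window
    for i in range(window - 1, n):
        slope, intercept = np.polyfit(x, y[i - window + 1: i + 1], deg=1)
        pred[i] = intercept + slope * (window - 1)  # fitted value at the newest point
    residual = y - pred
    zscore = np.full(n, np.nan)
    for i in range(n):
        res_win = residual[max(0, i - norm_window + 1): i + 1]
        if res_win.size < norm_window or np.isnan(res_win).any():
            continue  # need a full window of residuals
        std = np.std(res_win, ddof=ddof)
        if std > 0:
            zscore[i] = residual[i] / std  # assumption: residuals treated as zero-mean
    return pred, residual, zscore


y = np.cumsum(np.random.default_rng(0).normal(size=30))
pred, residual, zscore = rolling_ols_1d(y, window=10, norm_window=5)
```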

## `get_rolling_deviation_nb`

```python theme={null}
get_rolling_deviation_nb(
    arr: numpy.ndarray,
    window: int = 1,
    minp: int = None,
)
```

Compute the rolling deviation of an array.

**Parameters**:

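| Name     | Type         | Default | Description                              |
| -------- | ------------ | ------- | ---------------------------------------- |
| `arr`    | `tp.Array1d` | `--`    | Input NumPy array.                       |
| `window` | `int`        | `1`     | Window size.                             |
| `minp`   | `int`        | `None`  | Minimum number of observations required. |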

**Returns**:

| Type    | Description                     |
| ------- | ------------------------------- |
| `float` | The computed rolling deviation. |

## `get_rolling_std_nb`

```python theme={null}
get_rolling_std_nb(
    arr: numpy.ndarray,
    window: int,
    ddof: int = 1,
) -> numpy.ndarray
```

Compute rolling standard deviation over a given window.

**Parameters**:

| Name     | Type         | Default | Description                                                                  |
| -------- | ------------ | ------- | ---------------------------------------------------------------------------- |
| `arr`    | `tp.Array2d` | `--`    | A 2D NumPy array.                                                            |
| `window` | `int`        | `--`    | The rolling window size.                                                     |
| `ddof`   | `int`        | `1`     | Delta degrees of freedom for standard deviation calculation, default is `1`. |

**Returns**:

| Type         | Description                                |
| ------------ | ------------------------------------------ |
| `tp.Array2d` | A 2D array of rolling standard deviations. |
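Here is a vectorized NumPy sketch using `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20+). The signature has no `minp`, so the sketch assumes partial windows are skipped and the first `window - 1` rows are NaN. It also assumes the input contains no NaNs. `rolling_std` is a hypothetical name.

```python theme={null}
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_std(arr: np.ndarray, window: int, ddof: int = 1) -> np.ndarray:
    """Hypothetical stand-in for get_rolling_std_nb on a 2D (rows x columns) array."""
    out = np.full(arr.shape, np.nan)
    # Shape (n_rows - window + 1, n_cols, window): one trailing window per row and column.
    windows = sliding_window_view(arr, window, axis=0)
    out[window - 1:] = windows.std(axis=-1, ddof=ddof)
    return out


data = np.array([[1.0, 10.0], [2.0, 20.0], [4.0, 40.0], [8.0, 80.0]])
print(rolling_std(data, window=3))
```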

## `get_ecdf`

```python theme={null}
get_ecdf(
    arr: numpy.ndarray,
) -> Type
```

Create an empirical cumulative distribution function (ECDF).

**Parameters**:

| Name  | Type         | Default | Description                      |
| ----- | ------------ | ------- | -------------------------------- |
| `arr` | `tp.Array1d` | `--`    | A 1D NumPy array of data points. |

**Returns**:

| Type    | Description        |
| ------- | ------------------ |
| `_ECDF` | An `_ECDF` object. |
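The internals of `_ECDF` are not documented here. To illustrate what an ECDF computes, $\hat{F}(x) = \#\{x_i \le x\} / n$, here is a sorted-array sketch built on `numpy.searchsorted`. The class name `SimpleECDF` is hypothetical, and dropping NaNs is an assumption. statsmodels offers a comparable `ECDF` class in `statsmodels.distributions.empirical_distribution`.

```python theme={null}
import numpy as np


class SimpleECDF:
    """Hypothetical ECDF: F(x) = (number of samples <= x) / n."""

    def __init__(self, arr: np.ndarray) -> None:
        data = np.asarray(arr, dtype=float)
        self._sorted = np.sort(data[~np.isnan(data)])  # assumption: NaNs are dropped

    def __call__(self, x):
        # side="right" counts samples equal to x as "<= x".
        return np.searchsorted(self._sorted, x, side="right") / self._sorted.size


ecdf = SimpleECDF(np.array([3.0, 1.0, 2.0, 2.0]))
print(ecdf([0.0, 1.0, 2.0, 3.5]))  # expected values: 0.0, 0.25, 0.75, 1.0
```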

## `get_weighted_average_nb`

```python theme={null}
get_weighted_average_nb(
    scores: numpy.ndarray,
    weights: numpy.ndarray,
    axis: int,
)
```

Compute the weighted average along the specified axis.

**Parameters**:

| Name      | Type       | Default | Description                                                                   |
| --------- | ---------- | ------- | ----------------------------------------------------------------------------- |
| `scores`  | `tp.Array` | `--`    | A NumPy array of scores.                                                      |
| `weights` | `tp.Array` | `--`    | A NumPy array of corresponding weights.                                       |
| `axis`    | `int`      | `--`    | The axis along which to compute the weighted average: `0` averages down each column, `1` across each row. |

**Raises**:

| Type         | Description                |
| ------------ | -------------------------- |
| `ValueError` | If the axis is not 0 or 1. |

**Returns**:

| Type       | Description                                             |
| ---------- | ------------------------------------------------------- |
| `tp.Array` | The computed weighted average along the specified axis. |
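For weights with the same shape as `scores`, the weighted average along `axis` is $\sum_i w_i s_i / \sum_i w_i$. Below is a NumPy sketch with the same axis check, under the hypothetical name `weighted_average`. The real function's handling of NaNs and zero weight sums is not documented.

```python theme={null}
import numpy as np


def weighted_average(scores: np.ndarray, weights: np.ndarray, axis: int) -> np.ndarray:
    """Hypothetical stand-in for get_weighted_average_nb (same-shape weights)."""
    if axis not in (0, 1):
        raise ValueError("axis must be 0 or 1")
    return (scores * weights).sum(axis=axis) / weights.sum(axis=axis)


scores = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
weights = np.array([[1.0, 3.0],
                    [1.0, 1.0]])
print(weighted_average(scores, weights, axis=0))  # one value per column: 2.0, 2.5
print(weighted_average(scores, weights, axis=1))  # one value per row: 1.75, 3.5
```

For same-shape weights this matches `np.average(scores, axis=axis, weights=weights)`.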

## `get_cumulative_index_nb`

```python theme={null}
get_cumulative_index_nb(
    model_output: numpy.ndarray,
    center: float = 0.5,
) -> numpy.ndarray
```

Calculate cumulative index.

This function subtracts the given center value from each prediction and
iteratively accumulates the results, i.e., it computes a running sum of the
centered predictions.

**Parameters**:

| Name           | Type         | Default | Description                                                          |
| -------------- | ------------ | ------- | -------------------------------------------------------------------- |
| `model_output` | `tp.Array2d` | `--`    | The array of predictions to calculate the cumulative index.          |
| `center`       | `float`      | `0.5`   | The center value to subtract from each prediction. Default is `0.5`. |

**Returns**:

| Type         | Description                       |
| ------------ | --------------------------------- |
| `tp.Array2d` | The cumulative index as an array. |
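Read literally, the cumulative index is $I_t = \sum_{k \le t} (p_k - c)$, where $p_k$ are the predictions and $c$ is `center`. The sketch below assumes the sum runs down each column and uses the hypothetical name `cumulative_index`. NaN handling is not documented and is not addressed here.

```python theme={null}
import numpy as np


def cumulative_index(model_output: np.ndarray, center: float = 0.5) -> np.ndarray:
    """Hypothetical stand-in for get_cumulative_index_nb: cumsum of (p - center) per column."""
    return np.cumsum(model_output - center, axis=0)


probs = np.array([[0.6], [0.7], [0.2]])
print(cumulative_index(probs).ravel())  # approximately 0.1, 0.3, 0.0
```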

## `get_reset_index_nb`

```python theme={null}
get_reset_index_nb(
    model_output: numpy.ndarray,
    lower: float = -1.0,
    upper: float = 1.0,
    center: float = 0.5,
) -> numpy.ndarray
```

Calculate reset index.

This function computes the reset index, which resets the accumulated sum
if it exceeds the specified upper or lower bounds.

**Parameters**:

| Name           | Type         | Default | Description                                                           |
| -------------- | ------------ | ------- | --------------------------------------------------------------------- |
| `model_output` | `tp.Array2d` | `--`    | The array of predictions to calculate the reset index.                |
| `lower`        | `float`      | `-1.0`  | The lower bound for resetting the accumulated sum. Default is `-1.0`. |
| `upper`        | `float`      | `1.0`   | The upper bound for resetting the accumulated sum. Default is `1.0`.  |
| `center`       | `float`      | `0.5`   | The center value to subtract from each prediction. Default is `0.5`.  |

**Returns**:

| Type         | Description                  |
| ------------ | ---------------------------- |
| `tp.Array2d` | The reset index as an array. |
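The text does not say what value the sum is reset to. This sketch assumes the running sum of `p - center` restarts at zero as soon as it moves above `upper` or below `lower`, and that the breaching step itself outputs zero. It covers a single column, and `reset_index_1d` is a hypothetical name.

```python theme={null}
import numpy as np


def reset_index_1d(preds: np.ndarray, lower: float = -1.0, upper: float = 1.0, center: float = 0.5) -> np.ndarray:
    """Hypothetical per-column reset index: accumulate (p - center), reset to 0 on a breach."""
    out = np.empty(preds.shape[0])
    acc = 0.0
    for i, p in enumerate(preds):
        acc += p - center
        if acc > upper or acc < lower:
            acc = 0.0  # assumption: a breach restarts the accumulation at zero
        out[i] = acc
    return out


print(reset_index_1d(np.array([0.9, 0.9, 0.9, 0.1])))  # approximately 0.4, 0.8, 0.0, -0.4
```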

## `get_clip_index_nb`

```python theme={null}
get_clip_index_nb(
    model_output: numpy.ndarray,
    mask: numpy.ndarray = None,
    lower: float = -1.0,
    upper: float = 1.0,
    center: float = 0.5,
    bound_reversion: bool = False,
) -> numpy.ndarray
```

Calculate clip index.

This function clips the accumulated sum of predictions within the specified
lower and upper bounds.

NaN values are replaced with the previous valid values during processing.
At the end, NaN is restored in the output wherever it appeared in the input.

The optional `mask` parameter is a 1D array, applied across all columns, that selects valid rows.
Where the mask is `True`, values are replaced with the previous valid values;
otherwise, the accumulated sum of predictions is used. If `mask` is None, no masking is applied.

If `bound_reversion` is `True`, then once the accumulated sum reaches the upper (lower) bound,
the function computes the cumulative sum of negative (positive) values toward the center, ignoring the mask.
This speeds up reversion toward neutrality. Defaults to `False`.

**Parameters**:

| Name              | Type         | Default | Description                                                                                                                                                                                                                                                                        |
| ----------------- | ------------ | ------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model_output`    | `tp.Array2d` | `--`    | The array of predictions used to calculate the clipped index.                                                                                                                                                   |
| `mask`            | `tp.Array1d` | `None`  | Mask array (1D) to select valid values. If True, replace with previous valid value. Defaults to None.                                                                                                                                                                              |
| `lower`           | `float`      | `-1.0`  | The lower bound for clipping the accumulated sum. Default is `-1.0`.                                                                                                                                                                                                               |
| `upper`           | `float`      | `1.0`   | The upper bound for clipping the accumulated sum. Default is `1.0`.                                                                                                                                                                                                                |
| `center`          | `float`      | `0.5`   | The center value to subtract from each prediction. Default is `0.5`.                                                                                                                                                                                                               |
| `bound_reversion` | `bool`       | `False` | If `True`, once the accumulated sum reaches the upper (lower) bound, the cumulative sum of negative (positive) values toward the center is computed, ignoring the mask. This speeds up reversion toward neutrality. Defaults to `False`. |

**Returns**:

| Type         | Description                    |
| ------------ | ------------------------------ |
| `tp.Array2d` | The clipped index as an array. |

**Raises**:

| Type         | Description                                                                             |
| ------------ | --------------------------------------------------------------------------------------- |
| `ValueError` | If mask is not None and its length does not match the number of rows in `model_output`. |
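Here is a simplified single-column sketch of the clipping recursion $I_t = \min(\max(I_{t-1} + p_t - c,\ \text{lower}),\ \text{upper})$. It applies the documented mask rule, carrying the previous value forward where the mask is `True`, and restores NaN positions at the end. It rests on three assumptions: the index starts at zero, a NaN prediction leaves the index unchanged, and `bound_reversion` is not modeled. `clip_index_1d` is a hypothetical name.

```python theme={null}
from typing import Optional, Sequence

import numpy as np


def clip_index_1d(
    preds: Sequence[float],
    mask: Optional[np.ndarray] = None,
    lower: float = -1.0,
    upper: float = 1.0,
    center: float = 0.5,
) -> np.ndarray:
    """Hypothetical single-column clip index (bound_reversion not modeled)."""
    preds = np.asarray(preds, dtype=float)
    if mask is not None and len(mask) != len(preds):
        raise ValueError("mask length must match the number of rows")
    out = np.empty(preds.shape[0])
    acc = 0.0  # assumption: the index starts at zero
    for i, p in enumerate(preds):
        masked = mask is not None and bool(mask[i])
        if not masked and not np.isnan(p):
            # Add the centered prediction, then clip into [lower, upper].
            acc = min(max(acc + (p - center), lower), upper)
        # Masked rows and NaN rows carry the previous value forward.
        out[i] = acc
    out[np.isnan(preds)] = np.nan  # restore NaNs where the input had them
    return out


print(clip_index_1d([1.0, 1.0, 1.0, 0.0]))  # 0.5, 1.0, 1.0 (clipped), 0.5
```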

## `get_scaled_zscore`

```python theme={null}
get_scaled_zscore(
    zscore: numpy.ndarray | pandas.core.series.Series | pandas.core.frame.DataFrame,
) -> numpy.ndarray | pandas.core.series.Series | pandas.core.frame.DataFrame
```

Map a z-score (which ranges from $-\infty$ to $+\infty$) to a value in
the interval $[-1,1]$ using the error function. The key idea is based
on the fact that the cumulative distribution function (CDF) of the standard
normal distribution is:

$$
\Phi(z) = \frac{1}{2} \left( 1 + \operatorname{erf} \left( \frac{z}{\sqrt{2}} \right) \right)
$$

To scale this to $[-1,1]$, you can transform it as:

$$
\text{transformed value} = 2\Phi(z) - 1 = \operatorname{erf}\left(\frac{z}{\sqrt{2}}\right)
$$

where erf is the error function.

This expression, $\operatorname{erf}\left(z/\sqrt{2}\right)$, naturally yields
values between $-1$ (for $z \to -\infty$) and $1$ (for $z \to +\infty$),
with $0$ corresponding to $z=0$.

**Parameters**:

| Name     | Type       | Default | Description     |
| -------- | ---------- | ------- | --------------- |
| `zscore` | `tp.Array \| tp.SeriesFrame` | `--`    | Z-score values. |

**Returns**:

| Type                         | Description |
| ---------------------------- | ----------- |
| `tp.Array \| tp.SeriesFrame` | Values scaled to $[-1,1]$; uniformly distributed on that interval when the input is standard normal. |
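The mapping $\operatorname{erf}(z/\sqrt{2})$ can be checked numerically with the standard library's `math.erf`. Here it is vectorized with `numpy.vectorize` so that it also accepts arrays. `scaled_zscore` is a hypothetical name, not the library function. Pandas inputs would need to be wrapped back into a Series or DataFrame afterwards.

```python theme={null}
import math

import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])


def scaled_zscore(zscore):
    """Map z-scores into [-1, 1] via erf(z / sqrt(2)) = 2 * Phi(z) - 1."""
    return _erf(np.asarray(zscore, dtype=float) / math.sqrt(2.0))


print(scaled_zscore([-1.96, 0.0, 1.96]))  # approximately -0.95, 0.0, 0.95
```

A z-score of 1.96 maps to about 0.95, matching the familiar two-sided 95% normal interval.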

## `get_ellipse`

```python theme={null}
get_ellipse(
    arr: numpy.ndarray,
    std: float = 2.0,
) -> numpy.ndarray
```

Create a confidence ellipse for a set of 2D points.

**Parameters**:

| Name  | Type         | Default | Description                                                                                   |
| ----- | ------------ | ------- | --------------------------------------------------------------------------------------------- |
| `arr` | `tp.Array2d` | `--`    | Input data of shape (n\_samples, 2) representing the 2D points.                               |
| `std` | `float`      | `2.0`   | Standard deviation threshold defining the size of the ellipse (e.g., `2.0`, which encloses about 86% of bivariate normal data; about `2.45` is needed for 95%). |

**Returns**:

| Type        | Description |
| ----------- | ----------- |
| `tp.Array2d` | Ellipse.    |
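A common way to build such an ellipse, assumed here rather than confirmed, starts from the sample covariance of the points. Eigendecomposing it gives the ellipse axes, and the sketch traces the curve whose Mahalanobis distance from the mean equals `std`. It returns `n_points` boundary points with shape `(n_points, 2)`. Both `confidence_ellipse` and `n_points` are hypothetical.

```python theme={null}
import numpy as np


def confidence_ellipse(arr: np.ndarray, std: float = 2.0, n_points: int = 100) -> np.ndarray:
    """Hypothetical ellipse boundary for 2D points, `std` standard deviations wide."""
    mean = arr.mean(axis=0)
    cov = np.cov(arr, rowvar=False)         # 2x2 sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # principal axes of the point cloud
    theta = np.linspace(0.0, 2.0 * np.pi, n_points)
    unit_circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
    # Stretch the circle by std * sqrt(eigenvalue) along each axis, rotate, then shift.
    return mean + std * (unit_circle * np.sqrt(eigvals)) @ eigvecs.T


pts = np.random.default_rng(0).multivariate_normal([1.0, 2.0], [[2.0, 0.5], [0.5, 1.0]], size=500)
ellipse = confidence_ellipse(pts, std=2.0)
```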