Statistical Moments Module

Robust, shrinkage-aware estimators for the first two moments of asset returns reside in pyvallocation.moments. The functions below preserve pandas indices when pandas inputs are provided and are designed to work with the portfolio API, Bayesian views, and optimisation engines. The estimators are covered by unit tests (see tests/test_moments_shrinkage.py) that check for positive semi-definite (PSD) outputs and consistent labelling.

Quick Start

Moment estimation pipeline

import pandas as pd
from pyvallocation.moments import (
    estimate_moments,
    shrink_covariance_nls,
    shrink_mean_james_stein,
)

returns = pd.read_csv("returns.csv", index_col=0, parse_dates=True)
mu_js, sigma_nls = estimate_moments(
    returns,
    mean_estimator="james_stein",
    cov_estimator="nls",
)

# Alternatively call estimators directly
sigma_nls_direct = shrink_covariance_nls(returns)
mu_js_direct = shrink_mean_james_stein(
    returns.mean(),
    sigma_nls_direct,
    T=len(returns),
)

Mean Estimators

pyvallocation.moments.estimate_sample_moments(R, p)

Estimates the weighted mean vector and covariance matrix from scenarios.

This function computes the first two statistical moments (mean and covariance) of asset returns, given a set of scenarios and their associated probabilities. The scenarios R represent different possible outcomes for asset returns, and p represents the probability of each scenario.

Parameters:

R (ArrayLike) – A 2D array-like object (e.g., numpy.ndarray, pandas.DataFrame) of shape (T, N), where T is the number of scenarios/observations and N is the number of assets. Each row represents a scenario of asset returns.
p (ArrayLike) – A 1D array-like object (e.g., numpy.ndarray, pandas.Series) of shape (T,), representing the probabilities associated with each scenario in R. These probabilities must be non-negative and sum to one.

Returns:

Tuple[ArrayLike, ArrayLike] –

A tuple containing:

mu (ArrayLike): The weighted mean vector of asset returns. If R or p were pandas objects, mu will be a pandas.Series.
S (ArrayLike): The weighted covariance matrix of asset returns. If R or p were pandas objects, S will be a pandas.DataFrame.

Raises:

ValueError – If p has a length mismatch with R, or if p contains negative values or does not sum to one.

Parameters:

R (numpy.ndarray | pandas.Series | pandas.DataFrame)
p (numpy.ndarray | pandas.Series | pandas.DataFrame)

Return type:

Tuple[numpy.ndarray | pandas.Series | pandas.DataFrame, numpy.ndarray | pandas.Series | pandas.DataFrame]
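The probability-weighted moments described above can be sketched with the standard library alone. This is an illustration, not the library's implementation: the helper name weighted_moments is hypothetical, and the sketch assumes the covariance is the plain probability-weighted second central moment with no degrees-of-freedom correction.

```python
def weighted_moments(R, p):
    """Probability-weighted mean and covariance of T scenarios of N returns.

    R is a list of T rows (each of length N); p is a list of T probabilities.
    Hypothetical helper for illustration only.
    """
    if len(p) != len(R):
        raise ValueError("p must contain one probability per scenario")
    if any(pi < 0 for pi in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be non-negative and sum to one")
    n = len(R[0])
    # mu_j = sum_t p_t * R[t][j]
    mu = [sum(pi * row[j] for pi, row in zip(p, R)) for j in range(n)]
    # S_ij = sum_t p_t * (R[t][i] - mu_i) * (R[t][j] - mu_j)
    S = [
        [
            sum(pi * (row[i] - mu[i]) * (row[j] - mu[j]) for pi, row in zip(p, R))
            for j in range(n)
        ]
        for i in range(n)
    ]
    return mu, S


R = [[0.01, 0.02], [0.03, -0.01], [-0.02, 0.00]]
p = [0.5, 0.25, 0.25]
mu, S = weighted_moments(R, p)
```

With uniform probabilities p_t = 1/T this reduces to the ordinary sample mean and the (biased) sample covariance.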

pyvallocation.moments.shrink_mean_jorion(mu, S, T)

Applies Bayes-Stein shrinkage to the mean vector as in Jorion [Jorion, 1986].

This shrinkage estimator aims to improve the out-of-sample performance of mean estimates, especially when the number of assets (N) is large relative to the number of observations (T). It shrinks the sample mean towards a common mean (e.g., the global minimum variance portfolio mean).

Parameters:

mu (ArrayLike) – The sample mean vector (1D array-like, length N). Can be a numpy.ndarray or pandas.Series.
S (ArrayLike) – The sample covariance matrix (2D array-like, NxN). Can be a numpy.ndarray or pandas.DataFrame.
T (int) – The number of observations (scenarios) used to estimate mu and S.

Returns:

ArrayLike – The Bayes-Stein shrunk mean vector. If mu was a pandas.Series, the output will also be a pandas.Series.

Raises:

ValueError – If input dimensions are invalid (e.g., T <= 0, N <= 2, or S shape mismatch), or if the covariance matrix S is singular.

Parameters:

mu (numpy.ndarray | pandas.Series | pandas.DataFrame)
S (numpy.ndarray | pandas.Series | pandas.DataFrame)
T (int)

Return type:

numpy.ndarray | pandas.Series | pandas.DataFrame

Notes

A small jitter (1e-8 * identity matrix) is added to S before inversion to handle potential singularity issues. The shrinkage intensity v is clipped between 0 and 1 to ensure a valid shrinkage factor.
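For orientation, one commonly quoted form of the Bayes-Stein estimator is shown below. It is a sketch under stated assumptions rather than a transcription of the library's code: the exact expression used here (for example any degrees-of-freedom adjustment to S) may differ. The symbol \(\mu_0\) denotes the mean of the global minimum variance portfolio, \(\mathbf{1}\) is a vector of ones, and v is the shrinkage intensity mentioned in the note above.

\[\mu_0 = \frac{\mathbf{1}^\top S^{-1} \hat\mu}{\mathbf{1}^\top S^{-1} \mathbf{1}}, \qquad v = \frac{N + 2}{(N + 2) + T\,(\hat\mu - \mu_0 \mathbf{1})^\top S^{-1} (\hat\mu - \mu_0 \mathbf{1})}, \qquad \hat\mu_{BS} = (1 - v)\,\hat\mu + v\,\mu_0 \mathbf{1}.\]

In this form v lies in (0, 1] whenever S is positive definite, and a sample mean that sits far from \(\mu_0 \mathbf{1}\) relative to its sampling noise is shrunk less.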

pyvallocation.moments.shrink_mean_james_stein(mu_hat, S, T, target='grand_mean')

Return James-Stein shrinkage estimate for the mean vector.

Parameters:

mu_hat – Sample mean vector.
S – Sample covariance matrix.
T – Number of observations.
target – Shrinkage target ("grand_mean" or custom vector).

Returns:

ArrayLike – Shrunk mean estimate.

Parameters:

mu_hat (ArrayLike)
S (ArrayLike)
T (int)
target (str | np.ndarray | pd.Series)

Return type:

ArrayLike

References

[Jorion, 1986]

pyvallocation.moments.robust_mean_huber(R, *, allow_vectorized=True, tol=1e-06, max_iter=200)

Return adaptive Huber mean estimator (per asset) for heavy-tailed data.

Parameters:

R – Scenario matrix with shape (T, N).
allow_vectorized – Must be True (scalar mode not implemented).
tol – Relative convergence tolerance.
max_iter – Maximum number of iterations per asset.

Returns:

ArrayLike – Robust mean vector.

Parameters:

R (numpy.ndarray | pandas.Series | pandas.DataFrame)
allow_vectorized (bool)
tol (float)
max_iter (int)

Return type:

numpy.ndarray | pandas.Series | pandas.DataFrame

References

[Huber, 1964]
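To show what a per-asset Huber mean does, here is a minimal standard-library sketch based on iteratively reweighted averaging. It is not the library's algorithm: the names huber_location and huber_mean_per_asset are hypothetical, and the sketch uses a fixed tuning constant c = 1.345 with a MAD-based scale, whereas the function above is described as adaptive.

```python
import statistics


def huber_location(x, c=1.345, tol=1e-6, max_iter=200):
    """Huber M-estimate of location for one asset (illustrative sketch)."""
    x = list(x)
    mu = statistics.median(x)  # robust starting point
    # Scale from the median absolute deviation, made consistent for Gaussian data.
    scale = 1.4826 * statistics.median([abs(v - mu) for v in x])
    if scale == 0.0:
        return mu  # more than half the observations coincide with the median
    k = c * scale
    for _ in range(max_iter):
        # Full weight inside the threshold, down-weighted outside it.
        weights = [1.0 if abs(v - mu) <= k else k / abs(v - mu) for v in x]
        new_mu = sum(w * v for w, v in zip(weights, x)) / sum(weights)
        converged = abs(new_mu - mu) <= tol * max(1.0, abs(mu))
        mu = new_mu
        if converged:
            break
    return mu


def huber_mean_per_asset(R, **kwargs):
    """Apply huber_location to each column of a (T, N) list of rows."""
    return [huber_location(col, **kwargs) for col in zip(*R)]


R = [
    [0.010, 0.02],
    [0.020, 0.01],
    [0.000, 0.03],
    [-0.010, 0.02],
    [0.015, 0.02],
    [5.000, 0.02],  # a single extreme observation in the first asset
]
robust = huber_mean_per_asset(R)
plain = [sum(col) / len(col) for col in zip(*R)]
```

In this toy data the extreme observation pulls the plain mean of the first asset above 0.8, while the Huber estimate stays close to the bulk of the observations.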

pyvallocation.moments.robust_mean_median_of_means(R, *, n_blocks='auto', random_state=None)

Return coordinate-wise Median-of-Means mean estimator.

Parameters:

R – Scenario matrix with shape (T, N).
n_blocks – Number of blocks or "auto" to use ceil(sqrt(T)).
random_state – Optional random seed or Generator.

Returns:

ArrayLike – Robust mean vector.

Parameters:

R (ArrayLike)
n_blocks (int | str)
random_state (Optional[int | np.random.Generator])

Return type:

ArrayLike
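The Median-of-Means idea can be sketched for a single asset as follows; applying it to each column gives the coordinate-wise estimator. This is an illustration with a hypothetical helper name (median_of_means) and standard-library tools only. How the library forms and shuffles its blocks is not stated in this page, so the block construction below is an assumption.

```python
import math
import random
import statistics


def median_of_means(x, n_blocks="auto", seed=None):
    """Median of block means for one series (illustrative sketch)."""
    x = list(x)
    T = len(x)
    k = math.ceil(math.sqrt(T)) if n_blocks == "auto" else int(n_blocks)
    if not 1 <= k <= T:
        raise ValueError("n_blocks must lie between 1 and the number of observations")
    rng = random.Random(seed)
    rng.shuffle(x)  # random assignment of observations to blocks
    blocks = [x[i::k] for i in range(k)]  # k non-empty, near-equal blocks
    return statistics.median(sum(b) / len(b) for b in blocks)


# 24 ordinary observations plus one gross outlier.
returns = [0.01] * 12 + [0.0] * 6 + [0.02] * 6 + [10.0]
estimate = median_of_means(returns, seed=0)
plain_mean = sum(returns) / len(returns)
```

With T = 25 the "auto" rule gives five blocks. The outlier can contaminate only one of them, so the median of the block means is always the mean of a clean block, whatever the seed.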

Covariance Estimators

pyvallocation.moments.shrink_covariance_ledoit_wolf(R, S_hat, target='identity')

Applies the Ledoit-Wolf shrinkage estimator for the covariance matrix [Ledoit and Wolf, 2004].

This estimator provides a well-conditioned covariance matrix, especially useful when the number of observations is small relative to the number of assets, or when the sample covariance matrix is ill-conditioned. It shrinks the sample covariance matrix towards a structured target matrix.

Parameters:

R (ArrayLike) – A 2D array-like object (e.g., numpy.ndarray, pandas.DataFrame) of shape (T, N), where T is the number of observations and N is the number of assets. These are the returns data.
S_hat (ArrayLike) – The sample covariance matrix (2D array-like, NxN). Can be a numpy.ndarray or pandas.DataFrame.
target (str, optional) – The shrinkage target. Defaults to "identity". Supported values:
- "identity": Shrinks towards a scaled identity matrix.
- "constant_correlation": Shrinks towards a constant correlation matrix.

Returns:

ArrayLike – The shrunk covariance matrix. If R or S_hat were pandas objects, the output will be a pandas.DataFrame.

Raises:

ValueError – If input dimensions are invalid (e.g., T = 0, or S_hat shape mismatch), or if an unsupported target is specified.

Parameters:

R (numpy.ndarray | pandas.Series | pandas.DataFrame)
S_hat (numpy.ndarray | pandas.Series | pandas.DataFrame)
target (str)

Return type:

numpy.ndarray | pandas.Series | pandas.DataFrame

Notes

The function calculates various components of the Ledoit-Wolf formula:

F: The target matrix.
pi_mat, pi_hat, diag_pi, off_pi, rho_hat: Components related to the estimation of the optimal shrinkage intensity.
gamma_hat: The squared Frobenius norm of the difference between the sample covariance and the target matrix.
kappa: Intermediate value for shrinkage intensity.
delta: The optimal shrinkage intensity, clipped between 0 and 1.

The final shrunk covariance matrix is ensured to be positive semi-definite using ensure_psd_matrix.
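The last step of the procedure, turning kappa into delta and blending the sample covariance with the target, can be sketched as below. The sketch takes kappa as given and does not estimate pi_hat, rho_hat, or gamma_hat. It assumes the rule delta = clip(kappa / T, 0, 1) from Ledoit and Wolf (2004) and a scaled-identity target equal to the average variance times the identity; both are assumptions about this library, and the helper names are hypothetical.

```python
def scaled_identity_target(S):
    """Target F = (average variance) * I, one common 'scaled identity' choice."""
    n = len(S)
    avg_var = sum(S[i][i] for i in range(n)) / n
    return [[avg_var if i == j else 0.0 for j in range(n)] for i in range(n)]


def shrink_toward_target(S, F, kappa, T):
    """Return delta * F + (1 - delta) * S with delta = clip(kappa / T, 0, 1)."""
    delta = max(0.0, min(1.0, kappa / T))
    n = len(S)
    shrunk = [
        [delta * F[i][j] + (1.0 - delta) * S[i][j] for j in range(n)]
        for i in range(n)
    ]
    return shrunk, delta


S = [[0.04, 0.018], [0.018, 0.09]]
F = scaled_identity_target(S)
Sigma, delta = shrink_toward_target(S, F, kappa=15.0, T=60)
```

Because delta lies in [0, 1], the result is a convex combination of two PSD matrices and is therefore PSD in exact arithmetic; a final projection guards against rounding error.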

pyvallocation.moments.shrink_covariance_oas(R, assume_centered=True)

Return the Oracle Approximating Shrinkage (OAS) covariance estimator.

Parameters:

R – Scenario matrix with shape (T, N).
assume_centered – If True, treat the data as already centered (no mean is subtracted). Defaults to True.

Returns:

ArrayLike – Shrunk covariance matrix (pandas DataFrame when labels are available).

Parameters:

R (numpy.ndarray | pandas.Series | pandas.DataFrame)
assume_centered (bool)

Return type:

numpy.ndarray | pandas.Series | pandas.DataFrame

References

[Chen et al., 2010]

pyvallocation.moments.shrink_covariance_nls(R_or_S, *, input_is_cov=False, dof_correction=0)

Return Ledoit-Wolf analytical nonlinear shrinkage (QuEST) of covariance.

Parameters:

R_or_S – Scenario matrix with shape (T, N) (raw returns).
input_is_cov – Reserved for compatibility; must remain False. Defaults to False.
dof_correction – Degrees-of-freedom correction applied to the sample covariance.

Returns:

ArrayLike – Shrunk covariance matrix.

Parameters:

R_or_S (numpy.ndarray | pandas.Series | pandas.DataFrame)
input_is_cov (bool)
dof_correction (int)

Return type:

numpy.ndarray | pandas.Series | pandas.DataFrame

References

[Ledoit and Wolf, 2020]

pyvallocation.moments.factor_covariance_poet(R, k='auto', thresh='auto', standardize=True, return_decomp=False)

Return POET low-rank plus sparse covariance estimator.

Parameters:

R – Scenario matrix with shape (T, N).
k – Number of factors or "auto" to pick via eigen-gap.
thresh – Threshold for the sparse residual ("auto" uses a heuristic).
standardize – Whether to standardize returns before decomposition.
return_decomp – If True, also return the factor loadings and factor scores.

Returns:

ArrayLike or tuple – Covariance estimate, and optionally factor loadings/scores.

Parameters:

R (ArrayLike)
k (int | str)
thresh (float | str)
standardize (bool)
return_decomp (bool)

Return type:

ArrayLike | Tuple[ArrayLike, ArrayLike, ArrayLike]

References

[Fan et al., 2013]

pyvallocation.moments.robust_covariance_tyler(R, *, shrinkage=0.0, target='identity', tol=1e-06, max_iter=200, ensure_psd=True)

Return regularised Tyler’s M-estimator for heavy-tailed covariance.

Parameters:

R – Scenario matrix with shape (T, N).
shrinkage – Shrinkage intensity toward target in [0, 1].
target – Target covariance matrix or "identity".
tol – Relative convergence tolerance for the fixed-point iteration.
max_iter – Maximum number of iterations.
ensure_psd – Whether to project the result to PSD.

Returns:

ArrayLike – Robust covariance matrix.

Parameters:

R (ArrayLike)
shrinkage (float)
target (str | np.ndarray | pd.DataFrame)
tol (float)
max_iter (int)
ensure_psd (bool)

Return type:

ArrayLike

References

[Tyler, 1987]
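As a guide to what the fixed-point iteration looks like, a common regularised form of Tyler's estimator is given below. It is a sketch under assumptions, not a statement of the library's exact update: \(x_t\) are the (centred) scenario rows of R, \(\rho\) stands for the shrinkage argument, F for the target, and the trace normalisation to N is one conventional choice.

\[\tilde\Sigma_{k+1} = (1 - \rho)\,\frac{N}{T} \sum_{t=1}^{T} \frac{x_t x_t^\top}{x_t^\top \Sigma_k^{-1} x_t} + \rho\,F, \qquad \Sigma_{k+1} = \frac{N\,\tilde\Sigma_{k+1}}{\operatorname{tr}(\tilde\Sigma_{k+1})}.\]

With \(\rho = 0\) this reduces to the unregularised fixed point of Tyler (1987). The iteration stops once the relative change between \(\Sigma_k\) and \(\Sigma_{k+1}\) falls below tol or max_iter is reached.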

pyvallocation.moments.sparse_precision_glasso(R, *, alpha='auto', assume_centered=True, return_precision=False)

Estimate covariance via sparse inverse covariance (Graphical Lasso).

Parameters:

R – Scenario matrix with shape (T, N).
alpha – Penalty parameter or "auto" to cross-validate.
assume_centered – If False, center the data before estimation.
return_precision – If True, also return the precision matrix.

Returns:

ArrayLike or tuple – Covariance estimate (and precision if requested).

Parameters:

R (ArrayLike)
alpha (float | str)
assume_centered (bool)
return_precision (bool)

Return type:

ArrayLike | Tuple[ArrayLike, ArrayLike]

References

[Friedman et al., 2008]

Implements an ADMM-based solver with cross-validated penalty selection.

Bayesian Posterior Adapters

pyvallocation.moments.posterior_moments_black_litterman(*, prior_cov, prior_mean=None, market_weights=None, risk_aversion=1.0, tau=0.05, mean_views=None, view_confidences=None, omega='idzorek', **kwargs)

Return posterior (mu, Sigma) from BlackLittermanProcessor.

Parameters:

prior_cov – Prior covariance matrix.
prior_mean – Optional prior mean vector.
market_weights – Optional market-cap weights for implied equilibrium mean.
risk_aversion – Risk-aversion coefficient (defaults to 1.0).
tau – Prior covariance shrinkage parameter (defaults to 0.05).
mean_views – Mean views (absolute or relative).
view_confidences – Confidence levels for views (Idzorek).
omega – View covariance ("idzorek" or array-like).
**kwargs – Additional arguments forwarded to BlackLittermanProcessor.

Returns:

Tuple[ArrayLike, ArrayLike] – Posterior mean and covariance.

Parameters:

prior_cov (numpy.ndarray | pandas.Series | pandas.DataFrame)
prior_mean (numpy.ndarray | pandas.Series | pandas.DataFrame | None)
market_weights (numpy.ndarray | pandas.Series | pandas.DataFrame | None)
risk_aversion (float)
tau (float)
mean_views (Any | None)
view_confidences (Any | None)
omega (Any)
kwargs (Any)

Return type:

Tuple[numpy.ndarray | pandas.Series | pandas.DataFrame, numpy.ndarray | pandas.Series | pandas.DataFrame]

References

[Black and Litterman, 1992]
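When market_weights is supplied instead of prior_mean, the prior mean is an implied equilibrium return. In the standard Black-Litterman setup this is obtained by reverse optimisation, pi = risk_aversion * prior_cov @ market_weights. The sketch below illustrates that textbook formula with the standard library; whether BlackLittermanProcessor applies exactly this scaling is an assumption, and the helper name implied_equilibrium_mean is hypothetical.

```python
def implied_equilibrium_mean(prior_cov, market_weights, risk_aversion=1.0):
    """Reverse optimisation: pi = risk_aversion * Sigma @ w (illustrative sketch)."""
    n = len(market_weights)
    if len(prior_cov) != n or any(len(row) != n for row in prior_cov):
        raise ValueError("prior_cov must be N x N to match market_weights")
    return [
        risk_aversion * sum(prior_cov[i][j] * market_weights[j] for j in range(n))
        for i in range(n)
    ]


prior_cov = [[0.04, 0.01], [0.01, 0.09]]
market_weights = [0.6, 0.4]
pi = implied_equilibrium_mean(prior_cov, market_weights, risk_aversion=2.5)
```

The implied mean scales linearly with risk_aversion, so that parameter sets the overall level of the prior returns but not their ranking.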

pyvallocation.moments.posterior_moments_niw(*, prior_mu, prior_sigma, t0, nu0, sample_mu, sample_sigma, n_obs)

Return NIW posterior classical-equivalent (mu, Sigma).

Parameters:

prior_mu – Prior mean vector.
prior_sigma – Prior covariance matrix.
t0 – Prior strength (pseudo-observations for mean).
nu0 – Prior degrees of freedom for covariance.
sample_mu – Sample mean vector.
sample_sigma – Sample covariance matrix.
n_obs – Number of observations.

Returns:

Tuple[ArrayLike, ArrayLike] – Posterior mean and covariance.

Parameters:

prior_mu (numpy.ndarray | pandas.Series | pandas.DataFrame)
prior_sigma (numpy.ndarray | pandas.Series | pandas.DataFrame)
t0 (int)
nu0 (int)
sample_mu (numpy.ndarray | pandas.Series | pandas.DataFrame)
sample_sigma (numpy.ndarray | pandas.Series | pandas.DataFrame)
n_obs (int)

Return type:

Tuple[numpy.ndarray | pandas.Series | pandas.DataFrame, numpy.ndarray | pandas.Series | pandas.DataFrame]

pyvallocation.moments.posterior_moments_niw_with_uncertainty(*, prior_mu, prior_sigma, t0, nu0, sample_mu, sample_sigma, n_obs)

Return NIW posterior moments plus mean-uncertainty covariance.

The returned S_mu corresponds to the NIW mean uncertainty [Meucci, 2005]:

\[S_\mu = \frac{\nu_1}{T_1 (\nu_1 - 2)} \Sigma_1.\]

Parameters:

prior_mu (numpy.ndarray | pandas.Series | pandas.DataFrame)
prior_sigma (numpy.ndarray | pandas.Series | pandas.DataFrame)
t0 (int)
nu0 (int)
sample_mu (numpy.ndarray | pandas.Series | pandas.DataFrame)
sample_sigma (numpy.ndarray | pandas.Series | pandas.DataFrame)
n_obs (int)

Return type:

The posterior_moments_niw_with_uncertainty helper returns both the classical-equivalent posterior moments and the NIW mean-uncertainty covariance \(S_\mu\), which is required for robust Bayesian optimisation.
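The subscript-1 quantities in the expression for \(S_\mu\) are the posterior NIW parameters. In the notation of Meucci (2005) they follow from the prior and the sample as shown below. The mapping of function arguments to symbols is an assumption made for illustration: prior_mu \(\to \mu_0\), prior_sigma \(\to \Sigma_0\), t0 \(\to T_0\), nu0 \(\to \nu_0\), sample_mu \(\to \hat\mu\), sample_sigma \(\to \hat\Sigma\), n_obs \(\to T\).

\[T_1 = T_0 + T, \qquad \nu_1 = \nu_0 + T, \qquad \mu_1 = \frac{T_0\,\mu_0 + T\,\hat\mu}{T_1},\]

\[\Sigma_1 = \frac{1}{\nu_1}\left[\nu_0\,\Sigma_0 + T\,\hat\Sigma + \frac{(\mu_0 - \hat\mu)(\mu_0 - \hat\mu)^\top}{\frac{1}{T} + \frac{1}{T_0}}\right].\]

The expression for \(S_\mu\) requires \(\nu_1 > 2\). As n_obs grows relative to t0 and nu0, \(\mu_1\) and \(\Sigma_1\) approach the sample estimates and \(S_\mu\) shrinks at rate \(1/T_1\).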

Composite Factory

pyvallocation.moments.estimate_moments(R, p=None, *, mean_estimator='sample', cov_estimator='sample', mean_kwargs=None, cov_kwargs=None)

Return (mu, Sigma) using configurable mean and covariance estimators.

Parameters:

R – Scenario matrix with shape (T, N).
p – Optional scenario probabilities aligned with R.
mean_estimator – Mean estimator key (default "sample").
cov_estimator – Covariance estimator key (default "sample").
mean_kwargs – Optional keyword arguments for the mean estimator.
cov_kwargs – Optional keyword arguments for the covariance estimator.

Returns:

Tuple[ArrayLike, ArrayLike] – Estimated mean and covariance.

Parameters:

R (numpy.ndarray | pandas.Series | pandas.DataFrame)
p (numpy.ndarray | pandas.Series | pandas.DataFrame | None)
mean_estimator (str)
cov_estimator (str)
mean_kwargs (Dict[str, Any] | None)
cov_kwargs (Dict[str, Any] | None)

Return type:

Tuple[numpy.ndarray | pandas.Series | pandas.DataFrame, numpy.ndarray | pandas.Series | pandas.DataFrame]
