Statistical Moments Module
Robust, shrinkage-aware estimators for the first two moments of asset returns
reside in pyvallocation.moments. The functions below preserve pandas
indices when provided and are designed to mix seamlessly with the portfolio
API, Bayesian views, and optimisation engines. The estimators are covered by
unit tests (see tests/test_moments_shrinkage.py) that check PSD outputs and
consistent labelling.
Quick Start
import pandas as pd

from pyvallocation.moments import (
    estimate_moments,
    shrink_covariance_nls,
    shrink_mean_james_stein,
)

returns = pd.read_csv("returns.csv", index_col=0, parse_dates=True)

mu_js, sigma_nls = estimate_moments(
    returns,
    mean_estimator="james_stein",
    cov_estimator="nls",
)

# Alternatively call estimators directly
sigma_nls_direct = shrink_covariance_nls(returns)
mu_js_direct = shrink_mean_james_stein(
    returns.mean(),
    sigma_nls_direct,
    T=len(returns),
)
Mean Estimators
- pyvallocation.moments.estimate_sample_moments(R, p)
Estimates the weighted mean vector and covariance matrix from scenarios.
This function computes the first two statistical moments (mean and covariance) of asset returns, given a set of scenarios and their associated probabilities. The scenarios R represent different possible outcomes for asset returns, and p represents the probability of each scenario.
- Parameters:
R (ArrayLike) – A 2D array-like object (e.g., numpy.ndarray, pandas.DataFrame) of shape (T, N), where T is the number of scenarios/observations and N is the number of assets. Each row represents a scenario of asset returns.
p (ArrayLike) – A 1D array-like object (e.g., numpy.ndarray, pandas.Series) of shape (T,), representing the probabilities associated with each scenario in R. These probabilities must be non-negative and sum to one.
- Returns:
Tuple[ArrayLike, ArrayLike] – A tuple containing:
mu (ArrayLike): The weighted mean vector of asset returns. If R or p were pandas objects, mu will be a pandas.Series.
S (ArrayLike): The weighted covariance matrix of asset returns. If R or p were pandas objects, S will be a pandas.DataFrame.
- Raises:
ValueError – If p has a length mismatch with R, or if p contains negative values or does not sum to one.
- Parameters:
R (numpy.ndarray | pandas.Series | pandas.DataFrame)
p (numpy.ndarray | pandas.Series | pandas.DataFrame)
- Return type:
Tuple[numpy.ndarray | pandas.Series | pandas.DataFrame, numpy.ndarray | pandas.Series | pandas.DataFrame]
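The probability-weighted moments described above can be sketched in plain NumPy. This is a minimal illustration, not the library's implementation: the helper name weighted_moments is hypothetical, pandas label handling is omitted, and no bias correction is applied (whether estimate_sample_moments applies one is not stated here).

```python
import numpy as np

def weighted_moments(R, p):
    """Probability-weighted mean and covariance of scenarios (rows of R)."""
    R = np.asarray(R, dtype=float)
    p = np.asarray(p, dtype=float)
    if p.shape != (R.shape[0],):
        raise ValueError("p must hold one probability per scenario")
    if np.any(p < 0) or not np.isclose(p.sum(), 1.0):
        raise ValueError("p must be non-negative and sum to one")
    mu = p @ R                      # weighted mean, shape (N,)
    X = R - mu                      # deviations from the weighted mean
    S = (X * p[:, None]).T @ X      # weighted covariance, shape (N, N)
    return mu, S

R = np.array([[0.01, 0.02], [0.03, -0.01], [-0.02, 0.00]])
p = np.array([0.5, 0.25, 0.25])
mu, S = weighted_moments(R, p)
```

With equal probabilities the result coincides with the usual (biased, divide-by-T) sample covariance.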
- pyvallocation.moments.shrink_mean_jorion(mu, S, T)
Applies Bayes-Stein shrinkage to the mean vector as in Jorion [Jorion, 1986].
This shrinkage estimator aims to improve the out-of-sample performance of mean estimates, especially when the number of assets (N) is large relative to the number of observations (T). It shrinks the sample mean towards a common mean (e.g., the global minimum variance portfolio mean).
- Parameters:
mu (ArrayLike) – The sample mean vector (1D array-like, length N). Can be a numpy.ndarray or pandas.Series.
S (ArrayLike) – The sample covariance matrix (2D array-like, NxN). Can be a numpy.ndarray or pandas.DataFrame.
T (int) – The number of observations (scenarios) used to estimate mu and S.
- Returns:
ArrayLike – The Bayes-Stein shrunk mean vector. If mu was a pandas.Series, the output will also be a pandas.Series.
- Raises:
ValueError – If input dimensions are invalid (e.g., T <= 0, N <= 2, or S shape mismatch), or if the covariance matrix S is singular.
- Parameters:
mu (numpy.ndarray | pandas.Series | pandas.DataFrame)
S (numpy.ndarray | pandas.Series | pandas.DataFrame)
T (int)
- Return type:
numpy.ndarray | pandas.Series | pandas.DataFrame
Notes
A small jitter (1e-8 * identity matrix) is added to S before inversion to handle potential singularity issues. The shrinkage intensity v is clipped between 0 and 1 to ensure a valid shrinkage factor.
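A rough sketch of the Bayes-Stein rule, using the jitter and clipping mentioned in the notes above. The function name bayes_stein_mean is hypothetical, the shrinkage target is taken to be the mean of the global minimum variance portfolio, and the textbook intensity v = (N + 2) / ((N + 2) + T d' S^-1 d) is assumed; the library may scale S or v differently.

```python
import numpy as np

def bayes_stein_mean(mu, S, T, jitter=1e-8):
    """Shrink mu toward the global-minimum-variance mean (sketch)."""
    mu = np.asarray(mu, dtype=float)
    S = np.asarray(S, dtype=float)
    N = mu.size
    if T <= 0 or N <= 2 or S.shape != (N, N):
        raise ValueError("invalid dimensions")
    S_inv = np.linalg.inv(S + jitter * np.eye(N))       # jittered inverse
    ones = np.ones(N)
    mu_g = (ones @ S_inv @ mu) / (ones @ S_inv @ ones)  # GMV portfolio mean
    d = mu - mu_g
    v = (N + 2) / ((N + 2) + T * (d @ S_inv @ d))       # shrinkage intensity
    v = float(np.clip(v, 0.0, 1.0))
    return (1.0 - v) * mu + v * mu_g, v

mu = np.array([0.01, 0.02, 0.03, 0.04])
S = 0.04 * np.eye(4)
mu_bs, v = bayes_stein_mean(mu, S, T=60)
```

Each shrunk entry lies between its sample value and the common target, and a longer sample (larger T) lowers v.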
- pyvallocation.moments.shrink_mean_james_stein(mu_hat, S, T, target='grand_mean')
Return James-Stein shrinkage estimate for the mean vector.
- Parameters:
mu_hat – Sample mean vector.
S – Sample covariance matrix.
T – Number of observations.
target – Shrinkage target ("grand_mean" or custom vector).
- Returns:
ArrayLike – Shrunk mean estimate.
- Parameters:
mu_hat (ArrayLike)
S (ArrayLike)
T (int)
target (str | np.ndarray | pd.Series)
- Return type:
ArrayLike
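For orientation, a positive-part James-Stein rule can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the name james_stein_mean is hypothetical, and the constant (N - 2) is the textbook choice for a fixed target (some variants use N - 3 when the grand mean is estimated from the data).

```python
import numpy as np

def james_stein_mean(mu_hat, S, T, target="grand_mean"):
    """Positive-part James-Stein shrinkage of a mean vector (sketch)."""
    mu_hat = np.asarray(mu_hat, dtype=float)
    S = np.asarray(S, dtype=float)
    N = mu_hat.size
    if isinstance(target, str):
        if target != "grand_mean":
            raise ValueError(f"unknown target: {target!r}")
        t = np.full(N, mu_hat.mean())        # common grand mean
    else:
        t = np.asarray(target, dtype=float)  # user-supplied target vector
    d = mu_hat - t
    # Squared Mahalanobis distance of the sample mean from the target,
    # scaled by the sample size T.
    dist = T * float(d @ np.linalg.solve(S, d))
    if dist <= 0.0:
        return t.copy()
    shrink = min(1.0, max(N - 2, 0) / dist)  # capped at full shrinkage
    return t + (1.0 - shrink) * d

mu = np.array([0.01, 0.02, 0.03, 0.06])
S = 0.04 * np.eye(4)
mu_js = james_stein_mean(mu, S, T=60)
```

The cap at 1 is what makes this the positive-part variant: the estimate never overshoots the target.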
- pyvallocation.moments.robust_mean_huber(R, *, allow_vectorized=True, tol=1e-06, max_iter=200)
Return adaptive Huber mean estimator (per asset) for heavy-tailed data.
- Parameters:
R – Scenario matrix with shape (T, N).
allow_vectorized – Must be True (scalar mode not implemented).
tol – Relative convergence tolerance.
max_iter – Maximum number of iterations per asset.
- Returns:
ArrayLike – Robust mean vector.
- Parameters:
R (numpy.ndarray | pandas.Series | pandas.DataFrame)
allow_vectorized (bool)
tol (float)
max_iter (int)
- Return type:
numpy.ndarray | pandas.Series | pandas.DataFrame
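A simplified per-asset Huber mean via iteratively reweighted averaging is sketched below. It is not the adaptive variant the library describes: the threshold c = 1.345 and the MAD-based scale are fixed assumptions, the stopping rule uses an absolute tolerance, and the name huber_mean is hypothetical.

```python
import numpy as np

def huber_mean(R, c=1.345, tol=1e-6, max_iter=200):
    """Column-wise Huber location estimate with a fixed threshold (sketch)."""
    R = np.asarray(R, dtype=float)
    med = np.median(R, axis=0)
    mad = np.median(np.abs(R - med), axis=0)
    scale = np.where(mad > 0, 1.4826 * mad, 1.0)   # robust scale per asset
    mu = med.copy()
    for _ in range(max_iter):
        r = np.abs(R - mu) / scale                 # standardised residuals
        w = c / np.maximum(r, c)                   # 1 inside the threshold, c/r outside
        mu_new = (w * R).sum(axis=0) / w.sum(axis=0)
        converged = np.max(np.abs(mu_new - mu)) <= tol
        mu = mu_new
        if converged:
            break
    return mu

R = np.array([
    [0.010, -0.02],
    [0.020, -0.01],
    [0.000,  0.00],
    [-0.010, 0.00],
    [0.015,  0.01],
    [5.000,  0.02],   # gross outlier in the first asset
])
mu_huber = huber_mean(R)
```

The outlier drags the plain sample mean of the first asset above 0.8, while the Huber estimate stays close to the bulk of the data.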
- pyvallocation.moments.robust_mean_median_of_means(R, *, n_blocks='auto', random_state=None)
Return coordinate-wise Median-of-Means mean estimator.
- Parameters:
R – Scenario matrix with shape (T, N).
n_blocks – Number of blocks or "auto" to use ceil(sqrt(T)).
random_state – Optional random seed or Generator.
- Returns:
ArrayLike – Robust mean vector.
- Parameters:
R (ArrayLike)
n_blocks (int | str)
random_state (Optional[int | np.random.Generator])
- Return type:
ArrayLike
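The coordinate-wise Median-of-Means idea fits in a few lines. In this sketch the name median_of_means is hypothetical, and the random permutation before blocking is an assumption about how random_state is used.

```python
import numpy as np

def median_of_means(R, n_blocks="auto", random_state=None):
    """Coordinate-wise median of block means (sketch)."""
    R = np.asarray(R, dtype=float)
    T = R.shape[0]
    k = int(np.ceil(np.sqrt(T))) if n_blocks == "auto" else int(n_blocks)
    k = max(1, min(k, T))                        # keep at least one row per block
    rng = np.random.default_rng(random_state)    # accepts None, int, or Generator
    blocks = np.array_split(rng.permutation(T), k)
    block_means = np.stack([R[idx].mean(axis=0) for idx in blocks])
    return np.median(block_means, axis=0)

rng = np.random.default_rng(0)
R = rng.normal(size=(101, 3))
R[0] = 1000.0                                    # one corrupted scenario
mom = median_of_means(R, random_state=1)
```

A single corrupted row contaminates only one block mean, so the median across blocks is largely unaffected; with n_blocks=1 the estimator reduces to the ordinary sample mean.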
Covariance Estimators
- pyvallocation.moments.shrink_covariance_ledoit_wolf(R, S_hat, target='identity')
Applies the Ledoit-Wolf shrinkage estimator for the covariance matrix [Ledoit and Wolf, 2004].
This estimator provides a well-conditioned covariance matrix, especially useful when the number of observations is small relative to the number of assets, or when the sample covariance matrix is ill-conditioned. It shrinks the sample covariance matrix towards a structured target matrix.
- Parameters:
R (ArrayLike) – A 2D array-like object (e.g., numpy.ndarray, pandas.DataFrame) of shape (T, N), where T is the number of observations and N is the number of assets. These are the returns data.
S_hat (ArrayLike) – The sample covariance matrix (2D array-like, NxN). Can be a numpy.ndarray or pandas.DataFrame.
target (str, optional) – The shrinkage target: "identity" shrinks towards a scaled identity matrix; "constant_correlation" shrinks towards a constant correlation matrix. Defaults to "identity".
- Returns:
ArrayLike – The shrunk covariance matrix. If R or S_hat were pandas objects, the output will be a pandas.DataFrame.
- Raises:
ValueError – If input dimensions are invalid (e.g., T = 0, or S_hat shape mismatch), or if an unsupported target is specified.
- Parameters:
R (numpy.ndarray | pandas.Series | pandas.DataFrame)
S_hat (numpy.ndarray | pandas.Series | pandas.DataFrame)
target (str)
- Return type:
numpy.ndarray | pandas.Series | pandas.DataFrame
Notes
The function calculates various components of the Ledoit-Wolf formula:
F: The target matrix.
pi_mat, pi_hat, diag_pi, off_pi, rho_hat: Components related to the estimation of the optimal shrinkage intensity.
gamma_hat: The squared Frobenius norm of the difference between the sample covariance and the target matrix.
kappa: Intermediate value for shrinkage intensity.
delta: The optimal shrinkage intensity, clipped between 0 and 1.
The final shrunk covariance matrix is ensured to be positive semi-definite using ensure_psd_matrix.
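For the scaled-identity target, linear shrinkage can be sketched with the closed-form intensity from Ledoit and Wolf (2004). This sketch does not reproduce the pi_hat / rho_hat / gamma_hat decomposition listed above; the function name ledoit_wolf_identity and the divide-by-T sample covariance are assumptions.

```python
import numpy as np

def ledoit_wolf_identity(R):
    """Shrink the sample covariance toward a scaled identity (sketch)."""
    X = np.asarray(R, dtype=float)
    X = X - X.mean(axis=0)
    T, N = X.shape
    S = X.T @ X / T                                 # sample covariance
    m = np.trace(S) / N                             # average variance (target scale)
    d2 = np.sum((S - m * np.eye(N)) ** 2) / N       # distance of S from the target
    # Average squared distance of single-observation covariances from S.
    b2_bar = sum(np.sum((np.outer(x, x) - S) ** 2) for x in X) / (N * T ** 2)
    b2 = min(b2_bar, d2)
    delta = b2 / d2 if d2 > 0 else 0.0              # intensity, in [0, 1] by construction
    Sigma = (1.0 - delta) * S + delta * m * np.eye(N)
    return Sigma, delta

rng = np.random.default_rng(0)
R = rng.normal(size=(40, 10))
Sigma, delta = ledoit_wolf_identity(R)
```

Shrinking toward m * I preserves the trace and pulls every eigenvalue toward m, which is why the result is better conditioned than the sample covariance.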
- pyvallocation.moments.shrink_covariance_oas(R, assume_centered=True)
Return the Oracle Approximating Shrinkage (OAS) covariance estimator.
- Parameters:
R – Scenario matrix with shape (T, N).
assume_centered – If True treat data as centered. Defaults to True.
- Returns:
ArrayLike – Shrunk covariance matrix (pandas DataFrame when labels are available).
- Parameters:
R (numpy.ndarray | pandas.Series | pandas.DataFrame)
assume_centered (bool)
- Return type:
numpy.ndarray | pandas.Series | pandas.DataFrame
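As a point of reference, the OAS closed form usually attributed to Chen, Wiesel, Eldar and Hero (2010) can be written as below. The function name oas_covariance is hypothetical, and the library may use a slightly different variant of the intensity formula.

```python
import numpy as np

def oas_covariance(R, assume_centered=True):
    """Oracle Approximating Shrinkage toward a scaled identity (sketch)."""
    X = np.asarray(R, dtype=float)
    if not assume_centered:
        X = X - X.mean(axis=0)
    T, N = X.shape
    S = X.T @ X / T
    tr_S = np.trace(S)
    tr_S2 = np.sum(S * S)                           # trace(S @ S) for symmetric S
    num = (1.0 - 2.0 / N) * tr_S2 + tr_S ** 2
    den = (T + 1.0 - 2.0 / N) * (tr_S2 - tr_S ** 2 / N)
    rho = 1.0 if den <= 0 else min(1.0, num / den)  # shrinkage intensity
    Sigma = (1.0 - rho) * S + rho * (tr_S / N) * np.eye(N)
    return Sigma, rho

rng = np.random.default_rng(0)
R = rng.normal(size=(40, 10))
Sigma, rho = oas_covariance(R)
```

Note that with assume_centered=True the raw second-moment matrix is used, so returns with a non-negligible mean should be demeaned first or passed with assume_centered=False.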
- pyvallocation.moments.shrink_covariance_nls(R_or_S, *, input_is_cov=False, dof_correction=0)
Return Ledoit-Wolf analytical nonlinear shrinkage (QuEST) of covariance.
- Parameters:
R_or_S – Scenario matrix with shape (T, N) (raw returns).
input_is_cov – Reserved for compatibility; must remain False. Defaults to False.
dof_correction – Degrees-of-freedom correction applied to the sample covariance.
- Returns:
ArrayLike – Shrunk covariance matrix.
- Parameters:
R_or_S (numpy.ndarray | pandas.Series | pandas.DataFrame)
input_is_cov (bool)
dof_correction (int)
- Return type:
numpy.ndarray | pandas.Series | pandas.DataFrame
- pyvallocation.moments.factor_covariance_poet(R, k='auto', thresh='auto', standardize=True, return_decomp=False)
Return POET low-rank plus sparse covariance estimator.
- Parameters:
R – Scenario matrix with shape (T, N).
k – Number of factors or "auto" to pick via eigen-gap.
thresh – Threshold for the sparse residual ("auto" uses a heuristic).
standardize – Whether to standardize returns before decomposition.
return_decomp – If True return factor loadings and factor scores.
- Returns:
ArrayLike or tuple – Covariance estimate, and optionally factor loadings/scores.
- Parameters:
R (ArrayLike)
k (int | str)
thresh (float | str)
standardize (bool)
return_decomp (bool)
- Return type:
ArrayLike | Tuple[ArrayLike, ArrayLike, ArrayLike]
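The "low-rank plus sparse" structure can be illustrated with a stripped-down sketch: keep the top k principal components of the sample covariance and soft-threshold the off-diagonal residual. The name poet_sketch is hypothetical, k and thresh must be given explicitly here (no "auto" selection, no standardisation), and a single constant threshold replaces the entry-adaptive thresholds of the published POET estimator.

```python
import numpy as np

def poet_sketch(R, k, thresh):
    """Top-k principal components plus a thresholded residual (sketch)."""
    X = np.asarray(R, dtype=float)
    X = X - X.mean(axis=0)
    T, N = X.shape
    S = X.T @ X / T
    vals, vecs = np.linalg.eigh(S)                   # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]                 # indices of the k largest
    low_rank = (vecs[:, top] * vals[top]) @ vecs[:, top].T
    resid = S - low_rank
    # Soft-threshold off-diagonal residual entries; keep the diagonal untouched.
    sparse = np.sign(resid) * np.maximum(np.abs(resid) - thresh, 0.0)
    np.fill_diagonal(sparse, np.diag(resid))
    return low_rank + sparse

rng = np.random.default_rng(0)
R = rng.normal(size=(200, 5))
Sigma = poet_sketch(R, k=2, thresh=0.05)
```

With thresh=0 nothing is removed and the sample covariance is recovered; larger thresholds zero out more of the residual cross-covariances while leaving the variances unchanged.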
- pyvallocation.moments.robust_covariance_tyler(R, *, shrinkage=0.0, target='identity', tol=1e-06, max_iter=200, ensure_psd=True)
Return regularised Tyler’s M-estimator for heavy-tailed covariance.
- Parameters:
R – Scenario matrix with shape (T, N).
shrinkage – Shrinkage intensity toward target in [0, 1].
target – Target covariance matrix or "identity".
tol – Relative convergence tolerance for the fixed-point iteration.
max_iter – Maximum number of iterations.
ensure_psd – Whether to project the result to PSD.
- Returns:
ArrayLike – Robust covariance matrix.
- Parameters:
R (ArrayLike)
shrinkage (float)
target (str | np.ndarray | pd.DataFrame)
tol (float)
max_iter (int)
ensure_psd (bool)
- Return type:
ArrayLike
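The fixed-point iteration behind a regularised Tyler estimator can be sketched as follows. Several choices here are assumptions rather than the library's behaviour: the name tyler_shape is hypothetical, data are centred at the coordinate-wise median, the target is the identity, and the result is normalised to trace N, so it is a shape matrix rather than a covariance on the original scale.

```python
import numpy as np

def tyler_shape(R, shrinkage=0.0, tol=1e-6, max_iter=200):
    """Regularised Tyler fixed-point iteration with identity target (sketch)."""
    X = np.asarray(R, dtype=float)
    X = X - np.median(X, axis=0)                     # assumed robust centring
    T, N = X.shape
    Sigma = np.eye(N)
    for _ in range(max_iter):
        inv = np.linalg.inv(Sigma)
        q = np.einsum("ti,ij,tj->t", X, inv, X)      # x_t' Sigma^-1 x_t per row
        q = np.maximum(q, 1e-12)                     # guard against zero rows
        new = (N / T) * (X / q[:, None]).T @ X       # reweighted scatter
        new = (1.0 - shrinkage) * new + shrinkage * np.eye(N)
        new *= N / np.trace(new)                     # fix the scale: trace = N
        change = np.linalg.norm(new - Sigma, "fro")
        Sigma = new
        if change <= tol * np.linalg.norm(Sigma, "fro"):
            break
    return Sigma

rng = np.random.default_rng(0)
R = rng.standard_t(df=3, size=(200, 3))              # heavy-tailed scenarios
Sigma = tyler_shape(R, shrinkage=0.1)
```

Because each observation is divided by its own Mahalanobis norm, extreme scenarios carry no more weight than typical ones, which is the source of the estimator's robustness to heavy tails.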
- pyvallocation.moments.sparse_precision_glasso(R, *, alpha='auto', assume_centered=True, return_precision=False)
Estimate covariance via sparse inverse covariance (Graphical Lasso).
- Parameters:
R – Scenario matrix with shape (T, N).
alpha – Penalty parameter or "auto" to cross-validate.
assume_centered – If False center the data before estimation.
return_precision – If True also return the precision matrix.
- Returns:
ArrayLike or tuple – Covariance estimate (and precision if requested).
- Parameters:
R (ArrayLike)
alpha (float | str)
assume_centered (bool)
return_precision (bool)
- Return type:
ArrayLike | Tuple[ArrayLike, ArrayLike]
Implements an ADMM-based solver with cross-validated penalty selection.
Bayesian Posterior Adapters
- pyvallocation.moments.posterior_moments_black_litterman(*, prior_cov, prior_mean=None, market_weights=None, risk_aversion=1.0, tau=0.05, mean_views=None, view_confidences=None, omega='idzorek', **kwargs)
Return posterior (mu, Sigma) from BlackLittermanProcessor.
- Parameters:
prior_cov – Prior covariance matrix.
prior_mean – Optional prior mean vector.
market_weights – Optional market-cap weights for implied equilibrium mean.
risk_aversion – Risk-aversion coefficient (defaults to 1.0).
tau – Prior covariance shrinkage parameter (defaults to 0.05).
mean_views – Mean views (absolute or relative).
view_confidences – Confidence levels for views (Idzorek).
omega – View covariance ("idzorek" or array-like).
**kwargs – Additional arguments forwarded to BlackLittermanProcessor.
- Returns:
Tuple[ArrayLike, ArrayLike] – Posterior mean and covariance.
- Parameters:
prior_cov (numpy.ndarray | pandas.Series | pandas.DataFrame)
prior_mean (numpy.ndarray | pandas.Series | pandas.DataFrame | None)
market_weights (numpy.ndarray | pandas.Series | pandas.DataFrame | None)
risk_aversion (float)
tau (float)
mean_views (Any | None)
view_confidences (Any | None)
omega (Any)
kwargs (Any)
- Return type:
Tuple[numpy.ndarray | pandas.Series | pandas.DataFrame, numpy.ndarray | pandas.Series | pandas.DataFrame]
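For readers unfamiliar with the model, the textbook Black-Litterman update with an explicit view covariance can be sketched as below. This is not BlackLittermanProcessor: the name black_litterman_sketch, the pick matrix P and the view vector q are hypothetical stand-ins for however the library encodes mean_views, and Idzorek confidence calibration is not shown.

```python
import numpy as np

def black_litterman_sketch(prior_cov, market_weights, P, q, omega,
                           risk_aversion=1.0, tau=0.05):
    """Textbook Black-Litterman posterior with explicit Omega (sketch)."""
    Sigma = np.asarray(prior_cov, dtype=float)
    w = np.asarray(market_weights, dtype=float)
    P = np.atleast_2d(np.asarray(P, dtype=float))
    q = np.asarray(q, dtype=float)
    omega = np.atleast_2d(np.asarray(omega, dtype=float))
    pi = risk_aversion * Sigma @ w                   # implied equilibrium mean
    tS = tau * Sigma                                 # uncertainty of the prior mean
    A = P @ tS @ P.T + omega
    K = np.linalg.solve(A.T, (tS @ P.T).T).T         # K = tS P' A^-1
    mu_post = pi + K @ (q - P @ pi)
    Sigma_post = Sigma + tS - K @ P @ tS             # return covariance plus mean uncertainty
    return mu_post, Sigma_post

Sigma = np.array([[0.04, 0.01], [0.01, 0.09]])
w = np.array([0.6, 0.4])
P = np.array([[1.0, -1.0]])                          # asset 1 minus asset 2
omega = np.array([[0.001]])
pi = 2.5 * Sigma @ w
mu_post, Sigma_post = black_litterman_sketch(
    Sigma, w, P, P @ pi + 0.02, omega, risk_aversion=2.5
)
```

A view that agrees with the equilibrium spread leaves the mean unchanged; a more bullish relative view tilts the posterior spread in the same direction, by an amount governed by omega.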
- pyvallocation.moments.posterior_moments_niw(*, prior_mu, prior_sigma, t0, nu0, sample_mu, sample_sigma, n_obs)
Return NIW posterior classical-equivalent (mu, Sigma).
- Parameters:
prior_mu – Prior mean vector.
prior_sigma – Prior covariance matrix.
t0 – Prior strength (pseudo-observations for mean).
nu0 – Prior degrees of freedom for covariance.
sample_mu – Sample mean vector.
sample_sigma – Sample covariance matrix.
n_obs – Number of observations.
- Returns:
Tuple[ArrayLike, ArrayLike] – Posterior mean and covariance.
- Parameters:
prior_mu (numpy.ndarray | pandas.Series | pandas.DataFrame)
prior_sigma (numpy.ndarray | pandas.Series | pandas.DataFrame)
t0 (int)
nu0 (int)
sample_mu (numpy.ndarray | pandas.Series | pandas.DataFrame)
sample_sigma (numpy.ndarray | pandas.Series | pandas.DataFrame)
n_obs (int)
- Return type:
Tuple[numpy.ndarray | pandas.Series | pandas.DataFrame, numpy.ndarray | pandas.Series | pandas.DataFrame]
- pyvallocation.moments.posterior_moments_niw_with_uncertainty(*, prior_mu, prior_sigma, t0, nu0, sample_mu, sample_sigma, n_obs)
Return NIW posterior moments plus mean-uncertainty covariance.
The returned S_mu corresponds to the NIW mean uncertainty [Meucci, 2005]:
\[S_\mu = \frac{\nu_1}{T_1 (\nu_1 - 2)} \Sigma_1.\]
- Parameters:
prior_mu (numpy.ndarray | pandas.Series | pandas.DataFrame)
prior_sigma (numpy.ndarray | pandas.Series | pandas.DataFrame)
t0 (int)
nu0 (int)
sample_mu (numpy.ndarray | pandas.Series | pandas.DataFrame)
sample_sigma (numpy.ndarray | pandas.Series | pandas.DataFrame)
n_obs (int)
- Return type:
Tuple[numpy.ndarray | pandas.Series | pandas.DataFrame, numpy.ndarray | pandas.Series | pandas.DataFrame, numpy.ndarray | pandas.Series | pandas.DataFrame]
The posterior_moments_niw_with_uncertainty helper returns both classical
posterior moments and the NIW mean-uncertainty covariance \(S_\mu\), which
is required for robust Bayesian optimisation.
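The standard NIW conjugate update and the S_mu formula quoted above can be sketched together. The function name niw_posterior is hypothetical, and any further "classical-equivalent" rescaling of the covariance that the library may apply is not reproduced here.

```python
import numpy as np

def niw_posterior(prior_mu, prior_sigma, t0, nu0, sample_mu, sample_sigma, n_obs):
    """NIW posterior parameters and mean-uncertainty covariance (sketch)."""
    mu0 = np.asarray(prior_mu, dtype=float)
    S0 = np.asarray(prior_sigma, dtype=float)
    m = np.asarray(sample_mu, dtype=float)
    S = np.asarray(sample_sigma, dtype=float)
    t1 = t0 + n_obs                                  # posterior pseudo-observations
    nu1 = nu0 + n_obs                                # posterior degrees of freedom
    if nu1 <= 2:
        raise ValueError("nu1 must exceed 2 for S_mu to be defined")
    mu1 = (t0 * mu0 + n_obs * m) / t1
    d = mu0 - m
    sigma1 = (nu0 * S0 + n_obs * S
              + np.outer(d, d) / (1.0 / n_obs + 1.0 / t0)) / nu1
    s_mu = nu1 / (t1 * (nu1 - 2)) * sigma1           # uncertainty of the mean
    return mu1, sigma1, s_mu

mu1, sigma1, s_mu = niw_posterior(
    prior_mu=np.zeros(2), prior_sigma=np.eye(2), t0=10, nu0=10,
    sample_mu=np.array([0.1, 0.2]), sample_sigma=2.0 * np.eye(2), n_obs=30,
)
```

Here the posterior mean is a weighted average of prior and sample means with weights t0 and n_obs, and S_mu shrinks as the total number of (pseudo-)observations grows.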
Composite Factory
- pyvallocation.moments.estimate_moments(R, p=None, *, mean_estimator='sample', cov_estimator='sample', mean_kwargs=None, cov_kwargs=None)
Return (mu, Sigma) using configurable mean and covariance estimators.
- Parameters:
R – Scenario matrix with shape (T, N).
p – Optional scenario probabilities aligned with R.
mean_estimator – Mean estimator key (default "sample").
cov_estimator – Covariance estimator key (default "sample").
mean_kwargs – Optional keyword arguments for the mean estimator.
cov_kwargs – Optional keyword arguments for the covariance estimator.
- Returns:
Tuple[ArrayLike, ArrayLike] – Estimated mean and covariance.
- Parameters:
R (numpy.ndarray | pandas.Series | pandas.DataFrame)
p (numpy.ndarray | pandas.Series | pandas.DataFrame | None)
mean_estimator (str)
cov_estimator (str)
mean_kwargs (Dict[str, Any] | None)
cov_kwargs (Dict[str, Any] | None)
- Return type:
Tuple[numpy.ndarray | pandas.Series | pandas.DataFrame, numpy.ndarray | pandas.Series | pandas.DataFrame]
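A key-based factory of this kind is typically a dictionary lookup followed by a call with the forwarded keyword arguments. The sketch below shows that pattern only; the registries, the "diagonal" key and the name estimate_moments_sketch are hypothetical and do not reflect the set of keys the library accepts.

```python
import numpy as np

def _sample_mean(R, **kwargs):
    return R.mean(axis=0)

def _sample_cov(R, **kwargs):
    return np.cov(R, rowvar=False, **kwargs)

def _diagonal_cov(R, **kwargs):
    # Hypothetical extra estimator: variances only, zero correlations.
    return np.diag(R.var(axis=0, ddof=1))

MEAN_ESTIMATORS = {"sample": _sample_mean}
COV_ESTIMATORS = {"sample": _sample_cov, "diagonal": _diagonal_cov}

def estimate_moments_sketch(R, *, mean_estimator="sample", cov_estimator="sample",
                            mean_kwargs=None, cov_kwargs=None):
    """Dispatch to registered mean and covariance estimators (sketch)."""
    R = np.asarray(R, dtype=float)
    try:
        mean_fn = MEAN_ESTIMATORS[mean_estimator]
        cov_fn = COV_ESTIMATORS[cov_estimator]
    except KeyError as exc:
        raise ValueError(f"unknown estimator key: {exc}") from None
    mu = mean_fn(R, **(mean_kwargs or {}))
    Sigma = cov_fn(R, **(cov_kwargs or {}))
    return mu, Sigma

rng = np.random.default_rng(0)
R = rng.normal(size=(50, 3))
mu, Sigma = estimate_moments_sketch(R, cov_kwargs={"ddof": 0})
```

Keeping estimator-specific options in mean_kwargs and cov_kwargs lets the factory signature stay fixed while individual estimators evolve.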