Portfolio Ensembling
Ensembling utilities in pyvallocation.ensembles blend multiple model
outputs into tradeable allocations. They are designed to accept either raw
NumPy arrays or PortfolioFrontier
instances, so research portfolios can be piped into production workflows with
minimal glue code.
Important
The exposure stacking routines adapt the GPL-3 implementation released by the fortitudo.tech project. If you build on top of them, please keep the original attribution in downstream documentation or research notes.
At a glance
- pyvallocation.ensembles.average_exposures() averages a stack of sample portfolios (uniformly or using custom weights).
- pyvallocation.ensembles.exposure_stacking() solves the quadratic programme popularised by Vorobets [Vorobets, 2024] to concentrate risk on common factors while cancelling idiosyncratic bets. This implementation adapts the GPL-3 licensed routine released in the fortitudo.tech project and is credited accordingly.
- pyvallocation.ensembles.assemble_portfolio_ensemble() orchestrates frontier sampling, averaging, stacking, and optional selectors. The function underpins the accompanying example notebook.
Quick start
The helpers operate on column-organised samples. Start with simple NumPy arrays to get a feel for the APIs:
import numpy as np
from pyvallocation.ensembles import average_exposures, exposure_stacking
# Two sample portfolios (columns) across two assets (rows)
samples = np.array([[0.6, 0.3],
                    [0.4, 0.7]])
avg = average_exposures(samples)
stacked = exposure_stacking(samples, L=2)
print(avg) # -> [0.45 0.55]
print(stacked) # -> exposure stacking output with damped idiosyncratic bets
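For intuition, the averaging step can be reproduced without the library. The following is a minimal plain-Python sketch: weighted_column_average is a hypothetical helper (not part of pyvallocation), and it assumes that weights are non-negative and simply rescaled to sum to one.

```python
from typing import List, Optional, Sequence


def weighted_column_average(
    samples: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Average column-organised samples (rows = assets, columns = samples).

    Hypothetical illustration only; not the library implementation.
    """
    n_samples = len(samples[0])
    if weights is None:
        weights = [1.0] * n_samples  # uniform average
    if len(weights) != n_samples:
        raise ValueError("need exactly one weight per sample column")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must not all be zero")
    normalised = [w / total for w in weights]  # rescale to sum to one
    # Each asset's averaged exposure is the weighted sum along its row.
    return [sum(x * w for x, w in zip(row, normalised)) for row in samples]


samples = [[0.6, 0.3],
           [0.4, 0.7]]
uniform = weighted_column_average(samples)                     # roughly [0.45, 0.55]
tilted = weighted_column_average(samples, weights=[1.0, 3.0])  # roughly [0.375, 0.625]
```

Because each sample column sums to one and the normalised weights sum to one, the averaged vector also sums to one.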
Workflow summary
End-to-end ensemble construction typically follows these steps:
1. Generate candidate portfolios. Optimise frontiers or run bespoke models to obtain a set of column-organised sample weights.
2. Select representatives. Use methods such as pyvallocation.portfolioapi.PortfolioFrontier.at_risk() or pyvallocation.ensembles.make_portfolio_spec() to standardise inputs.
3. Blend exposures. Apply average_exposures() for a linear average or exposure_stacking() to damp idiosyncratic bets while keeping the common factor structure.
4. Report and trade. The outputs are pandas-aware vectors, so they can be fed straight into stress testing, discrete allocation, or attribution.
When you work with pre-built frontiers the API stays consistent and now exposes handy shortcuts:
from pyvallocation.portfolioapi import PortfolioWrapper, AssetsDistribution
from pyvallocation.ensembles import average_frontiers, exposure_stack_frontiers
# `returns` is assumed to be a pandas DataFrame of scenarios (rows = scenarios, columns = assets)
wrapper = PortfolioWrapper(AssetsDistribution(scenarios=returns))
frontier = wrapper.variance_frontier(num_portfolios=21)
another = wrapper.cvar_frontier(num_portfolios=21)
avg_portfolio = average_frontiers([frontier, another])
stacked_portfolio = exposure_stack_frontiers([frontier, another], L=3)
avg_portfolio.plot.bar(title="Average ensemble weights")
You can also align frontiers by risk percentile before averaging:
from pyvallocation.ensembles import average_frontiers, risk_percentile_selections
selections = risk_percentile_selections([frontier, another], percentile=0.5)
blended = average_frontiers([frontier, another], selections=selections)
Or blend several frontiers/single portfolios directly:
from pyvallocation.ensembles import stack_portfolios
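# `frontier` and `another` are the frontiers built above; each selection lists frontier column indices
stacked = stack_portfolios([frontier, another], selections=[range(0, 11, 5), [2, 4, 6]])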
And build specs straight from an existing PortfolioWrapper:
spec = wrapper.make_ensemble_spec(
    "MV",
    optimiser_kwargs={"num_portfolios": 11, "constraints": {"long_only": True, "total_weight": 1.0}},
    selector="tangency",
)
result = wrapper.assemble_ensembles([spec], ensemble=("average", "stack"))
Tips
- The stacking depth L controls how tightly exposures are shrunk. Larger values yield smoother allocations but require more sample portfolios (L cannot exceed the number of samples).
- Exposure stacking assumes portfolios are long-only and sum to one. If your research stack permits leverage, normalise samples first.
- assemble_portfolio_ensemble() can mix averaging and stacking in a single call. See the accompanying example notebook for an end-to-end example.
- Every helper preserves pandas indices when they are present so the output can flow straight into downstream reporting.
- Solver options can be forwarded via solver_options when you need to tweak CVXOPT tolerances or iteration limits.
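The normalisation and depth constraints from the tips can be checked before calling the stacking routine. This is a hedged plain-Python sketch: normalise_samples and check_stacking_depth are hypothetical helpers rather than library functions, and the rescaling assumes long-only samples whose only problem is that they do not sum to one.

```python
from typing import List, Sequence


def normalise_samples(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale every sample column so that its weights sum to one.

    Assumes long-only inputs (rows = assets, columns = samples); portfolios
    with short positions would need a different treatment.
    """
    n_assets, n_samples = len(samples), len(samples[0])
    col_sums = [sum(samples[i][j] for i in range(n_assets)) for j in range(n_samples)]
    if any(s <= 0 for s in col_sums):
        raise ValueError("every sample needs a positive total weight")
    return [[samples[i][j] / col_sums[j] for j in range(n_samples)]
            for i in range(n_assets)]


def check_stacking_depth(L: int, n_samples: int) -> None:
    """Raise if the stacking depth exceeds the number of sample portfolios."""
    if not 1 <= L <= n_samples:
        raise ValueError(f"L must satisfy 1 <= L <= {n_samples}, got {L}")


# Three samples with total weights 1.5, 1.0 and 0.8
levered = [[0.9, 0.5, 0.2],
           [0.6, 0.5, 0.6]]
normalised = normalise_samples(levered)
check_stacking_depth(L=3, n_samples=len(normalised[0]))  # passes: 3 samples available
```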
Troubleshooting
- Shape mismatches. Ensure inputs broadcast to (n_assets, n_samples). Use DataFrame.T or np.column_stack to align your sample set.
- Missing labels. When averaging or stacking Series with different indices, the helpers reindex and raise on missing entries, so double-check asset names.
- Solver errors. Exposure stacking relies on CVXOPT. Pass solver_options={'feastol': 1e-7} (or similar) for noisy inputs, and verify that no column contains NaNs or violates the long-only assumption.
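The checks listed above can be automated before handing samples to the solver. The sketch below uses only the standard library; validate_samples is a hypothetical helper and the tolerance is an arbitrary assumption, not a library default.

```python
import math
from typing import List, Sequence


def validate_samples(samples: Sequence[Sequence[float]], tol: float = 1e-8) -> List[str]:
    """Return human-readable problems found in column-organised samples.

    An empty list means every column is NaN-free, long-only and sums to one.
    """
    n_samples = len(samples[0]) if samples else 0
    if any(len(row) != n_samples for row in samples):
        return ["rows have unequal length; expected shape (n_assets, n_samples)"]
    problems = []
    for j in range(n_samples):
        column = [row[j] for row in samples]
        if any(math.isnan(x) for x in column):
            problems.append(f"sample {j}: contains NaN")
            continue  # remaining checks are meaningless with NaNs present
        if any(x < -tol for x in column):
            problems.append(f"sample {j}: negative weight (long-only violated)")
        if abs(sum(column) - 1.0) > tol:
            problems.append(f"sample {j}: weights sum to {sum(column):.4f}, not 1")
    return problems


clean = [[0.6, 0.3],
         [0.4, 0.7]]
dirty = [[0.6, float("nan"), 1.2],
         [0.4, 0.5, -0.2]]
clean_report = validate_samples(clean)
dirty_report = validate_samples(dirty)
```

Here the clean panel yields an empty report, while the second panel is flagged for a NaN in sample 1 and a short position in sample 2.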
Reference
Ensemble utilities for blending portfolio weights.
The helpers in this module implement two complementary recipes that operate on
sample portfolios organised column-wise (n_assets x n_samples):
- average_exposures() - arithmetic averaging (optionally weighted) across a panel of sample portfolios.
- exposure_stacking() - the exposure-stacking quadratic programme first introduced by Vorobets [Vorobets, 2024] and implemented in the GPL-3 licensed fortitudo.tech repository. The routine dampens idiosyncratic exposures while preserving the mean profile of the sample set.
Both routines can operate directly on NumPy/pandas objects and are wrapped by
average_frontiers() / exposure_stack_frontiers() to accept
PortfolioFrontier instances, ensuring
consistent integration with the rest of the library.
All functions preserve pandas indices when they are supplied, so users can move between NumPy and pandas inputs without reshaping or re-labelling portfolios.
Example
>>> from pyvallocation.ensembles import average_frontiers, exposure_stack_frontiers
>>> frontier_a, frontier_b = ... # PortfolioFrontier instances
>>> average_frontiers([frontier_a, frontier_b])
AAA 0.48
BBB 0.52
Name: Average Ensemble, dtype: float64
>>> exposure_stack_frontiers([frontier_a, frontier_b], L=3)
AAA 0.44
BBB 0.56
Name: Exposure Stacking (L=3), dtype: float64
- class pyvallocation.ensembles.EnsembleResult(frontiers, selections, ensembles, metadata)[source]
Bases: object
Container returned by assemble_portfolio_ensemble().
- Key fields:
frontiers: Mapping of spec name to frontier object.
selections: DataFrame of representative portfolios.
ensembles: Mapping of ensemble label to weight Series.
metadata: Per-spec metadata dictionaries.
- Parameters:
frontiers (Dict[str, 'PortfolioFrontier'])
selections (pd.DataFrame)
ensembles (Dict[str, pd.Series])
metadata (Dict[str, Dict[str, Any]])
- property average: pandas.Series | None
Convenience accessor for the average ensemble (if computed).
- Returns:
Optional[pd.Series] – Average ensemble weights.
- ensembles: Dict[str, pd.Series]
- frontiers: Dict[str, 'PortfolioFrontier']
- get(name, default=None)[source]
Return the ensemble weights by name (or default if missing).
- Parameters:
name – Ensemble key (e.g. "average" or "stack").
default – Value returned when name is not present.
- Returns:
Optional[pd.Series] – Requested ensemble weights, if available.
- Parameters:
name (str)
default (pandas.Series | None)
- Return type:
pandas.Series | None
- metadata: Dict[str, Dict[str, Any]]
- selections: pd.DataFrame
- property stacked: pandas.Series | None
Convenience accessor for the stacked ensemble (if computed).
- Returns:
Optional[pd.Series] – Stacked ensemble weights.
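The accessor semantics described above can be illustrated with a small stand-in. This is not the library class: EnsembleResultSketch is hypothetical, plain dictionaries replace pandas Series, and the assumption that the stacked ensemble is stored under the key "stack" is taken from the example keys in the get() documentation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

Weights = Dict[str, float]  # stand-in for a pandas Series of asset weights


@dataclass
class EnsembleResultSketch:
    """Hypothetical, simplified stand-in for the documented container."""

    ensembles: Dict[str, Weights] = field(default_factory=dict)
    metadata: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def get(self, name: str, default: Optional[Weights] = None) -> Optional[Weights]:
        """Return the ensemble stored under `name`, or `default` if missing."""
        return self.ensembles.get(name, default)

    @property
    def average(self) -> Optional[Weights]:
        return self.get("average")

    @property
    def stacked(self) -> Optional[Weights]:
        return self.get("stack")  # assumed key, see the note above


result = EnsembleResultSketch(ensembles={"average": {"AAA": 0.48, "BBB": 0.52}})
average_weights = result.average  # the stored mapping
stacked_weights = result.stacked  # None, because no stacked ensemble was computed
```

Checking for None before using an accessor avoids errors when only one of the two ensembles was requested.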
- class pyvallocation.ensembles.EnsembleSpec(name, frontier_factory, selector, metadata=<factory>, frontier_selection=None)[source]
Bases: object
Descriptor for a single portfolio specification participating in an ensemble.
- Key fields:
name: Spec identifier.
frontier_factory: Callable returning a PortfolioFrontier.
selector: Callable extracting a representative portfolio.
metadata: Optional metadata attached to the resulting ensemble output.
frontier_selection: Optional subset of frontier columns for full-frontier blends.
- Parameters:
name (str)
frontier_factory (Callable[[], 'PortfolioFrontier'])
selector (Callable[['PortfolioFrontier'], Union[pd.Series, Tuple[Any, ...], np.ndarray]])
metadata (Dict[str, Any])
frontier_selection (Optional[Sequence[int]])
- frontier_factory: Callable[[], 'PortfolioFrontier']
- frontier_selection: Sequence[int] | None = None
- metadata: Dict[str, Any]
- name: str
- selector: Callable[['PortfolioFrontier'], pd.Series | Tuple[Any, ...] | np.ndarray]
- pyvallocation.ensembles.assemble_portfolio_ensemble(specs, *, ensemble='stack', combine='selected', stack_folds=None, ensemble_weights=None, stack_kwargs=None)[source]
Build multiple frontiers and collapse them into ensemble portfolios with a single call.
- Parameters:
specs – Sequence of EnsembleSpec instances describing how to generate and summarise each frontier.
ensemble – "average", "stack", a sequence of the two, or None. Defaults to "stack" for a stacked blend. Use ("average", "stack") to obtain both.
combine – "selected" (default) averages/stacks the representative portfolios extracted via each spec’s selector. "frontier" operates directly on the underlying frontiers using average_frontiers() and exposure_stack_frontiers().
stack_folds – Number of folds for stacking. When omitted the helper picks min(3, number_of_portfolios).
ensemble_weights – Optional weights applied during averaging (either over selected portfolios or the full frontier combination).
stack_kwargs – Optional dictionary forwarded to the stacking solver (solver_options argument).
- Returns:
EnsembleResult – Rich result object containing the generated frontiers, representative portfolios, and any requested ensemble allocations.
- Parameters:
specs (Sequence[EnsembleSpec])
ensemble (str | Sequence[str] | None)
combine (str)
stack_folds (int | None)
ensemble_weights (Sequence[float] | None)
stack_kwargs (dict | None)
- Return type:
EnsembleResult
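The argument conventions for assemble_portfolio_ensemble() can be summarised in a few lines. The helpers below, resolve_ensemble and resolve_stack_folds, are hypothetical and only mirror the documented behaviour (accepted ensemble values and the min(3, number_of_portfolios) default); the library's own validation and error messages may differ.

```python
from typing import Optional, Sequence, Tuple, Union


def resolve_ensemble(ensemble: Union[str, Sequence[str], None]) -> Tuple[str, ...]:
    """Normalise the `ensemble` argument to a tuple of labels."""
    if ensemble is None:
        return ()
    labels = (ensemble,) if isinstance(ensemble, str) else tuple(ensemble)
    unknown = [label for label in labels if label not in ("average", "stack")]
    if unknown:
        raise ValueError(f"unknown ensemble label(s): {unknown}")
    return labels


def resolve_stack_folds(stack_folds: Optional[int], n_portfolios: int) -> int:
    """Apply the documented default of min(3, number_of_portfolios)."""
    if stack_folds is None:
        return min(3, n_portfolios)
    if not 1 <= stack_folds <= n_portfolios:
        raise ValueError("stack_folds must lie between 1 and the number of portfolios")
    return stack_folds


both = resolve_ensemble(("average", "stack"))          # ('average', 'stack')
default_only = resolve_ensemble("stack")               # ('stack',)
folds_two_specs = resolve_stack_folds(None, 2)         # 2
folds_five_specs = resolve_stack_folds(None, 5)        # 3
```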
- pyvallocation.ensembles.average_exposures(sample_portfolios, weights=None)[source]
Compute the (possibly weighted) average exposure across multiple portfolios.
The routine accepts any collection of sample weights arranged column-wise. When weights is omitted the average is uniform; otherwise weights must supply one non-negative scalar per sample and is normalised to unity.
- Parameters:
sample_portfolios – Array-like object whose columns represent sample portfolios.
weights – Optional sequence or pandas Series of length n_samples providing relative importance for each column. When a Series is supplied its index is aligned to the sample column labels. The entries are automatically rescaled so that they sum to one.
- Returns:
np.ndarray or pd.Series – Averaged exposure vector with length n_assets. A pandas Series is returned when asset names are available on the input.
- Parameters:
sample_portfolios (numpy.ndarray | pandas.DataFrame | pandas.Series)
weights (Sequence[float] | pandas.Series | None)
- Return type:
numpy.ndarray | pandas.DataFrame | pandas.Series
Examples
>>> import numpy as np
>>> from pyvallocation.ensembles import average_exposures
>>> samples = np.array([[0.6, 0.3], [0.4, 0.7]])
>>> average_exposures(samples)
array([0.45, 0.55])
>>> average_exposures(samples, weights=[1.0, 3.0])
array([0.375, 0.625])
- pyvallocation.ensembles.average_frontiers(frontiers, selections=None, *, ensemble_weights=None)[source]
Average one portfolio from each frontier (aligned risk level).
- Parameters:
frontiers – Sequence of frontier-like objects (typically PortfolioFrontier instances).
selections – Optional per-frontier iterable selecting a single column index. When omitted, each frontier contributes its minimum-risk portfolio to avoid mixing risk levels.
ensemble_weights – Optional weights applied to the stacked sample matrix before averaging. Must have length equal to the total number of selected portfolios.
- Returns:
pd.Series – Averaged exposure vector with propagated asset labels when available.
- Parameters:
frontiers (Sequence[object])
selections (Sequence[Iterable[int] | None] | None)
ensemble_weights (Sequence[float] | None)
- Return type:
pandas.Series
- pyvallocation.ensembles.exposure_stack_frontiers(frontiers, L, selections=None, *, solver_options=None)[source]
Apply exposure stacking across one or more frontiers.
- Parameters:
frontiers – Sequence of frontier-like objects contributing sample portfolios.
L – Number of stacking folds (as in exposure_stacking()).
selections – Optional iterable specifying one column index per frontier. When omitted, each frontier contributes its minimum-risk portfolio.
solver_options – Optional dictionary of CVXOPT solver overrides.
- Returns:
pd.Series – Exposure-stacked weights with propagated asset labels.
- Parameters:
frontiers (Sequence[object])
L (int)
selections (Sequence[Iterable[int] | None] | None)
solver_options (dict | None)
- Return type:
pandas.Series
Notes
The total number of selected portfolios must be at least L. When selections is omitted the full frontier matrices are used, matching the layout of weights.
- pyvallocation.ensembles.exposure_stacking(sample_portfolios, L, *, solver_options=None)[source]
Compute exposure stacking weights following Vorobets [Vorobets, 2024].
The algorithm partitions the set of sample portfolios into L buckets and solves a quadratic programme that minimises the sum of cross-validated residuals. Intuitively, the resulting allocation penalises weights that are idiosyncratic to any particular subset of samples, favouring stable signals.
- Parameters:
sample_portfolios – Panel of sample portfolios organised column-wise.
L – Number of cross-validation folds. Must satisfy 1 <= L <= n_samples.
solver_options – Optional dictionary of CVXOPT solver overrides (e.g., {'maxiters': 100}).
- Returns:
np.ndarray or pd.Series – Exposure-stacked portfolio of length n_assets. A Series is returned when asset names are provided on the input.
- Parameters:
sample_portfolios (numpy.ndarray | pandas.DataFrame | pandas.Series)
L (int)
solver_options (dict | None)
- Return type:
numpy.ndarray | pandas.DataFrame | pandas.Series
Notes
This implementation adapts the open-source reference code from fortitudo.tech (GPL-3.0) that accompanies Vorobets’ original publication.
- Raises:
RuntimeError – If the underlying quadratic programme does not terminate with status 'optimal'.
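The bucketing step that precedes the quadratic programme can be sketched as follows. This is an illustration under stated assumptions: partition_into_folds and training_indices are hypothetical helpers, the folds are contiguous with any remainder spread over the first folds, and the library may split the samples differently.

```python
from typing import List


def partition_into_folds(n_samples: int, L: int) -> List[List[int]]:
    """Split sample indices 0..n_samples-1 into L contiguous folds.

    Assumed scheme for illustration: fold sizes differ by at most one.
    """
    if not 1 <= L <= n_samples:
        raise ValueError(f"L must satisfy 1 <= L <= {n_samples}, got {L}")
    base, extra = divmod(n_samples, L)
    folds, start = [], 0
    for fold in range(L):
        size = base + (1 if fold < extra else 0)  # first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def training_indices(folds: List[List[int]], held_out: int) -> List[int]:
    """Indices of all samples outside the held-out fold."""
    return [i for k, fold in enumerate(folds) if k != held_out for i in fold]


folds = partition_into_folds(n_samples=7, L=3)   # [[0, 1, 2], [3, 4], [5, 6]]
train_for_middle = training_indices(folds, 1)    # [0, 1, 2, 5, 6]
```

In a cross-validated scheme, each fold is held out in turn while the remaining samples are used to fit it, which is why L cannot exceed the number of samples.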
- pyvallocation.ensembles.make_portfolio_spec(name, *, returns=None, probabilities=None, preprocess=None, projection=None, distribution=None, distribution_factory=None, use_scenarios=False, mean_estimator='sample', cov_estimator='sample', mean_kwargs=None, cov_kwargs=None, optimiser='mean_variance', optimiser_kwargs=None, selector='tangency', selector_kwargs=None, frontier_selection=None, metadata=None)[source]
Convenience constructor for EnsembleSpec covering common workflows.
- Parameters:
name – Spec identifier.
returns – Historical scenario matrix (rows = scenarios, columns = assets).
probabilities – Optional scenario weights aligned with returns.
preprocess – Optional callable applied to returns before estimation (e.g., convert compounded to simple returns).
projection – Optional dictionary with projection settings. Recognised keys: annualization_factor (for project_mean_covariance()), log_to_simple / to_simple (apply log2simple()), and transform (callable transform(mu, Sigma) -> (mu, Sigma)).
distribution / distribution_factory – Supply an AssetsDistribution directly (or a factory returning one) instead of estimating from data.
use_scenarios – When True the distribution is built from scenarios rather than estimated moments.
mean_estimator / cov_estimator – Names understood by pyvallocation.moments.estimate_moments().
mean_kwargs / cov_kwargs – Additional keyword arguments forwarded to the estimators.
optimiser – Optimiser key ("mean_variance", "cvar", "rrp", "robust") or a callable building a PortfolioFrontier from a PortfolioWrapper.
optimiser_kwargs – Keyword arguments for the optimiser. If it contains constraints (a dict or Constraints instance) they are applied to the wrapper before building the frontier.
selector – How to extract the representative portfolio. Accepts strings ("tangency", "min_risk", "max_return", "risk_target", "risk_match", "risk_percentile", "column") or a callable.
selector_kwargs – Extra parameters for the selector (e.g., risk_free_rate for tangency).
frontier_selection – Column subset used when combining over entire frontiers.
metadata – Optional dictionary persisted in the returned EnsembleResult.
- Returns:
EnsembleSpec – Spec object that encapsulates the distribution, optimiser, selector, and associated metadata.
- Parameters:
name (str)
returns (Optional[pd.DataFrame])
probabilities (Optional[Union[pd.Series, Sequence[float], np.ndarray]])
preprocess (Optional[Callable[[pd.DataFrame], pd.DataFrame]])
projection (Optional[Dict[str, Any]])
distribution (Optional['AssetsDistribution'])
distribution_factory (Optional[Callable[[], 'AssetsDistribution']])
use_scenarios (bool)
mean_estimator (str)
cov_estimator (str)
mean_kwargs (Optional[Dict[str, Any]])
cov_kwargs (Optional[Dict[str, Any]])
optimiser (Union[str, Callable[['PortfolioWrapper'], 'PortfolioFrontier'], Callable[..., 'PortfolioFrontier']])
optimiser_kwargs (Optional[Dict[str, Any]])
selector (Union[str, Callable[['PortfolioFrontier'], Union[pd.Series, Tuple[Any, ...], np.ndarray]]])
selector_kwargs (Optional[Dict[str, Any]])
frontier_selection (Optional[Sequence[int]])
metadata (Optional[Dict[str, Any]])
- Return type:
EnsembleSpec
- pyvallocation.ensembles.risk_percentile_selections(frontiers, percentile, *, risk_label=None)[source]
Return per-frontier column selections aligned by risk percentile.
- Parameters:
frontiers – Sequence of frontier objects.
percentile – Percentile on [0, 1] or [0, 100].
risk_label – Optional risk label to align on.
- Returns:
list[list[int]] – Column selections per frontier.
- Parameters:
frontiers (Sequence['PortfolioFrontier'])
percentile (float)
risk_label (Optional[str])
- Return type:
List[List[int]]
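One way to picture risk-percentile alignment is to rank each frontier's portfolios by risk and pick the column at the requested rank. The sketch below is an assumption-laden illustration, not the library's algorithm: normalise_percentile and select_by_risk_percentile are hypothetical, values above 1 are read as percentages, and the nearest-rank rule is an arbitrary choice.

```python
from typing import List, Sequence


def normalise_percentile(percentile: float) -> float:
    """Map a percentile given on [0, 1] or [0, 100] to a fraction in [0, 1].

    Assumption: values above 1 are percentages, so 1 itself means 100%.
    """
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must lie in [0, 1] or [0, 100]")
    return percentile / 100.0 if percentile > 1 else float(percentile)


def select_by_risk_percentile(risks: Sequence[float], percentile: float) -> List[int]:
    """Return the column index whose risk sits at the requested percentile rank."""
    fraction = normalise_percentile(percentile)
    order = sorted(range(len(risks)), key=lambda j: risks[j])  # columns by ascending risk
    position = int(fraction * (len(risks) - 1) + 0.5)          # nearest rank
    return [order[position]]


# Risk per frontier column for two hypothetical frontiers
frontier_risks = [
    [0.05, 0.10, 0.20, 0.15, 0.30],
    [0.08, 0.12, 0.16],
]
selections = [select_by_risk_percentile(risks, 0.5) for risks in frontier_risks]
```

With these inputs the median-risk columns are index 3 in the first frontier and index 1 in the second, giving one selection list per frontier in the same shape as the documented return type.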
- pyvallocation.ensembles.stack_portfolios(portfolios, *, selections=None, L=3, solver_options=None)[source]
Stack a mixture of individual portfolios and/or frontiers.
Each entry can be a pandas Series/NumPy vector (single portfolio) or a PortfolioFrontier (optionally paired with a selection of frontier columns via selections).
- Parameters:
portfolios (Sequence[Any])
selections (Sequence[Iterable[int] | None] | None)
L (int)
solver_options (dict | None)
- Return type:
pandas.Series
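Conceptually, mixing single portfolios and frontiers amounts to collecting the chosen columns into one (n_assets, n_samples) panel before stacking. The following plain-Python sketch shows that gathering step only; gather_sample_columns is a hypothetical helper, frontiers are represented as nested lists, and using every frontier column when no selection is given is an assumption rather than documented behaviour.

```python
from typing import Iterable, List, Optional, Sequence


def gather_sample_columns(
    entries: Sequence[Sequence],
    selections: Optional[Sequence[Optional[Iterable[int]]]] = None,
) -> List[List[float]]:
    """Collect portfolios into a column-organised panel (rows = assets).

    A flat list of numbers is a single portfolio; a list of rows is a
    frontier-like matrix of shape (n_assets, n_portfolios).
    """
    if selections is None:
        selections = [None] * len(entries)
    if len(selections) != len(entries):
        raise ValueError("need one selection (or None) per entry")
    columns = []
    for entry, picks in zip(entries, selections):
        if isinstance(entry[0], (int, float)):
            columns.append(list(entry))  # single portfolio contributes one column
            continue
        chosen = range(len(entry[0])) if picks is None else picks  # assumed default: all
        for j in chosen:
            columns.append([row[j] for row in entry])
    if not columns:
        raise ValueError("no portfolios selected")
    if any(len(col) != len(columns[0]) for col in columns):
        raise ValueError("all portfolios must cover the same assets")
    return [list(row) for row in zip(*columns)]  # transpose to (n_assets, n_samples)


frontier = [[1.0, 0.5, 0.0],
            [0.0, 0.5, 1.0]]  # two assets, three frontier portfolios
single = [0.3, 0.7]           # one stand-alone portfolio
panel = gather_sample_columns([frontier, single], selections=[[0, 2], None])
```

The resulting panel has three sample columns, which is the quantity that must be at least L when stacking.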