Portfolio Ensembling

Ensembling utilities in pyvallocation.ensembles blend multiple model outputs into tradeable allocations. They are designed to accept either raw NumPy arrays or PortfolioFrontier instances, so research portfolios can be piped into production workflows with minimal glue code.

Important

The exposure stacking routines adapt the GPL-3 implementation released by the fortitudo.tech project. If you build on top of them, please keep the original attribution in downstream documentation or research notes.

At a glance

pyvallocation.ensembles.average_exposures() averages a stack of sample portfolios (uniformly or using custom weights).
pyvallocation.ensembles.exposure_stacking() solves the quadratic programme popularised by Vorobets [Vorobets, 2024] to concentrate risk on common factors while cancelling idiosyncratic bets. This implementation adapts the GPL-3 licensed routine released in the fortitudo.tech project and is credited accordingly.
pyvallocation.ensembles.assemble_portfolio_ensemble() orchestrates frontier sampling, averaging, stacking, and optional selectors. The function underpins the accompanying example notebook.

Quick start

The helpers operate on column-organised samples. Start with simple NumPy arrays to get a feel for the APIs:

import numpy as np
from pyvallocation.ensembles import average_exposures, exposure_stacking

# Two sample portfolios across two assets
samples = np.array([[0.6, 0.3],
                    [0.4, 0.7]])

avg = average_exposures(samples)
stacked = exposure_stacking(samples, L=2)

print(avg)      # -> [0.45 0.55]
print(stacked)  # -> exposure stacking output with damped idiosyncratic bets
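For intuition about the column-wise layout, the uniform average can be reproduced with plain NumPy. This is a sketch of the arithmetic only, not the library's implementation, and the variable names are illustrative.

```python
import numpy as np

# Each portfolio is a vector of asset weights.
portfolio_a = np.array([0.6, 0.4])
portfolio_b = np.array([0.3, 0.7])

# Column-organised layout: one column per sample portfolio,
# one row per asset, giving shape (n_assets, n_samples).
samples = np.column_stack([portfolio_a, portfolio_b])

# A uniform average across samples is the mean of each row.
uniform_avg = samples.mean(axis=1)

print(samples.shape)  # (2, 2)
print(uniform_avg)    # [0.45 0.55]
```

Reading the matrix row by row gives the assets; reading it column by column gives the sample portfolios.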

Workflow summary

End-to-end ensemble construction typically follows these steps:

Generate candidate portfolios. Optimise frontiers or run bespoke models to obtain a set of column-organised sample weights.
Select representatives. Use methods such as pyvallocation.portfolioapi.PortfolioFrontier.at_risk() or pyvallocation.ensembles.make_portfolio_spec() to standardise inputs.
Blend exposures. Apply average_exposures() for a linear average or exposure_stacking() to damp idiosyncratic bets while keeping the common factor structure.
Report and trade. The outputs are pandas-aware vectors, so they can be fed straight into stress testing, discrete allocation, or attribution.

When you work with pre-built frontiers, the API stays consistent and exposes dedicated shortcuts:

from pyvallocation.portfolioapi import PortfolioWrapper, AssetsDistribution
from pyvallocation.ensembles import average_frontiers, exposure_stack_frontiers

# `returns` is a pandas DataFrame of return scenarios (rows = scenarios, columns = assets)
wrapper = PortfolioWrapper(AssetsDistribution(scenarios=returns))
frontier = wrapper.variance_frontier(num_portfolios=21)
another = wrapper.cvar_frontier(num_portfolios=21)

avg_portfolio = average_frontiers([frontier, another])
stacked_portfolio = exposure_stack_frontiers([frontier, another], L=3)

avg_portfolio.plot.bar(title="Average ensemble weights")

You can also align frontiers by risk percentile before averaging:

from pyvallocation.ensembles import average_frontiers, risk_percentile_selections

selections = risk_percentile_selections([frontier, another], percentile=0.5)
blended = average_frontiers([frontier, another], selections=selections)

Or blend several frontiers/single portfolios directly:

from pyvallocation.ensembles import stack_portfolios
# Reuses `frontier` and `another` from the snippet above
stacked = stack_portfolios([frontier, another], selections=[range(0, 11, 5), [2, 4, 6]])

And build specs straight from an existing PortfolioWrapper:

spec = wrapper.make_ensemble_spec(
    "MV",
    optimiser_kwargs={"num_portfolios": 11, "constraints": {"long_only": True, "total_weight": 1.0}},
    selector="tangency",
)
result = wrapper.assemble_ensembles([spec], ensemble=("average", "stack"))

Tips

The stacking depth L controls how tightly exposures are shrunk. Larger values yield smoother allocations but require more sample portfolios (L cannot exceed the number of samples).
Exposure stacking assumes portfolios are long-only and sum to one. If your research stack permits leverage, normalise samples first.
assemble_portfolio_ensemble() can mix averaging and stacking in a single call. See the accompanying example notebook for an end-to-end example.
Every helper preserves pandas indices when they are present so the output can flow straight into downstream reporting.
Solver options can be forwarded via solver_options when you need to tweak CVXOPT tolerances or iteration limits.
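The normalisation mentioned in the tips above can be sketched in NumPy as follows. The helper name normalise_columns is hypothetical and not part of pyvallocation; it assumes every column has a positive weight sum and rejects negative weights rather than trying to repair them.

```python
import numpy as np

def normalise_columns(samples):
    """Rescale each column (sample portfolio) so its weights sum to one.

    Hypothetical helper for illustration, not part of pyvallocation.
    """
    samples = np.asarray(samples, dtype=float)
    if samples.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_assets, n_samples)")
    if (samples < 0).any():
        raise ValueError("negative weights violate the long-only assumption")
    totals = samples.sum(axis=0, keepdims=True)
    if (totals <= 0).any():
        raise ValueError("every sample portfolio needs a positive weight sum")
    return samples / totals

# Two leveraged long-only portfolios (columns sum to 1.5 and 2.0).
levered = np.array([[0.9, 0.5],
                    [0.6, 1.5]])
normalised = normalise_columns(levered)

print(normalised)              # columns become [0.6, 0.4] and [0.25, 0.75]
print(normalised.sum(axis=0))  # each column now sums to one
```

Rescaling keeps the relative weights within each portfolio but discards the leverage level, so it is worth recording the original gross exposure separately if that matters downstream.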

Troubleshooting

Shape mismatches. Ensure inputs have shape (n_assets, n_samples), with one column per sample portfolio. Use DataFrame.T or np.column_stack to arrange your sample set.
Missing labels. When averaging or stacking Series with different indices, the helpers reindex and raise on missing entries, so double-check asset names.
Solver errors. Exposure stacking relies on CVXOPT. Pass solver_options={'feastol': 1e-7} (or similar) for noisy inputs, and verify that no column contains NaNs or violates the long-only assumption.
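A pre-flight check along the lines of the items above can catch label and NaN problems before the helpers are called. The sketch below uses only pandas; the function name build_sample_frame is hypothetical, and the library's own alignment logic may behave differently.

```python
import pandas as pd

def build_sample_frame(portfolios):
    """Combine named weight Series into an (n_assets x n_samples) DataFrame.

    Hypothetical helper: raises if any asset has a missing or NaN weight.
    """
    # Outer-join on asset labels; assets absent from a Series become NaN.
    frame = pd.concat(portfolios, axis=1)
    missing = frame.index[frame.isna().any(axis=1)].tolist()
    if missing:
        raise ValueError(f"assets with missing or NaN weights: {missing}")
    return frame

mv = pd.Series({"AAA": 0.6, "BBB": 0.4}, name="mv")
cvar = pd.Series({"BBB": 0.7, "AAA": 0.3}, name="cvar")  # same labels, other order
frame = build_sample_frame([mv, cvar])
print(frame.shape)  # (2, 2): rows are assets, columns are sample portfolios

typo = pd.Series({"AAA": 0.5, "BBb": 0.5}, name="typo")  # mislabelled asset
try:
    build_sample_frame([mv, typo])
except ValueError as exc:
    print(exc)  # reports the labels that do not line up
```

Because alignment happens on labels rather than position, a differently ordered index is harmless, while a misspelt asset name surfaces immediately.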

Reference

Ensemble utilities for blending portfolio weights.

The helpers in this module implement two complementary recipes that operate on sample portfolios organised column-wise (n_assets x n_samples):

average_exposures() - arithmetic averaging (optionally weighted) across a panel of sample portfolios.
exposure_stacking() - the exposure-stacking quadratic programme first introduced by Vorobets [Vorobets, 2024] and implemented in the GPL-3 licensed fortitudo.tech repository. The routine dampens idiosyncratic exposures while preserving the mean profile of the sample set.

Both routines can operate directly on NumPy/pandas objects and are wrapped by average_frontiers() / exposure_stack_frontiers() to accept PortfolioFrontier instances, ensuring consistent integration with the rest of the library.

All functions preserve pandas indices when they are supplied, so users can move between NumPy and pandas inputs without reshaping or re-labelling portfolios.

Example

>>> from pyvallocation.ensembles import average_frontiers, exposure_stack_frontiers
>>> frontier_a, frontier_b = ...  # PortfolioFrontier instances
>>> average_frontiers([frontier_a, frontier_b])
AAA    0.48
BBB    0.52
Name: Average Ensemble, dtype: float64
>>> exposure_stack_frontiers([frontier_a, frontier_b], L=3)
AAA    0.44
BBB    0.56
Name: Exposure Stacking (L=3), dtype: float64

class pyvallocation.ensembles.EnsembleResult(frontiers, selections, ensembles, metadata)[source]

Bases: object

Container returned by assemble_portfolio_ensemble().

Key fields:

frontiers: Mapping of spec name to frontier object.
selections: DataFrame of representative portfolios.
ensembles: Mapping of ensemble label to weight Series.
metadata: Per-spec metadata dictionaries.

Parameters:

frontiers (Dict[str, 'PortfolioFrontier'])
selections (pd.DataFrame)
ensembles (Dict[str, pd.Series])
metadata (Dict[str, Dict[str, Any]])

property average: pandas.Series | None

Convenience accessor for the average ensemble (if computed).

Returns:

Optional[pd.Series] – Average ensemble weights.

ensembles: Dict[str, pd.Series]

frontiers: Dict[str, 'PortfolioFrontier']

get(name, default=None)[source]

Return the ensemble weights by name (or default if missing).

Parameters:

name – Ensemble key (e.g. "average" or "stack").
default – Value returned when name is not present.

Returns:

Optional[pd.Series] – Requested ensemble weights, if available.

Parameters:

name (str)
default (pandas.Series | None)

Return type:

pandas.Series | None

metadata: Dict[str, Dict[str, Any]]

selections: pd.DataFrame

property stacked: pandas.Series | None

Convenience accessor for the stacked ensemble (if computed).

Returns:

Optional[pd.Series] – Stacked ensemble weights.

class pyvallocation.ensembles.EnsembleSpec(name, frontier_factory, selector, metadata=<factory>, frontier_selection=None)[source]

Bases: object

Descriptor for a single portfolio specification participating in an ensemble.

Key fields:

name: Spec identifier.
frontier_factory: Callable returning a PortfolioFrontier.
selector: Callable extracting a representative portfolio.
metadata: Optional metadata attached to the resulting ensemble output.
frontier_selection: Optional subset of frontier columns for full-frontier blends.

Parameters:

name (str)
frontier_factory (Callable[[], 'PortfolioFrontier'])
selector (Callable[['PortfolioFrontier'], Union[pd.Series, Tuple[Any, ...], np.ndarray]])
metadata (Dict[str, Any])
frontier_selection (Optional[Sequence[int]])

frontier_factory: Callable[[], 'PortfolioFrontier']

frontier_selection: Sequence[int] | None = None

metadata: Dict[str, Any]

name: str

selector: Callable[['PortfolioFrontier'], pd.Series | Tuple[Any, ...] | np.ndarray]

pyvallocation.ensembles.assemble_portfolio_ensemble(specs, *, ensemble='stack', combine='selected', stack_folds=None, ensemble_weights=None, stack_kwargs=None)[source]

Build multiple frontiers and collapse them into ensemble portfolios with a single call.

Parameters:

specs – Sequence of EnsembleSpec instances describing how to generate and summarise each frontier.
ensemble – "average", "stack", a sequence of the two, or None. Defaults to "stack" for a stacked blend. Use ("average", "stack") to obtain both.
combine – "selected" (default) averages/stacks the representative portfolios extracted via each spec’s selector. "frontier" operates directly on the underlying frontiers using average_frontiers() and exposure_stack_frontiers().
stack_folds – Number of folds for stacking. When omitted the helper picks min(3, number_of_portfolios).
ensemble_weights – Optional weights applied during averaging (either over selected portfolios or the full frontier combination).
stack_kwargs – Optional dictionary forwarded to the stacking solver (solver_options argument).

Returns:

EnsembleResult – Rich result object containing the generated frontiers, representative portfolios, and any requested ensemble allocations.

Parameters:

specs (Sequence[EnsembleSpec])
ensemble (str | Sequence[str] | None)
combine (str)
stack_folds (int | None)
ensemble_weights (Sequence[float] | None)
stack_kwargs (dict | None)

Return type:

EnsembleResult

pyvallocation.ensembles.average_exposures(sample_portfolios, weights=None)[source]

Compute the (possibly weighted) average exposure across multiple portfolios.

The routine accepts any collection of sample weights arranged column-wise. When weights is omitted the average is uniform; otherwise weights must supply one non-negative scalar per sample, and the values are rescaled to sum to one.

Parameters:

sample_portfolios – Array-like object whose columns represent sample portfolios.
weights – Optional sequence or pandas Series of length n_samples providing relative importance for each column. When a Series is supplied its index is aligned to the sample column labels. The entries are automatically rescaled so that they sum to one.

Returns:

np.ndarray or pd.Series – Averaged exposure vector with length n_assets. A pandas Series is returned when asset names are available on the input.

Parameters:

sample_portfolios (numpy.ndarray | pandas.DataFrame | pandas.Series)
weights (Sequence[float] | pandas.Series | None)

Return type:

numpy.ndarray | pandas.DataFrame | pandas.Series

Examples

>>> import numpy as np
>>> from pyvallocation.ensembles import average_exposures
>>> samples = np.array([[0.6, 0.3], [0.4, 0.7]])
>>> average_exposures(samples)
array([0.45, 0.55])
>>> average_exposures(samples, weights=[1.0, 3.0])
array([0.375, 0.625])
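The weighted case in the example above reduces to a matrix-vector product once the relative weights are rescaled. The following NumPy sketch reproduces the arithmetic by hand; it is not the library's implementation.

```python
import numpy as np

samples = np.array([[0.6, 0.3],
                    [0.4, 0.7]])    # shape (n_assets, n_samples)
raw_weights = np.array([1.0, 3.0])  # one relative weight per column

# Rescale the relative weights so they sum to one: [0.25, 0.75].
w = raw_weights / raw_weights.sum()

# Weighted average exposure per asset is the product samples @ w.
weighted_avg = samples @ w

print(w)             # [0.25 0.75]
print(weighted_avg)  # [0.375 0.625]
```

Because each column sums to one and the rescaled weights sum to one, the weighted average also sums to one.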

pyvallocation.ensembles.average_frontiers(frontiers, selections=None, *, ensemble_weights=None)[source]

Average one portfolio from each frontier (aligned risk level).

Parameters:

frontiers – Sequence of frontier-like objects (typically PortfolioFrontier instances).
selections – Optional per-frontier iterable selecting a single column index. When omitted, each frontier contributes its minimum-risk portfolio to avoid mixing risk levels.
ensemble_weights – Optional weights applied to the stacked sample matrix before averaging. Must have length equal to the total number of selected portfolios.

Returns:

pd.Series – Averaged exposure vector with propagated asset labels when available.

Parameters:

frontiers (Sequence[object])
selections (Sequence[Iterable[int] | None] | None)
ensemble_weights (Sequence[float] | None)

Return type:

pandas.Series

pyvallocation.ensembles.exposure_stack_frontiers(frontiers, L, selections=None, *, solver_options=None)[source]

Apply exposure stacking across one or more frontiers.

Parameters:

frontiers – Sequence of frontier-like objects contributing sample portfolios.
L – Number of stacking folds (as in exposure_stacking()).
selections – Optional iterable specifying one column index per frontier. When omitted, each frontier contributes its minimum-risk portfolio.
solver_options – Optional dictionary of CVXOPT solver overrides.

Returns:

pd.Series – Exposure-stacked weights with propagated asset labels.

Parameters:

frontiers (Sequence[object])
L (int)
selections (Sequence[Iterable[int] | None] | None)
solver_options (dict | None)

Return type:

pandas.Series

Notes

The total number of selected portfolios must be at least L. When selections is omitted the full frontier matrices are used, matching the layout of weights.

pyvallocation.ensembles.exposure_stacking(sample_portfolios, L, *, solver_options=None)[source]

Compute exposure stacking weights following Vorobets [Vorobets, 2024].

The algorithm partitions the set of sample portfolios into L buckets and solves a quadratic programme that minimises the sum of cross-validated residuals. Intuitively, the resulting allocation penalises weights that are idiosyncratic to any particular subset of samples, favouring stable signals.

Parameters:

sample_portfolios – Panel of sample portfolios organised column-wise.
L – Number of cross-validation folds. Must satisfy 1 <= L <= n_samples.
solver_options – Optional dictionary of CVXOPT solver overrides (e.g., {'maxiters': 100}).

Returns:

np.ndarray or pd.Series – Exposure-stacked portfolio of length n_assets. A Series is returned when asset names are provided on the input.

Parameters:

sample_portfolios (numpy.ndarray | pandas.DataFrame | pandas.Series)
L (int)
solver_options (dict | None)

Return type:

numpy.ndarray | pandas.DataFrame | pandas.Series

Notes

This implementation adapts the open-source reference code from fortitudo.tech (GPL-3.0) that accompanies Vorobets’ original publication.

Raises:

RuntimeError – If the underlying quadratic programme does not terminate with status 'optimal'.
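To make the role of L concrete, the sketch below splits sample indices into L contiguous buckets with NumPy. It illustrates the partitioning idea and the constraint 1 <= L <= n_samples only. The function name fold_indices is hypothetical, the library may assign samples to buckets differently, and the quadratic programme itself is not reproduced here.

```python
import numpy as np

def fold_indices(n_samples, L):
    """Split sample indices 0..n_samples-1 into L contiguous buckets.

    Hypothetical helper for illustration, not part of pyvallocation.
    """
    if not 1 <= L <= n_samples:
        raise ValueError("L must satisfy 1 <= L <= n_samples")
    # array_split tolerates uneven splits: earlier buckets get the extras.
    return np.array_split(np.arange(n_samples), L)

folds = fold_indices(7, 3)
for k, fold in enumerate(folds):
    print(k, fold.tolist())
# 0 [0, 1, 2]
# 1 [3, 4]
# 2 [5, 6]
```

With L equal to the number of samples every bucket holds a single portfolio, and with L = 1 all samples share one bucket.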

pyvallocation.ensembles.make_portfolio_spec(name, *, returns=None, probabilities=None, preprocess=None, projection=None, distribution=None, distribution_factory=None, use_scenarios=False, mean_estimator='sample', cov_estimator='sample', mean_kwargs=None, cov_kwargs=None, optimiser='mean_variance', optimiser_kwargs=None, selector='tangency', selector_kwargs=None, frontier_selection=None, metadata=None)[source]

Convenience constructor for EnsembleSpec covering common workflows.

Parameters:

name – Spec identifier.
returns – Historical scenario matrix (rows = scenarios, columns = assets).
probabilities – Optional scenario weights aligned with returns.
preprocess – Optional callable applied to returns before estimation (e.g., convert compounded to simple returns).
projection – Optional dictionary with projection settings. Recognised keys: annualization_factor (for project_mean_covariance()), log_to_simple/to_simple (apply log2simple()), and transform (callable transform(mu, Sigma) -> (mu, Sigma)).
distribution / distribution_factory – Supply an AssetsDistribution directly (or a factory returning one) instead of estimating from data.
use_scenarios – When True the distribution is built from scenarios rather than estimated moments.
mean_estimator / cov_estimator – Names understood by pyvallocation.moments.estimate_moments().
mean_kwargs / cov_kwargs – Additional keyword arguments forwarded to the estimators.
optimiser – Optimiser key ("mean_variance", "cvar", "rrp", "robust") or a callable building a PortfolioFrontier from a PortfolioWrapper.
optimiser_kwargs – Keyword arguments for the optimiser. If it contains constraints (a dict or Constraints instance) they are applied to the wrapper before building the frontier.
selector – How to extract the representative portfolio. Accepts strings ("tangency", "min_risk", "max_return", "risk_target", "risk_match", "risk_percentile", "column") or a callable.
selector_kwargs – Extra parameters for the selector (e.g., risk_free_rate for tangency).
frontier_selection – Column subset used when combining over entire frontiers.
metadata – Optional dictionary persisted in the returned EnsembleResult.

Returns:

EnsembleSpec – Spec object that encapsulates the distribution, optimiser, selector, and associated metadata.

Parameters:

name (str)
returns (Optional[pd.DataFrame])
probabilities (Optional[Union[pd.Series, Sequence[float], np.ndarray]])
preprocess (Optional[Callable[[pd.DataFrame], pd.DataFrame]])
projection (Optional[Dict[str, Any]])
distribution (Optional['AssetsDistribution'])
distribution_factory (Optional[Callable[[], 'AssetsDistribution']])
use_scenarios (bool)
mean_estimator (str)
cov_estimator (str)
mean_kwargs (Optional[Dict[str, Any]])
cov_kwargs (Optional[Dict[str, Any]])
optimiser (Union[str, Callable[['PortfolioWrapper'], 'PortfolioFrontier'], Callable[..., 'PortfolioFrontier']])
optimiser_kwargs (Optional[Dict[str, Any]])
selector (Union[str, Callable[['PortfolioFrontier'], Union[pd.Series, Tuple[Any, ...], np.ndarray]]])
selector_kwargs (Optional[Dict[str, Any]])
frontier_selection (Optional[Sequence[int]])
metadata (Optional[Dict[str, Any]])

Return type:

EnsembleSpec

pyvallocation.ensembles.risk_percentile_selections(frontiers, percentile, *, risk_label=None)[source]

Return per-frontier column selections aligned by risk percentile.

Parameters:

frontiers – Sequence of frontier objects.
percentile – Percentile, given either as a fraction on [0, 1] or as a percentage on [0, 100].
risk_label – Optional risk label to align on.

Returns:

list[list[int]] – Column selections per frontier.

Parameters:

frontiers (Sequence['PortfolioFrontier'])
percentile (float)
risk_label (Optional[str])

Return type:

List[List[int]]
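One plausible reading of percentile alignment is sketched below on plain arrays of per-portfolio risk. Everything here is an assumption for illustration: the helper name percentile_column, the convention that values above 1 are treated as percentages, and the rule of picking the portfolio whose rank along the sorted risks is closest to the requested percentile. The library's actual selection rule may differ.

```python
import numpy as np

def percentile_column(risks, percentile):
    """Return the column index sitting at a given risk percentile.

    Hypothetical helper for illustration, not part of pyvallocation.
    """
    risks = np.asarray(risks, dtype=float)
    # Assumption: values above 1 are percentages, so 50 means 0.5.
    p = percentile / 100.0 if percentile > 1.0 else float(percentile)
    if not 0.0 <= p <= 1.0:
        raise ValueError("percentile must lie in [0, 1] or [0, 100]")
    order = np.argsort(risks)                # columns from lowest to highest risk
    rank = int(round(p * (len(risks) - 1)))  # position along the sorted risks
    return int(order[rank])

# Stand-ins for the risk levels of two frontiers with 21 and 11 portfolios.
risks_a = np.linspace(0.05, 0.25, 21)
risks_b = np.linspace(0.10, 0.20, 11)

selections = [[percentile_column(r, 0.5)] for r in (risks_a, risks_b)]
print(selections)  # [[10], [5]]
```

The nested-list shape mirrors the documented return type, with one inner list of column indices per frontier.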

pyvallocation.ensembles.stack_portfolios(portfolios, *, selections=None, L=3, solver_options=None)[source]

Stack a mixture of individual portfolios and/or frontiers.

Each entry can be a pandas Series/NumPy vector (single portfolio) or a PortfolioFrontier (optionally paired with a selection of frontier columns via selections).

Parameters:

portfolios (Sequence[Any])
selections (Sequence[Iterable[int] | None] | None)
L (int)
solver_options (dict | None)

Return type:

pandas.Series
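The way mixed inputs might be flattened into one column-organised sample matrix can be sketched with NumPy, using plain arrays as stand-ins for frontier weight matrices. The helper collect_columns is hypothetical and only mirrors the description above. Treating an omitted selection as "use every column" is an assumption, and the stacking step itself is not performed.

```python
import numpy as np

def collect_columns(portfolios, selections=None):
    """Gather vectors and selected matrix columns into (n_assets, n_samples).

    Hypothetical helper for illustration, not part of pyvallocation.
    """
    if selections is None:
        selections = [None] * len(portfolios)
    if len(selections) != len(portfolios):
        raise ValueError("need one selection entry (or None) per portfolio")
    columns = []
    for weights, selection in zip(portfolios, selections):
        weights = np.asarray(weights, dtype=float)
        if weights.ndim == 1:
            columns.append(weights)    # a single portfolio vector
        elif selection is None:
            columns.extend(weights.T)  # assumption: keep every column
        else:
            columns.extend(weights[:, list(selection)].T)
    return np.column_stack(columns)

rng = np.random.default_rng(0)
frontier_like = rng.dirichlet(np.ones(4), size=11).T  # (4 assets, 11 portfolios)
single = np.full(4, 0.25)                             # one equal-weight portfolio

samples = collect_columns([frontier_like, single],
                          selections=[range(0, 11, 5), None])
print(samples.shape)  # (4, 4): three selected frontier columns plus one vector
```

The resulting matrix has the same layout that exposure_stacking() expects, so the number of columns is what L must not exceed.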