
PlotDiagnosticResults

A set of complementary plotting functions for the diagnostic tests. They can also be used outside the reports when you only need one specific test. Most functions collect their figures in a list, which makes it easy to add them to the HTML reports or save them as local PNG images.

LMM_Diagnostics_Plot(results_df, feature_order='original', max_labels=50, include_delta_r2=True, include_status_summary=True)

Plot LMM diagnostics from Run_LMM_cross_sectional output.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| results_df | DataFrame | Output dataframe from Run_LMM_cross_sectional. | required |
| feature_order | str | 'original' to preserve the input order, 'sorted_icc' to sort by ICC. | 'original' |
| max_labels | int | Maximum number of x-axis labels to show before thinning them. | 50 |
| include_delta_r2 | bool | If True, add a delta_R2 plot. | True |
| include_status_summary | bool | If True, add a status/notes summary plot. | True |

Returns:

| Type | Description |
| --- | --- |
| list[tuple[str, Figure]] | Caption and figure pairs for the generated diagnostic plots. |

rep_plot_wrapper(func)

Decorator that
  • optionally forces show=False (if the wrapped function supports it),
  • intercepts and removes wrapper-only kwargs (rep, log_func, caption),
  • logs returned figure(s) into rep via rep.log_plot(fig, caption) if rep provided,
  • closes figures after logging to free memory.
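The behaviour listed above can be sketched as a generic decorator. This is a minimal illustration, not the package's implementation. The name `rep_plot_wrapper_sketch` is hypothetical, and the sketch rests on several assumptions. It forces `show=False` only when a report is supplied. It assumes the wrapped function returns a Figure, a list of Figures, or a list of (caption, Figure) pairs. It returns the report object after logging, which mirrors what KS_plot documents. `log_func` is removed but not used, because its role is not described here.

```python
import functools
import inspect

import matplotlib.pyplot as plt
from matplotlib.figure import Figure

WRAPPER_ONLY_KWARGS = ("rep", "log_func", "caption")


def rep_plot_wrapper_sketch(func):
    """Hypothetical decorator mirroring the documented rep_plot_wrapper behaviour."""
    accepts_show = "show" in inspect.signature(func).parameters

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Strip wrapper-only kwargs so the plotting function never receives them.
        extracted = {key: kwargs.pop(key, None) for key in WRAPPER_ONLY_KWARGS}
        rep, caption = extracted["rep"], extracted["caption"]
        # Assumption: only suppress interactive display when logging to a report.
        if rep is not None and accepts_show:
            kwargs["show"] = False

        result = func(*args, **kwargs)
        if rep is None:
            return result

        # Normalise the return value to a list of figures or (caption, figure) pairs.
        if result is None:
            items = []
        elif isinstance(result, Figure):
            items = [result]
        else:
            items = list(result)

        for item in items:
            item_caption, fig = item if isinstance(item, tuple) else (caption, item)
            rep.log_plot(fig, item_caption or caption)
            plt.close(fig)  # free memory once the figure has been logged
        return rep

    return wrapper
```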

Z_Score_Plot(data, batch, probablity_distribution=False, draw_PDF=True)

Plots the median-centered Z-score data as a heatmap and as a histogram of all scores. Samples are re-ordered by batch for better visualisation in the heatmap, and batch separators are drawn on the heatmap.

Args:
  data (np.ndarray): 2D array of Z-scored data (samples x features).
  batch (array-like): Batch label for each sample, used to order the heatmap rows and place the separators.

Returns:
  None: Displays the Z-scored data as a heatmap and a histogram of the values on separate axes.
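The data preparation behind this plot can be sketched with NumPy. The exact scaling used by Z_Score_Plot is not documented here, so this sketch assumes that each feature is centred on its median and divided by its sample standard deviation. The helper names are hypothetical. Batch separators fall at the row indices where the sorted batch label changes.

```python
import numpy as np


def median_centred_z(data):
    """Centre each feature on its median and scale by its sample SD (assumed scaling)."""
    data = np.asarray(data, dtype=float)
    med = np.median(data, axis=0)
    sd = data.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # avoid division by zero for constant features
    return (data - med) / sd


def order_by_batch(z, batch):
    """Stable-sort rows by batch; return sorted rows, labels and separator positions."""
    batch = np.asarray(batch)
    order = np.argsort(batch, kind="stable")
    sorted_batch = batch[order]
    # A separator goes before every row whose label differs from the previous row.
    boundaries = np.flatnonzero(sorted_batch[1:] != sorted_batch[:-1]) + 1
    return z[order], sorted_batch, boundaries
```

On a Matplotlib heatmap, each boundary index `i` can then be drawn with `ax.axhline(i - 0.5)`.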

Cohens_D_plot(cohens_d, pair_labels, df=None, *, rep=None, caption=None, show=False)

Plots Cohen's d effect sizes as a bar plot with histograms of the values.

Args:
  cohens_d (np.ndarray): 2D array of Cohen's d values (num_pairs x num_features).
  pair_labels (list): List of labels for each pair of batches, corresponding to rows in cohens_d.
  df (Optional[pd.DataFrame], optional): Optional DataFrame containing additional information. Defaults to None.
  rep (optional): Optional StatsReporter instance. Defaults to None.
  caption (Optional[str], optional): Optional caption for the plot. Defaults to None.
  show (bool, optional): Whether to display the plot. Defaults to False.

Returns:
  plt.Figure: The generated plot figure.
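To build inputs in the documented shape (num_pairs x num_features) with matching pair_labels, you can compute Cohen's d for every pair of batches using the pooled standard deviation. This is a minimal sketch that assumes the standard pooled-SD definition. The package's own diagnostic function may differ, and `pairwise_cohens_d` is a hypothetical name.

```python
import itertools

import numpy as np


def pairwise_cohens_d(data, batch):
    """Cohen's d per feature for every pair of batches (pooled-SD definition)."""
    data = np.asarray(data, dtype=float)
    batch = np.asarray(batch)
    rows, pair_labels = [], []
    for a, b in itertools.combinations(np.unique(batch).tolist(), 2):
        xa, xb = data[batch == a], data[batch == b]
        na, nb = len(xa), len(xb)
        # Pooled standard deviation across the two batches, per feature.
        pooled = np.sqrt(
            ((na - 1) * xa.var(axis=0, ddof=1) + (nb - 1) * xb.var(axis=0, ddof=1))
            / (na + nb - 2)
        )
        rows.append((xa.mean(axis=0) - xb.mean(axis=0)) / pooled)
        pair_labels.append(f"{a} vs {b}")
    return np.vstack(rows), pair_labels
```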

Levenes_Test_with_residuals(levene_results_raw, levene_results_resid=None, feature_names=None, *, alpha=0.05, show=False, rep=None)

Plot raw and residualised Levene's test results side-by-side.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| levene_results_raw | dict | Dict of raw Levene outputs. | required |
| levene_results_resid | dict \| None | Dict of Levene outputs after covariate residualisation. | None |
| feature_names | list \| None | Optional list of feature names. | None |
| alpha | float | Significance threshold. | 0.05 |
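Residualisation here means removing the part of each feature that the covariates explain before Levene's test is re-run. The following is a minimal sketch that assumes an ordinary least-squares fit with an intercept. The procedure that actually produces levene_results_resid is not described here, and `residualise` is a hypothetical helper.

```python
import numpy as np


def residualise(data, covariates):
    """Return data with a least-squares fit on the covariates (plus intercept) removed."""
    data = np.asarray(data, dtype=float)
    cov = np.asarray(covariates, dtype=float)
    if cov.ndim == 1:
        cov = cov[:, None]  # single covariate -> column vector
    design = np.column_stack([np.ones(len(cov)), cov])
    # Solve all features at once: beta has shape (1 + n_covariates, n_features).
    beta, *_ = np.linalg.lstsq(design, data, rcond=None)
    return data - design @ beta
```

The residuals can then go through the same per-pair Levene test as the raw data.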

Levenes_Test(levene_results, feature_names=None, *, alpha=0.05, show=False, rep=None)

Plot Levene's test results produced by DiagnosticFunctions.Levene_Test.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| levene_results | dict | Dict keyed by comparison tuple (a, b), whose values contain at least per-feature 'stat' and 'pvalue' arrays. | required |
| feature_names | list \| None | Optional list of feature names (length = n_features). | None |
| alpha | float | Significance threshold used to highlight features. | 0.05 |
| rep | | Optional report object used by the wrapper. | None |

Returns:

| Type | Description |
| --- | --- |
| list[tuple[str, Figure]] | List of (caption, Figure) tuples. |
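The following stand-in produces a dict in the layout this function expects: keys are (a, b) tuples and values hold per-feature 'stat' and 'pvalue' arrays. It uses `scipy.stats.levene` with `center="median"` (SciPy's default, the Brown-Forsythe variant). It is a sketch for illustration, not DiagnosticFunctions.Levene_Test itself, and both helper names are hypothetical.

```python
import itertools

import numpy as np
from scipy import stats


def pairwise_levene(data, batch, center="median"):
    """Per-feature Levene test for every pair of batches, keyed by (a, b)."""
    data = np.asarray(data, dtype=float)
    batch = np.asarray(batch)
    results = {}
    for a, b in itertools.combinations(np.unique(batch).tolist(), 2):
        xa, xb = data[batch == a], data[batch == b]
        stat = np.empty(data.shape[1])
        pval = np.empty(data.shape[1])
        for j in range(data.shape[1]):
            res = stats.levene(xa[:, j], xb[:, j], center=center)
            stat[j], pval[j] = res.statistic, res.pvalue
        results[(a, b)] = {"stat": stat, "pvalue": pval}
    return results


def significant_features(results, alpha=0.05):
    """Indices of features with p < alpha for each comparison (what gets highlighted)."""
    return {pair: np.flatnonzero(r["pvalue"] < alpha) for pair, r in results.items()}
```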

variance_ratio_plot(variance_ratios, pair_labels, df=None, rep=None, show=False, caption=None)

Plots the explained variance ratio for each principal component as a bar plot.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| variance_ratios | Sequence[float] | A sequence of explained variance ratios for each principal component. | required |
| pair_labels | list | List of labels for each pair of batches, corresponding to rows in variance_ratios. | required |
| df | None | Placeholder for potential future use. | None |
| rep | optional | Optional StatsReporter instance. | None |
| caption | Optional[str] | Optional caption for the plot. | None |
| show | bool | Whether to display the plot. | False |

Returns:

None: Displays a plot of the variance ratios per feature and a histogram of the values on separate axes.

Raises: ValueError: If variance_ratios is not a sequence of numbers.

PC_corr_plot(PrincipleComponents, batch, covariates=None, variable_names=None, PC_correlations=False, *, show=False, cluster_batches=False)

Generate PCA diagnostic plots and return a list of (caption, fig).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| PrincipleComponents | ndarray | 2D array of shape (n_samples, n_components) containing PCA scores. | required |
| batch | ndarray or list | 1D array or list of batch labels, one per sample. | required |
| covariates | optional | Optional covariate data: a DataFrame, structured array, or 2D array. | None |
| variable_names | optional | Optional list of variable names for the covariates and batch. If supplied with covariates, its length should match the number of covariate columns. If the first element is 'batch', it is used as the batch column name. | None |
| PC_correlations | optional | Optional output from PC_Correlations, used to add correlation summary plots. | False |
| show | bool | Whether to display the figures interactively. | False |
| cluster_batches | bool | Whether to add batch-clustering overlays to the PCA plots. | False |

Returns:

| Type | Description |
| --- | --- |
| list[tuple[str, Figure]] | Caption and figure pairs for the PCA diagnostic plots. |

clustering_analysis_PCA(PrincipleComponents, batch, covariates=None, variable_names=None, PC_correlations=False, *, show=False, cluster_batches=False, UMAP_embedding=False, data=None)

Perform clustering analysis on PCA results and generate diagnostic plots.

Args:
  PrincipleComponents (np.ndarray): 2D array of shape (n_samples, n_components) containing PCA scores.
  batch (np.ndarray or list): 1D array or list of batch labels, one per sample.
  covariates (optional): Optional covariate data: a DataFrame, structured array, or 2D array. Defaults to None.
  variable_names (optional): Optional list of variable names for the covariates and batch. If covariates are provided, its length should match the number of covariate columns. If the first element is 'batch', it is used as the batch column name. Defaults to None.

Returns:
  List[Tuple[str, plt.Figure]]: Caption and figure pairs for the PCA diagnostic plots.

clustering_analysis_all(PrincipleComponents, data, batch, covariates=None, variable_names=None, show=False, UMAP_embedding=True, UMAP_neighbors=15, UMAP_min_dist=0.1, UMAP_metric='euclidean', UMAP_tuning='auto')

Perform clustering diagnostics in PCA and optional UMAP space.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| PrincipleComponents | ndarray | PCA score matrix with shape (n_samples, n_components). | required |
| data | ndarray | Original feature matrix used for the optional UMAP embedding. | required |
| batch | ndarray or list | Batch labels for each sample. | required |
| covariates | optional | Optional covariate data used to color the plots. | None |
| variable_names | optional | Optional names for the covariate and batch variables. | None |
| show | bool | Whether to display the figures interactively. | False |
| UMAP_embedding | bool | Whether to compute and plot a UMAP embedding of the raw data. | True |
| UMAP_neighbors | int | Number of neighbors for UMAP. | 15 |
| UMAP_min_dist | float | Minimum distance parameter for UMAP. | 0.1 |
| UMAP_metric | str | Distance metric passed to UMAP. | 'euclidean' |

Returns:

| Type | Description |
| --- | --- |
| list[tuple[str, Figure]] | Caption and figure pairs for the generated PCA and optional UMAP plots. |

clustering_analysis_UMAP(data, batch, covariates=None, variable_names=None, rep=None)

Perform UMAP dimensionality reduction and plot the embedding colored by batch and covariates.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | | 2D array-like (n_samples x n_features) of input data for UMAP. | required |
| batch | | 1D array-like (n_samples,) of batch labels for each sample. | required |
| covariates | | Optional 2D array-like (n_samples x n_covariates). | None |
| variable_names | | Optional list of covariate names (if covariates are provided). | None |
| rep | | Optional report object with a rep.log_plot(plt, caption=...) method for logging plots. | None |

plot_eigen_spectra_and_cumulative(score, batch, rep, max_components=50, caption_prefix='PC spectrum')

Compute per-batch variance along PCs (scree / cumulative) and log plots to the report.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| score | ndarray | (n_samples, n_pcs) PCA score matrix returned by your PCA routine. | required |
| batch | ndarray | (n_samples,) batch labels (numeric or strings). | required |
| rep | | Report object that has rep.log_plot(plt, caption=...) and rep.log_text(...). | required |
| max_components | int | Maximum number of PCs to visualise (keeps plots cheap). | 50 |
| caption_prefix | str | Prefix for plot captions. | 'PC spectrum' |

Returns:

| Type | Description |
| --- | --- |
| dict[str, Any] | A dictionary containing per-batch variance curves, per-batch fraction-of-variance curves, and the number of principal components used. |
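The per-batch scree and cumulative curves can be sketched as follows. For each batch, take the sample variance of every PC score, divide by that batch's total to get the fraction of variance, and accumulate the fractions. This is a minimal sketch, not the function's internals. The helper name and the output keys are hypothetical, and each batch needs at least two samples.

```python
import numpy as np


def per_batch_pc_spectra(score, batch, max_components=50):
    """Per-batch variance along each PC, its fraction of the batch total, and the cumulative curve."""
    score = np.asarray(score, dtype=float)
    batch = np.asarray(batch)
    k = min(max_components, score.shape[1])  # cap the number of PCs considered
    variance, fraction, cumulative = {}, {}, {}
    for label in np.unique(batch).tolist():
        v = score[batch == label, :k].var(axis=0, ddof=1)  # scree values for this batch
        total = v.sum()
        frac = v / total if total > 0 else np.zeros_like(v)
        variance[label], fraction[label], cumulative[label] = v, frac, np.cumsum(frac)
    return {"variance": variance, "fraction": fraction,
            "cumulative": cumulative, "n_components": k}
```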

plot_covariance_frobenius(data, batch, rep, max_components=50, normalize=True, caption_prefix='Covariance comparison (PC space)')

Compute pairwise Frobenius norms of covariance differences between batches.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | | (n_samples, n_features) whole data matrix in the original (feature) space. | required |
| batch | ndarray | (n_samples,) batch labels. | required |
| rep | | Report object (must support rep.log_plot and rep.log_text). | required |
| normalize | bool | If True, divide pairwise norms by the Frobenius norm of the pooled covariance. | True |
| caption_prefix | str | Prefix for plot captions. | 'Covariance comparison (PC space)' |

Returns:

| Type | Description |
| --- | --- |
| dict[str, Any] | A dictionary containing the batch covariance matrices, pairwise Frobenius distances, optional normalized distances, and the number of principal components used. |
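The core computation takes two steps. First, compute one covariance matrix per batch and the Frobenius norm of each pairwise difference. Then, if normalization is requested, divide by the Frobenius norm of a pooled covariance. This minimal sketch assumes that "pooled" means the covariance of all samples together. It works in whatever space it is given, so PC scores truncated to max_components can be passed in. The helper name and output keys are hypothetical.

```python
import itertools

import numpy as np


def pairwise_cov_frobenius(X, batch, normalize=True):
    """Frobenius norms of pairwise covariance differences between batches."""
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    labels = np.unique(batch).tolist()
    # atleast_2d keeps the single-feature case a 1x1 matrix
    covs = {b: np.atleast_2d(np.cov(X[batch == b], rowvar=False)) for b in labels}
    distances = {
        (a, b): float(np.linalg.norm(covs[a] - covs[b], ord="fro"))
        for a, b in itertools.combinations(labels, 2)
    }
    result = {"covariances": covs, "distances": distances}
    if normalize:
        # Assumption: pooled covariance = covariance of all samples together.
        pooled = np.atleast_2d(np.cov(X, rowvar=False))
        scale = np.linalg.norm(pooled, ord="fro")
        result["normalized"] = {pair: d / scale for pair, d in distances.items()}
    return result
```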

mahalanobis_distance_plot(results, rep=None, annotate=True, figsize=(14, 5), cmap='viridis', show=False)

Plot Mahalanobis distances from (...) all on ONE figure:
  • heatmap of pairwise RAW distances
  • heatmap of pairwise RESIDUAL distances (if available)
  • bar chart of centroid-to-global distances (raw vs residual)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| results | dict | Output from MahalanobisDistance(...) | required |
| annotate | bool | Write numeric values inside heatmap cells/bars. | True |
| figsize | tuple | Matplotlib figure size. | (14, 5) |
| cmap | str | Colormap for heatmaps. | 'viridis' |
| show | bool | If True, call plt.show(); otherwise just return (fig, axes). | False |

Returns:

| Type | Description |
| --- | --- |
| (fig, axes) | The matplotlib Figure and a dict of axes. |
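The structure of MahalanobisDistance(...) output is not documented here. As a rough illustration of the two quantities being plotted, this sketch computes pairwise distances between batch centroids and centroid-to-global-mean distances under a pooled within-batch covariance. That choice of covariance is an assumption. The helper name and output keys are hypothetical. The "residual" variant would apply the same computation to covariate-residualised data.

```python
import itertools

import numpy as np


def centroid_mahalanobis(X, batch):
    """Mahalanobis distances between batch centroids and to the global mean."""
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    labels = np.unique(batch).tolist()
    means = {b: X[batch == b].mean(axis=0) for b in labels}
    # Pooled within-batch covariance: centre each batch on its own mean first.
    centred = np.vstack([X[batch == b] - means[b] for b in labels])
    pooled = centred.T @ centred / (len(X) - len(labels))
    inv = np.linalg.pinv(pooled)  # pseudo-inverse tolerates near-singular covariance

    def dist(u, v):
        diff = u - v
        return float(np.sqrt(diff @ inv @ diff))

    grand_mean = X.mean(axis=0)
    return {
        "pairwise": {(a, b): dist(means[a], means[b])
                     for a, b in itertools.combinations(labels, 2)},
        "to_global": {b: dist(means[b], grand_mean) for b in labels},
    }
```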

KS_plot(ks_results, feature_names=None, rep=None, caption=None, show=False)

Plot detailed KS-test results for each comparison.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| ks_results | dict | Output from KS_Test, keyed by tuples such as (batch, "overall") or (batch1, batch2). | required |
| feature_names | list | Optional feature names for plot labels. | None |
| rep | optional | Report object used to log plots instead of returning them. | None |
| caption | str | Optional caption prefix. | None |
| show | bool | Whether to display the generated figures. | False |

Returns:

| Type | Description |
| --- | --- |
| Any | The report object if rep is provided; otherwise a list of (caption, figure) tuples. |
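To see the key layout, consider a stand-in that runs `scipy.stats.ks_2samp` per feature. It produces one (batch, "overall") entry per batch and one (batch1, batch2) entry per pair. This sketch is not KS_Test itself. It assumes "overall" means a batch compared against all remaining samples. The inner key names "statistic" and "pvalue" and the helper name are hypothetical.

```python
import itertools

import numpy as np
from scipy import stats


def ks_by_batch(data, batch):
    """Per-feature two-sample KS tests keyed like (batch, "overall") and (batch1, batch2)."""
    data = np.asarray(data, dtype=float)
    batch = np.asarray(batch)
    labels = np.unique(batch).tolist()

    def per_feature(x, y):
        # Two-sample KS test for every feature (column).
        res = [stats.ks_2samp(x[:, j], y[:, j]) for j in range(data.shape[1])]
        return {"statistic": np.array([r.statistic for r in res]),
                "pvalue": np.array([r.pvalue for r in res])}

    results = {}
    for b in labels:
        # Assumption: "overall" = this batch against all other samples.
        results[(b, "overall")] = per_feature(data[batch == b], data[batch != b])
    for a, b in itertools.combinations(labels, 2):
        results[(a, b)] = per_feature(data[batch == a], data[batch == b])
    return results
```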

apply_plot_theme()

Apply the shared visual theme to all subsequent Matplotlib/Seaborn plots.

build_style_registry(subjects, idps, subject_palette='tab10', idp_palette='Set2', subject_markers=None, idp_markers=None)

Build colour and marker registries for subjects and IDPs.

Returns

subject_style, idp_style : dicts mapping each identifier to (color, marker).

plot_RawIDPBoxplotsAcrossSites(df, batch_col='batch', subject_col='subject', idp_cols=None, site_order=None, ncols=2, figsize_per_panel=(6.0, 4.5), show_points=True, point_size=2.2, point_alpha=0.18, point_jitter=0.22, savepath=None, rep=None, show=False, site_threshold_for_horizontal=12, feature_display_limit=10, add_pca_summary=True)

Raw IDP distributions across sites/batches.

Behaviour

  • If the number of sites is small, boxplots are shown vertically.
  • If the number of sites is large, boxplots are shown horizontally.
  • If there are many features, only the top features by site dispersion are shown.
  • Optionally adds a PCA summary panel for the full feature set when truncated.

Notes

The batch/site order is deterministic:
  • if site_order is provided, it is used as-is
  • otherwise, sites are ordered alphabetically
This keeps colors and tick-label order consistent across raw and harmonised runs.
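The two rules above can be sketched with pandas. One is the deterministic site order. The other is a ranking of features by how much they vary between sites. How the function measures "site dispersion" is not documented here. This sketch assumes it is the standard deviation of per-site medians divided by each feature's overall standard deviation, which keeps features on different scales comparable. Both helper names are hypothetical.

```python
import pandas as pd


def resolve_site_order(df, batch_col="batch", site_order=None):
    """Explicit order if given; otherwise alphabetical by string form."""
    if site_order is not None:
        return list(site_order)
    # key=str sorts numeric labels as text too (e.g. 10 before 2), keeping it deterministic
    return sorted(df[batch_col].unique(), key=str)


def top_features_by_site_dispersion(df, idp_cols, batch_col="batch", n=10):
    """Rank features by spread of per-site medians, scaled by the feature's overall SD."""
    cols = list(idp_cols)
    site_medians = df.groupby(batch_col)[cols].median()
    dispersion = site_medians.std() / df[cols].std()
    return dispersion.sort_values(ascending=False).index[:n].tolist()
```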

plot_SubjectOrder(df, idp_col='IDP', time_a_col='TimeA', time_b_col='TimeB', rho_col='SpearmanRho', p_col='pValue', times_order=None, significance=0.05, ncols=2, figsize_per_plot=(5, 5), cmap=STYLE.HEATMAP_CMAP, fmt='.1f', center=0, limit_idps=None, sample_method='first', random_state=None, rep=None, show=False, combine_method='stouffer', p_correction='fdr_bh')

Subject order consistency heatmaps with combined p-values.

Per-feature heatmaps show raw permutation p-values. Summary heatmaps show combined p-values corrected by p_correction.
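With the defaults combine_method='stouffer' and p_correction='fdr_bh', the summary values can be sketched in two steps. First, combine each group of p-values with `scipy.stats.combine_pvalues(..., method="stouffer")`. Then apply a Benjamini-Hochberg adjustment across the combined values. What counts as a "group" in plot_SubjectOrder is an assumption here, and both helper names are hypothetical.

```python
import numpy as np
from scipy import stats


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def combine_then_correct(pvalue_groups):
    """Stouffer-combine each group of p-values, then BH-correct across groups."""
    combined = []
    for group in pvalue_groups:
        _, p = stats.combine_pvalues(group, method="stouffer")
        combined.append(p)
    combined = np.array(combined)
    return combined, bh_adjust(combined)
```

The same correction is also available as `statsmodels.stats.multitest.multipletests(p, method="fdr_bh")`.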

plot_WithinSubjVar(df, subject_col='subject', idp_cols=None, subject_style=None, idp_style=None, limit_subjects=30, limit_idps_for_legend=30, figsize=(16, 13), savepath=None, rep=None, show=False, debug=False)

Plot within-subject variability summary across IDPs.

Panels

A: distribution across subjects for each IDP
  • few subjects: colored subject markers and a subject legend
  • many subjects: black dots only, no subject legend

B: mean variability per IDP
  • few IDPs: colored IDP markers and an IDP legend
  • many IDPs: black dots only, no IDP legend

C: top 10 subjects with the highest average variability

plot_MultivariateBatchDifference(df, batch_col='batch', value_col='mdval', avg_label='average_batch', figsize=(8, 6), sort_by_value=True, sort_rest_desc=True, value_format='{:.1f}', savepath=None, rep=None, show=False)

Horizontal bar chart of Mahalanobis distances per batch. The average batch is pinned to the top; remaining batches are optionally sorted.
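The ordering rule can be sketched in pandas. Keep the row labelled avg_label first, then sort the remaining batches by value. This is an illustrative helper, not the function's own code, and `order_batches` is a hypothetical name.

```python
import pandas as pd


def order_batches(df, batch_col="batch", value_col="mdval",
                  avg_label="average_batch", sort_rest_desc=True):
    """Average batch first, remaining batches sorted by value."""
    is_avg = df[batch_col] == avg_label
    rest = df.loc[~is_avg].sort_values(value_col, ascending=not sort_rest_desc)
    return pd.concat([df.loc[is_avg], rest], ignore_index=True)
```

Matplotlib's `Axes.barh` draws the first row at the bottom, so calling `ax.invert_yaxis()` puts the pinned average batch at the top.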

plot_MixedEffectsPart1(df, idp_col='IDP', metrics=None, plot_type='bar', idp_style=None, limit_idps=10, figsize=(4, 6), seed=None, savepath=None, rep=None, show=False, display='subplots', metric=None, show_pairwise_heatmap=True)

Plot ICC, residual batch significance counts, and WCV per IDP.

The display argument selects the layout:
  • 'subplots': all metrics in one figure
  • 'separate': one figure per metric
  • 'single': one figure for the metric given by metric

plot_PairwiseSiteDifferencesHeatmap(df, idp_col='IDP', records_col='pairwise_site_tests', sig_key='sig_bonf', title='Pairwise site differences by feature', p_thr=0.05, figsize=(10, 8), top_n=20, max_full_pairs=50, rep=None, show=False)

Discrete binary heatmap:
  • grey = not significant
  • red = significant after Bonferroni correction

plot_MixedEffectsPart2(df, idp_col='IDP', fix_eff=('age', 'sex'), p_thr=0.05, effect_style=None, idp_order=None, marker_size=160, figsize=(9.0, 3.2), cap_width=0.03, linewidth=2.4, xtick_rotation=25, highlight_color='red', savepath=None, rep=None, show=False)

Fixed-effect estimates and 95% confidence intervals per IDP.

Significant estimates (p < p_thr) are filled red; non-significant are open. One covariate per row, compact layout.

plot_AddMultEffects(dfs, feature_col='Feature', p_col='p-value', labels=None, p_thr=0.05, cmap='viridis', annot_fmt='{:.1g}', vmax_logp=10, figsize=(10, 8), show_colorbar=True, savepath=None, value_scale='p', rep=None, show=False, annot_fontsize=STYLE.ANNOT_SIZE, tick_fontsize=STYLE.TICK_SIZE, cbar_shrink=0.2, linewidths=STYLE.HEATMAP_LINEWIDTHS, square=False)

Feature significance heatmap (one or multiple DataFrames).

value_scale selects what is plotted:
  • 'p': raw p-values in [0, 1]
  • 'logp': -log10(p), clipped at vmax_logp
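The two value_scale options amount to a small transform applied before plotting. This is a minimal sketch with a hypothetical helper name. A p-value of exactly 0 gives an infinite -log10, which the cap at vmax_logp turns into a finite colour value.

```python
import numpy as np


def to_plot_scale(pvalues, value_scale="p", vmax_logp=10):
    """Map p-values to the plotted scale: raw 'p' or capped -log10 'logp'."""
    p = np.asarray(pvalues, dtype=float)
    if value_scale == "p":
        return p
    if value_scale == "logp":
        with np.errstate(divide="ignore"):
            logp = -np.log10(p)  # p == 0 gives +inf, capped below
        return np.minimum(logp, vmax_logp)
    raise ValueError(f"unknown value_scale: {value_scale!r}")
```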