
PlotDiagnosticResults

A set of complementary plotting functions for the diagnostic tests. They can also be used outside the reports if you only need one specific test. Most functions here store their figures in a list, which makes them easy to add to the HTML reports or to save as local PNG images.


LMM_Diagnostics_Plot(results_df, feature_order='original', max_labels=50, include_delta_r2=True, include_status_summary=True)

Plot LMM diagnostics from Run_LMM_cross_sectional output.

Parameters:

results_df (DataFrame, required): Output dataframe from Run_LMM_cross_sectional.
feature_order (str, default 'original'): 'original' to preserve the input order, or 'sorted_icc' to sort features by ICC.
max_labels (int, default 50): Maximum number of x-axis labels to show before they are thinned.
include_delta_r2 (bool, default True): If True, add a delta_R2 plot.
include_status_summary (bool, default True): If True, add a status/notes summary plot.

Returns:

list[tuple[str, Figure]]: Caption and figure pairs for the generated diagnostic plots.

rep_plot_wrapper(func)

Decorator that
  • optionally forces show=False (if the wrapped function supports it),
  • intercepts and removes wrapper-only kwargs (rep, log_func, caption),
  • logs returned figure(s) into rep via rep.log_plot(fig, caption) if rep provided,
  • closes figures after logging to free memory.
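
The decorator behaviour can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation. rep_plot_wrapper_sketch is a hypothetical name. The sketch always forces show=False when the wrapped function accepts it. It discards log_func, because its role is not documented here. It treats a returned list as either figures or (caption, figure) pairs.

```python
import functools
import inspect


def _close_if_matplotlib(fig):
    # Only import pyplot when the object really is a Matplotlib figure.
    if type(fig).__module__.startswith("matplotlib"):
        import matplotlib.pyplot as plt
        plt.close(fig)


def rep_plot_wrapper_sketch(func):
    """Hypothetical stand-in for rep_plot_wrapper, not the library's code."""
    accepts_show = "show" in inspect.signature(func).parameters

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Remove wrapper-only kwargs so the plotting function never sees them.
        rep = kwargs.pop("rep", None)
        kwargs.pop("log_func", None)  # accepted but unused in this sketch
        caption = kwargs.pop("caption", None)
        if accepts_show:
            kwargs["show"] = False  # report generation never needs a window
        result = func(*args, **kwargs)
        if rep is not None:
            items = result if isinstance(result, list) else [result]
            for item in items:
                # Many functions in this module return (caption, figure) pairs.
                if isinstance(item, tuple) and len(item) == 2 and isinstance(item[0], str):
                    cap, fig = item
                else:
                    cap, fig = caption, item
                rep.log_plot(fig, cap)
                _close_if_matplotlib(fig)  # free memory once the figure is logged
        return result

    return wrapper
```

Using functools.wraps keeps the wrapped function's name and docstring, so the generated API docs still describe the plotting function rather than the wrapper.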

Z_Score_Plot(data, batch, probablity_distribution=False, draw_PDF=True)

Plots the median-centered Z-score data as a heatmap and as a histogram of all scores. Samples are re-ordered by batch for better visualisation in the heatmap, and batch separators are drawn on the heatmap.

Args:
    data (np.ndarray): 2D array of Z-scored data (samples x features).
Returns:
    None: Displays the Z-scored data and a histogram of the values on separate axes.
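
The batch re-ordering step can be sketched with NumPy. The sketch assumes batch is a 1D array of labels aligned with the rows of data. order_by_batch is a hypothetical helper, and the plotting call itself is omitted. A stable sort keeps the original sample order within each batch. Each returned boundary b is the first row of a new batch, so an imshow heatmap could mark it with ax.axhline(b - 0.5).

```python
import numpy as np


def order_by_batch(data, batch):
    """Return rows sorted by batch, the sorted labels, and batch-change rows."""
    batch = np.asarray(batch)
    order = np.argsort(batch, kind="stable")  # stable: keeps within-batch order
    sorted_batch = batch[order]
    # A separator goes before every row whose label differs from the previous row.
    boundaries = np.flatnonzero(sorted_batch[1:] != sorted_batch[:-1]) + 1
    return np.asarray(data)[order], sorted_batch, boundaries
```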

Cohens_D_plot(cohens_d, pair_labels, df=None, *, rep=None, caption=None, show=False)

Plots Cohen's d effect sizes as a bar plot, together with histograms of the values.

Args:
    cohens_d (np.ndarray): 2D array of Cohen's d values (num_pairs x num_features).
    pair_labels (list): Labels for each pair of batches, corresponding to the rows of cohens_d.
    df (Optional[pd.DataFrame], optional): Optional DataFrame containing additional information. Defaults to None.
    rep (optional): Optional StatsReporter instance. Defaults to None.
    caption (Optional[str], optional): Optional caption for the plot. Defaults to None.
    show (bool, optional): Whether to display the plot. Defaults to False.
Returns:
    plt.Figure: The generated plot figure.
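
Cohens_D_plot expects a (num_pairs x num_features) array with one label per row. The sketch below shows one way such an input could be built. It assumes the classic pooled-standard-deviation form of Cohen's d, and the library may compute it differently. pairwise_cohens_d and the "A vs B" label format are hypothetical.

```python
import itertools

import numpy as np


def pairwise_cohens_d(data, batch):
    """Cohen's d for every pair of batches, feature by feature (pooled-SD form)."""
    data = np.asarray(data, dtype=float)
    batch = np.asarray(batch)
    levels = list(dict.fromkeys(batch.tolist()))  # batches in order of first appearance
    rows, labels = [], []
    for a, b in itertools.combinations(levels, 2):
        xa, xb = data[batch == a], data[batch == b]
        na, nb = len(xa), len(xb)
        # Pooled standard deviation built from the ddof=1 sample variances.
        pooled_var = ((na - 1) * xa.var(axis=0, ddof=1)
                      + (nb - 1) * xb.var(axis=0, ddof=1)) / (na + nb - 2)
        rows.append((xa.mean(axis=0) - xb.mean(axis=0)) / np.sqrt(pooled_var))
        labels.append(f"{a} vs {b}")
    return np.vstack(rows), labels
```

The two return values line up with the cohens_d and pair_labels arguments.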

variance_ratio_plot(variance_ratios, pair_labels, df=None, rep=None, show=False, caption=None)

Plots the explained variance ratio for each principal component as a bar plot.

Parameters:

variance_ratios (Sequence[float], required): A sequence of explained variance ratios for each principal component.
pair_labels (list, required): List of labels for each pair of batches, corresponding to rows in variance_ratios.
df (None, default None): Placeholder for potential future use.
rep (optional, default None): Optional StatsReporter instance.
caption (Optional[str], default None): Optional caption for the plot.
show (bool, default False): Whether to display the plot.

Returns:

None: Displays the variance ratio per feature and a histogram of the values on separate axes.

Raises: ValueError: If variance_ratios is not a sequence of numbers.

PC_corr_plot(PrincipleComponents, batch, covariates=None, variable_names=None, PC_correlations=False, *, show=False, cluster_batches=False)

Generate PCA diagnostic plots and return a list of (caption, fig).

Parameters:

PrincipleComponents (ndarray, required): 2D array of shape (n_samples, n_components) containing PCA scores.
batch (ndarray or list, required): 1D array or list of batch labels corresponding to each sample.
covariates (optional, default None): Optional covariate data. Can be a DataFrame, structured array, or 2D array.
variable_names (optional, default None): Optional list of variable names for covariates and batch. If supplied with covariates, the length should match the number of covariate columns. If the first element is 'batch', it is used as the batch column name.
PC_correlations (optional, default False): Optional output from PC_Correlations, used to add correlation summary plots.
show (bool, default False): Whether to display the figures interactively.
cluster_batches (bool, default False): Whether to add batch-clustering overlays to the PCA plots.

Returns:

list[tuple[str, plt.Figure]]: Caption and figure pairs for the PCA diagnostic plots.
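
PrincipleComponents is a matrix of PCA scores. The sketch below shows one standard way to obtain one: an SVD of the column-centered data. pca_scores is a hypothetical helper, and the library's own PCA routine may differ, for example in scaling or in how many components it keeps. The returned scores could then be passed as PC_corr_plot(scores, batch).

```python
import numpy as np


def pca_scores(data, n_components):
    """PCA scores (n_samples, n_components) and explained-variance ratios via SVD."""
    X = np.asarray(data, dtype=float)
    Xc = X - X.mean(axis=0)  # PCA assumes column-centered data
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # same as projecting Xc onto the PCs
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]
```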

clustering_analysis_PCA(PrincipleComponents, batch, covariates=None, variable_names=None, PC_correlations=False, *, show=False, cluster_batches=False, UMAP_embedding=False, data=None)

Perform clustering analysis on PCA results and generate diagnostic plots.

Args:
    PrincipleComponents (np.ndarray): 2D array of shape (n_samples, n_components) containing PCA scores.
    batch (np.ndarray or list): 1D array or list of batch labels corresponding to each sample.
    covariates (optional): Optional covariate data. Can be a DataFrame, structured array, or 2D array. Defaults to None.
    variable_names (optional): Optional list of variable names for covariates and batch. If covariates are provided, the length should match the number of covariate columns. If the first element is 'batch', it is used as the batch column name. Defaults to None.
Returns:
    List[Tuple[str, plt.Figure]]: A list of (caption, figure) tuples for the PCA diagnostic plots.

clustering_analysis_all(PrincipleComponents, data, batch, covariates=None, variable_names=None, show=False, UMAP_embedding=True, UMAP_neighbors=15, UMAP_min_dist=0.1, UMAP_metric='euclidean', UMAP_tuning='auto')

Perform clustering diagnostics in PCA and optional UMAP space.

Parameters:

PrincipleComponents (ndarray, required): PCA score matrix with shape (n_samples, n_components).
data (ndarray, required): Original feature matrix used for the optional UMAP embedding.
batch (ndarray or list, required): Batch labels for each sample.
covariates (optional, default None): Optional covariate data used to color the plots.
variable_names (optional, default None): Optional names for the covariate and batch variables.
show (bool, default False): Whether to display the figures interactively.
UMAP_embedding (bool, default True): Whether to compute and plot a UMAP embedding of the raw data.
UMAP_neighbors (int, default 15): Number of neighbors for UMAP.
UMAP_min_dist (float, default 0.1): Minimum distance parameter for UMAP.
UMAP_metric (str, default 'euclidean'): Distance metric passed to UMAP.

Returns:

list[tuple[str, plt.Figure]]: Caption and figure pairs for the generated PCA and optional UMAP plots.

clustering_analysis_UMAP(data, batch, covariates=None, variable_names=None, rep=None)

Perform UMAP dimensionality reduction and plot the embedding colored by batch and covariates.

Parameters:

data (required): 2D array-like (n_samples x n_features) input data for UMAP.
batch (required): 1D array-like (n_samples,) batch labels for each sample.
covariates (default None): Optional 2D array-like (n_samples x n_covariates).
variable_names (default None): Optional list of covariate names (if covariates are provided).
rep (default None): Optional report object with a rep.log_plot(plt, caption=...) method, used for logging the plots.
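
Several functions here need a report object that exposes rep.log_plot(plt, caption=...) and, in some cases, rep.log_text(...). For using them outside the HTML reports, a minimal stand-in could look like this. MinimalReporter is hypothetical, and the library's StatsReporter very likely does more. The sketch relies only on the fact that both a Matplotlib Figure and the pyplot module provide savefig().

```python
from pathlib import Path


class MinimalReporter:
    """Hypothetical stand-in for the report object: saves PNGs, keeps text."""

    def __init__(self, out_dir):
        self.out_dir = Path(out_dir)
        self.out_dir.mkdir(parents=True, exist_ok=True)
        self.entries = []  # (kind, caption or text, path or None)

    def log_plot(self, fig_or_plt, caption=None):
        # A Figure and the pyplot module both expose savefig().
        path = self.out_dir / f"plot_{len(self.entries):03d}.png"
        fig_or_plt.savefig(path)
        self.entries.append(("plot", caption, path))

    def log_text(self, text):
        self.entries.append(("text", str(text), None))
```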

plot_eigen_spectra_and_cumulative(score, batch, rep, max_components=50, caption_prefix='PC spectrum')

Compute per-batch variance along PCs (scree / cumulative) and log plots to the report.

Parameters:

score (ndarray, required): (n_samples, n_pcs) PCA score matrix returned by your PCA routine.
batch (ndarray, required): (n_samples,) batch labels (numeric or strings).
rep (required): Report object that has rep.log_plot(plt, caption=...) and rep.log_text(...).
max_components (int, default 50): Maximum number of PCs to visualise (keeps plots cheap).
caption_prefix (str, default 'PC spectrum'): Prefix for plot captions.

Returns:

dict[str, Any]: A dictionary containing per-batch variance curves, per-batch fraction-of-variance curves, and the number of principal components used.
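
The per-batch spectra can be sketched with NumPy. The sketch assumes that "variance along PCs" means the sample variance of each batch's scores on each of the first max_components PCs. It also takes the fractions relative to those PCs only. The function name and dictionary keys are hypothetical and need not match the dictionary the library returns.

```python
import numpy as np


def per_batch_pc_spectra(score, batch, max_components=50):
    """Per-batch variance, fraction of variance, and cumulative fraction per PC."""
    score = np.asarray(score, dtype=float)
    batch = np.asarray(batch)
    k = min(max_components, score.shape[1])
    out = {"variance": {}, "fraction": {}, "cumulative": {}, "n_components": k}
    for b in np.unique(batch):
        var = score[batch == b, :k].var(axis=0, ddof=1)  # spread of this batch on each PC
        frac = var / var.sum()
        out["variance"][b] = var          # scree curve
        out["fraction"][b] = frac
        out["cumulative"][b] = np.cumsum(frac)  # cumulative curve, ends at 1
    return out
```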

plot_covariance_frobenius(score, batch, rep, max_components=50, normalize=True, caption_prefix='Covariance comparison (PC space)')

Compute pairwise Frobenius norms of covariance differences between batches.

Parameters:

score (ndarray, required): (n_samples, n_pcs) whole data matrix in real space.
batch (ndarray, required): (n_samples,) batch labels.
rep (required): Report object (must support rep.log_plot and rep.log_text).
normalize (bool, default True): If True, divide pairwise norms by the Frobenius norm of the pooled covariance.
caption_prefix (str, default 'Covariance comparison (PC space)'): Prefix for plot captions.

Returns:

dict[str, Any]: A dictionary containing the batch covariance matrices, pairwise Frobenius distances, optional normalized distances, and the number of principal components used.
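
Each pairwise comparison is ||C_a - C_b||_F, the square root of the summed squared entry differences between two batch covariance matrices. The sketch below assumes that "pooled covariance" means the covariance of all samples taken together. A within-batch pooled estimate would be an equally plausible reading. The function name and keys are hypothetical.

```python
import itertools

import numpy as np


def covariance_frobenius(score, batch, max_components=50, normalize=True):
    """Pairwise Frobenius distances between per-batch covariance matrices."""
    X = np.asarray(score, dtype=float)
    batch = np.asarray(batch)
    k = min(max_components, X.shape[1])
    X = X[:, :k]
    covs = {b: np.atleast_2d(np.cov(X[batch == b], rowvar=False)) for b in np.unique(batch)}
    # Assumption: "pooled" = covariance of all samples together.
    pooled_norm = np.linalg.norm(np.atleast_2d(np.cov(X, rowvar=False)), "fro")
    dist, norm_dist = {}, {}
    for a, b in itertools.combinations(covs, 2):
        d = np.linalg.norm(covs[a] - covs[b], "fro")
        dist[(a, b)] = d
        if normalize:
            norm_dist[(a, b)] = d / pooled_norm
    return {"covariances": covs, "frobenius": dist,
            "normalized": norm_dist if normalize else None, "n_components": k}
```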

mahalanobis_distance_plot(results, rep=None, annotate=True, figsize=(14, 5), cmap='viridis', show=False)

Plot Mahalanobis distances from (...) all on ONE figure:
  • Heatmap of pairwise RAW distances
  • Heatmap of pairwise RESIDUAL distances (if available)
  • Bar chart of centroid-to-global distances (raw vs residual)

Parameters:

results (dict, required): Output from MahalanobisDistance(...).
annotate (bool, default True): Write numeric values inside heatmap cells/bars.
figsize (tuple, default (14, 5)): Matplotlib figure size.
cmap (str, default 'viridis'): Colormap for the heatmaps.
show (bool, default False): If True, call plt.show(); otherwise just return (fig, axes).

Returns:

(fig, axes): The Matplotlib Figure and a dict of axes.
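
The structure of the MahalanobisDistance(...) output is not documented here. The sketch below therefore covers only one common formulation of the raw distances: centroid-to-centroid and centroid-to-global Mahalanobis distances computed with the pooled within-batch covariance, pseudo-inverted for stability. The residual variant, which presumably uses covariate-adjusted data, is not shown. centroid_mahalanobis is a hypothetical name.

```python
import itertools

import numpy as np


def centroid_mahalanobis(X, batch):
    """Pairwise and centroid-to-global Mahalanobis distances between batches."""
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    levels = list(np.unique(batch))
    centroids = {b: X[batch == b].mean(axis=0) for b in levels}
    # Pooled within-batch covariance: residuals around each batch's own centroid.
    resid = np.vstack([X[batch == b] - centroids[b] for b in levels])
    inv_cov = np.linalg.pinv(resid.T @ resid / (len(X) - len(levels)))

    def dist(u, v):
        diff = u - v
        return float(np.sqrt(diff @ inv_cov @ diff))

    n = len(levels)
    D = np.zeros((n, n))
    for i, j in itertools.combinations(range(n), 2):
        D[i, j] = D[j, i] = dist(centroids[levels[i]], centroids[levels[j]])
    grand = X.mean(axis=0)
    to_global = {b: dist(centroids[b], grand) for b in levels}
    return levels, D, to_global
```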

KS_plot(ks_results, feature_names=None, rep=None, caption=None, show=False)

Plot detailed KS-test results for each comparison.

Parameters:

ks_results (dict, required): Output from KS_Test, keyed by tuples such as (batch, "overall") or (batch1, batch2).
feature_names (list, default None): Optional feature names for plot labels.
rep (optional, default None): Report object used to log the plots instead of returning them.
caption (str, default None): Optional caption prefix.
show (bool, default False): Whether to display the generated figures.

Returns:

Any: The report object if rep is provided; otherwise a list of (caption, figure) tuples.
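
The sketch below illustrates the key layout that KS_plot expects: (batch, "overall") and (batch1, batch2). It computes only the two-sample KS statistic, the largest gap between two empirical CDFs. scipy.stats.ks_2samp computes the same statistic along with a p-value. The sketch assumes that "overall" compares one batch against all remaining samples, which is a guess. Both function names are hypothetical.

```python
import numpy as np


def ks_statistic(x, y):
    """Two-sample KS statistic: max |ECDF_x - ECDF_y| over the pooled sample points."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    grid = np.concatenate([x, y])  # the supremum is attained at a sample point
    cdf_x = np.searchsorted(x, grid, side="right") / len(x)
    cdf_y = np.searchsorted(y, grid, side="right") / len(y)
    return float(np.max(np.abs(cdf_x - cdf_y)))


def ks_results_sketch(data, batch):
    """Per-feature KS statistics keyed like KS_Test output (structure assumed)."""
    data = np.asarray(data, dtype=float)
    batch = np.asarray(batch)
    levels = list(dict.fromkeys(batch.tolist()))
    n_feat = data.shape[1]
    results = {}
    for b in levels:
        in_b = batch == b  # assumption: "overall" = this batch vs every other sample
        results[(b, "overall")] = np.array(
            [ks_statistic(data[in_b, j], data[~in_b, j]) for j in range(n_feat)])
    for i, a in enumerate(levels):
        for b in levels[i + 1:]:
            results[(a, b)] = np.array(
                [ks_statistic(data[batch == a, j], data[batch == b, j]) for j in range(n_feat)])
    return results
```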

plot_SubjectOrder(df, idp_col='IDP', time_a_col='TimeA', time_b_col='TimeB', rho_col='SpearmanRho', p_col='pValue', times_order=None, significance=0.05, ncols=2, figsize_per_plot=(4, 4), cmap='icefire', fmt='.2f', center=0, vmax_abs=None, limit_idps=None, sample_method='first', random_state=None, rep=None, show=False, combine_method='stouffer', p_correction='fdr_bh')

Extended version of the subject-order plot that combines p-values across IDPs (for each time pair) and across time pairs (for each IDP). It uses either Stouffer's method (signed) or Fisher's method, and can optionally apply multiple-testing correction (BH or Bonferroni).

Notes
  • combine_method='stouffer' uses the sign from rho (mean rho across IDPs for that cell) to create signed z-scores from two-sided p-values.
  • combine_method='fisher' uses scipy.stats.combine_pvalues(method='fisher') and ignores sign.
  • p_correction operates separately on the time×time summary matrix and on the per-IDP combined p-values.
  • For best statistical rigor with permutation-based tests, combining at the permutation-level (i.e. combining test stats per permutation and building an empirical null) is preferable.

build_style_registry(subjects, idps, subject_palette='tab10', idp_palette='Set2', subject_markers=None, idp_markers=None)

Build color and marker registries for subjects and IDPs.

Parameters:

subjects (required): Subject identifiers that need plotting styles.
idps (required): IDP identifiers that need plotting styles.
subject_palette (str, default 'tab10'): Matplotlib or seaborn palette name for subjects.
idp_palette (str, default 'Set2'): Matplotlib or seaborn palette name for IDPs.
subject_markers (list, default None): Marker cycle for subjects.
idp_markers (list, default None): Marker cycle for IDPs.

Returns:

tuple[dict[Any, tuple[Any, str]], dict[Any, tuple[Any, str]]]: Subject and IDP style dictionaries mapping each identifier to a `(color, marker)` pair.
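
The registry shape, identifier → (color, marker), can be sketched without any plotting library. The real function resolves palette names through Matplotlib or seaborn. This hypothetical version takes explicit color lists instead, defaulting to Matplotlib's tab10 hex values for both groups, and uses an assumed default marker cycle. Colors and markers cycle independently, so a given (color, marker) pair repeats only after many identifiers.

```python
from itertools import cycle

TAB10 = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
         "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf"]
DEFAULT_MARKERS = ["o", "s", "^", "D", "v", "P", "X", "*"]  # assumed default cycle


def build_style_registry_sketch(subjects, idps, subject_colors=TAB10, idp_colors=TAB10,
                                subject_markers=None, idp_markers=None):
    """Hypothetical: map each unique identifier to a (color, marker) pair."""
    def registry(keys, colors, markers):
        markers = markers or DEFAULT_MARKERS
        unique = list(dict.fromkeys(keys))  # de-duplicate, keep first-seen order
        return {k: (c, m) for k, c, m in zip(unique, cycle(colors), cycle(markers))}

    return (registry(subjects, subject_colors, subject_markers),
            registry(idps, idp_colors, idp_markers))
```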

plot_WithinSubjVar(df, subject_col='subject', idp_cols=None, subject_style=None, idp_style=None, limit_subjects=10, limit_idps_for_legend=10, figsize=(14, 6), point_size=60, jitter=0.08, savepath=None, rep=None, show=False)

Plots within-subject variability across IDPs, showing:
  A) Per-IDP distribution across subjects (boxplot + jittered points)
  B) Per-IDP mean across subjects (boxplot + points)
  C) Per-subject mean across IDPs (boxplot + points)

plot_MultivariateBatchDifference(df, batch_col='batch', value_col='mdval', avg_label='average_batch', figsize=(8, 6), sort_by_value=True, sort_rest_desc=True, value_format='{:.1f}', savepath=None, rep=None, show=False)

Horizontal bar chart with
  • the avg_label row (average_batch by default) always at the top
  • the remaining batches optionally sorted by value_col (mdval by default)
  • value labels rounded to 1 decimal (value_format='{:.1f}' by default)
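
The ordering rules above can be sketched in plain Python on a list of (batch, value) pairs. bar_order is a hypothetical helper, and the library works on a DataFrame instead. One Matplotlib detail matters here: barh draws the first row at the bottom, so a plot built from this order would need ax.invert_yaxis() to put the average row at the top.

```python
def bar_order(rows, avg_label="average_batch", sort_by_value=True, sort_rest_desc=True):
    """Put the average row first, then the remaining (batch, value) rows."""
    avg = [r for r in rows if r[0] == avg_label]
    rest = [r for r in rows if r[0] != avg_label]
    if sort_by_value:
        rest.sort(key=lambda r: r[1], reverse=sort_rest_desc)
    return avg + rest
```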

plot_MixedEffectsPart1(df, idp_col='IDP', metrics=None, plot_type='bar', idp_style=None, limit_idps=10, figsize=(14, 4), value_format_float='{:.1f}', value_format_int='{:d}', seed=None, savepath=None, rep=None, show=False, display='subplots', metric=None)

Plots the metrics either as subplots of one figure or as separate figures, depending on display:

  • display='subplots' : original behaviour, 1 fig with n_metrics subplots (returns Figure).
  • display='separate' : create one Figure per metric, return dict {metric: Figure}.
  • display='single' : create a single Figure for the metric named in metric, return Figure.

Defaults: the default metrics list excludes 'anova_batches' by design (use AdditiveEffect_long for the omnibus results).

plot_MixedEffectsPart2(df, idp_col='IDP', fix_eff=('age', 'sex'), p_thr=0.05, effect_style=None, idp_order=None, figsize=(10, 4), marker_size=80, cap_width=0.03, linewidth=2.0, xtick_rotation=45, highlight_color='red', title=None, savepath=None, rep=None, show=False)

Plot fixed-effect estimates and confidence intervals across IDPs.

Parameters:

df (required): DataFrame containing IDP rows and effect summary columns such as "{eff}_est", "{eff}_pval", "{eff}_ciL", and "{eff}_ciU".
idp_col (default 'IDP'): Column containing IDP names.
fix_eff (default ('age', 'sex')): Iterable of fixed-effect names to plot.
p_thr (default 0.05): P-value threshold used to highlight significant estimates.
effect_style (default None): Optional mapping from effect name to (color, marker).
idp_order (default None): Optional IDP display order. If None, the DataFrame order is used.
figsize (default (10, 4)): Figure size passed to Matplotlib.
marker_size (default 80): Marker size for effect estimates.
cap_width (default 0.03): Horizontal half-width of the confidence interval caps.
linewidth (default 2.0): Line width for confidence intervals.
xtick_rotation (default 45): Rotation angle for x-axis labels.
highlight_color (default 'red'): Fill color used when p < p_thr.
title (default None): Optional figure title.
savepath (default None): Optional file path for saving the figure.
rep (default None): Optional report object used to log the plot.
show (bool, default False): Whether to display the figure interactively.

Returns:

tuple[plt.Figure, list[plt.Axes]]: The Matplotlib figure and the list of subplot axes, in fix_eff order.
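
Each panel draws on the wide "{eff}_est" / "{eff}_pval" / "{eff}_ciL" / "{eff}_ciU" columns. The sketch below shows how one effect's columns could be turned into what a panel needs: estimates, asymmetric error-bar lengths, and a significance flag for highlighting. effect_table is hypothetical. The err_low/err_high pair matches the shape ax.errorbar(x, est, yerr=[err_low, err_high]) accepts, but the library may draw the caps itself (cap_width suggests it does).

```python
import pandas as pd


def effect_table(df, eff, idp_col="IDP", p_thr=0.05):
    """Collect one fixed effect's estimate, CI offsets and significance per IDP."""
    est = df[f"{eff}_est"]
    out = pd.DataFrame({
        "IDP": df[idp_col],
        "est": est,
        "err_low": est - df[f"{eff}_ciL"],   # distance from estimate down to lower CI
        "err_high": df[f"{eff}_ciU"] - est,  # distance from estimate up to upper CI
        "significant": df[f"{eff}_pval"] < p_thr,  # drives highlight_color
    })
    return out.reset_index(drop=True)
```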

plot_AddMultEffects(dfs, feature_col='Feature', p_col='p-value', labels=None, p_thr=0.05, cmap='Reds', annot_fmt='{:.3g}', vmax_logp=10, figsize=(4, 8), show_colorbar=True, savepath=None, value_scale='p', rep=None, show=False, annot_fontsize=7.0, tick_fontsize=7.0, cbar_shrink=0.8, linewidths=0.5, square=False)

Plot matrix of p-values for features across one or more dfs.

value_scale:
  • 'p': heatmap colors and annotations show raw p-values (range 0 to 1).
  • 'logp': heatmap colors and annotations show -log10(p); the color scale is clipped at vmax_logp.
Colorbar:
  • Shows the same scale as the heatmap.
  • Includes a tick marking p_thr (in 'p' mode) or -log10(p_thr) (in 'logp' mode); the tick label shows the p_thr value for clarity.

The other layout parameters control font sizes and figure sizing.
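
The two value scales and the threshold tick can be sketched with NumPy. The sketch assumes that "clipped at vmax_logp" is implemented by passing vmax to the colormap: with Matplotlib's default normalisation, values above vmax are drawn in the top color while annotations keep the true value. heatmap_scale is a hypothetical helper.

```python
import numpy as np


def heatmap_scale(pvals, value_scale="p", vmax_logp=10, p_thr=0.05):
    """Return (values, vmin, vmax, threshold_tick) for the chosen value_scale."""
    p = np.asarray(pvals, dtype=float)
    if value_scale == "p":
        return p, 0.0, 1.0, p_thr
    if value_scale == "logp":
        with np.errstate(divide="ignore"):
            logp = -np.log10(p)  # p == 0 becomes +inf, shown in the top color
        return logp, 0.0, float(vmax_logp), float(-np.log10(p_thr))
    raise ValueError("value_scale must be 'p' or 'logp'")
```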