PlotDiagnosticResults
A set of complementary plotting functions for the diagnostic tests. These can be used outside of the reports if you only want one specific test. Most functions here store their figures in a list, which makes it easy to add them to the HTML reports and to save them as local PNG images.
LMM_Diagnostics_Plot(results_df, feature_order='original', max_labels=50, include_delta_r2=True, include_status_summary=True)
Plot LMM diagnostics from Run_LMM_cross_sectional output.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| results_df | DataFrame | Output dataframe from Run_LMM_cross_sectional. | required |
| feature_order | str | 'original' to preserve input order, 'sorted_icc' to sort by ICC. | 'original' |
| max_labels | int | Maximum number of x-axis labels to show before thinning them. | 50 |
| include_delta_r2 | bool | If True, add a delta_R2 plot. | True |
| include_status_summary | bool | If True, add a status/notes summary plot. | True |
Returns:
| Type | Description |
|---|---|
| list[tuple[str, matplotlib.figure.Figure]] | Caption and figure pairs for the generated diagnostic plots. |
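The effect of `max_labels` can be pictured with a small standalone sketch. The helper `thin_labels` is hypothetical and shows only one plausible strategy (keep every k-th label); the thinning rule the library actually applies is not documented here.

```python
import math


def thin_labels(labels, max_labels=50):
    """Blank out labels so that at most `max_labels` remain visible.

    Hypothetical helper: keeps every k-th label, where k is the smallest
    stride that brings the visible count down to `max_labels` or fewer.
    """
    if max_labels < 1:
        raise ValueError("max_labels must be at least 1")
    n = len(labels)
    if n <= max_labels:
        return list(labels)
    step = math.ceil(n / max_labels)
    return [lab if i % step == 0 else "" for i, lab in enumerate(labels)]


features = [f"feature_{i}" for i in range(120)]
visible = [lab for lab in thin_labels(features, max_labels=50) if lab]
print(len(visible))  # number of labels that stay visible
```

Blanking labels instead of dropping them keeps one tick per feature, so the bars or points stay aligned with their positions.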
rep_plot_wrapper(func)
Decorator that:
- optionally forces show=False (if the wrapped function supports it),
- intercepts and removes wrapper-only kwargs (rep, log_func, caption),
- logs returned figure(s) into rep via rep.log_plot(fig, caption) if rep provided,
- closes figures after logging to free memory.
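The behaviour listed above can be sketched as a plain Python decorator. This is not the library's implementation: `rep_plot_wrapper_sketch`, `_close`, `ListReporter` and `make_plots` are hypothetical names. When `show=False` is forced, what is returned after logging, and how `log_func` is used are all assumptions.

```python
import functools
import inspect


def _close(fig):
    """Close real Matplotlib figures; ignore anything else (e.g. test doubles)."""
    try:
        from matplotlib import pyplot as plt
        from matplotlib.figure import Figure
    except ImportError:
        return
    if isinstance(fig, Figure):
        plt.close(fig)


def rep_plot_wrapper_sketch(func):
    """Hypothetical stand-in for rep_plot_wrapper, written from the list above."""
    accepts_show = "show" in inspect.signature(func).parameters

    @functools.wraps(func)
    def wrapper(*args, rep=None, log_func=None, caption=None, **kwargs):
        # rep, log_func and caption are captured here, so they never reach func.
        # log_func is accepted and discarded; its role is not described in the docs.
        if rep is not None and accepts_show:
            kwargs["show"] = False  # assumption: only forced when logging to a report
        result = func(*args, **kwargs)
        if rep is None:
            return result
        # Accept either a single figure or a list of (caption, figure) pairs.
        pairs = result if isinstance(result, list) else [(caption, result)]
        for fig_caption, fig in pairs:
            rep.log_plot(fig, caption or fig_caption)
            _close(fig)  # free memory once the figure has been logged
        return None  # assumption: nothing is returned once figures are logged

    return wrapper


class ListReporter:
    """Minimal test double exposing the rep.log_plot(fig, caption) interface."""

    def __init__(self):
        self.logged = []

    def log_plot(self, fig, caption):
        self.logged.append((caption, fig))


@rep_plot_wrapper_sketch
def make_plots(n, show=True):
    # Placeholder objects stand in for Matplotlib figures.
    return [(f"plot {i} (show={show})", object()) for i in range(n)]


rep = ListReporter()
make_plots(2, rep=rep)
print([caption for caption, _ in rep.logged])
```

Without `rep`, the wrapped function behaves as if it were undecorated and returns its figures to the caller.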
Z_Score_Plot(data, batch, probablity_distribution=False, draw_PDF=True)
Plots the median-centered Z-score data as a heatmap and as a histogram of all scores. The data are re-ordered by batch for better visualisation in the heatmap, and batch separators are drawn on the heatmap.
Args:
- data (np.ndarray): 2D array of Z-scored data (samples x features).
Returns:
- None: Displays a plot of the Z-scored data and a histogram of the values on separate axes.
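The re-ordering by batch and the placement of batch separators can be illustrated without any plotting. The helper `batch_order_and_separators` is hypothetical; it shows one straightforward way to derive a row order and the row indices at which a separator line would be drawn.

```python
def batch_order_and_separators(batch):
    """Return row indices sorted by batch and the positions where the batch changes."""
    # sorted() is stable, so samples keep their original order within a batch.
    order = sorted(range(len(batch)), key=lambda i: str(batch[i]))
    sorted_labels = [batch[i] for i in order]
    separators = [
        i for i in range(1, len(sorted_labels))
        if sorted_labels[i] != sorted_labels[i - 1]
    ]
    return order, separators


batch = ["B", "A", "B", "C", "A"]
order, separators = batch_order_and_separators(batch)
print(order)       # [1, 4, 0, 2, 3]
print(separators)  # [2, 4]
```

Indexing the data rows with `order` groups the samples of each batch together, and a horizontal line at each value in `separators` marks the boundary between two batches.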
Cohens_D_plot(cohens_d, pair_labels, df=None, *, rep=None, caption=None, show=False)
Plots Cohen's d effect sizes as a bar plot with histograms of the values.
Args:
- cohens_d (np.ndarray): 2D array of Cohen's d values (num_pairs x num_features).
- pair_labels (list): List of labels for each pair of batches corresponding to rows in cohens_d.
- df (Optional[pd.DataFrame], optional): Optional DataFrame containing additional information. Defaults to None.
- rep (optional): Optional StatsReporter instance. Defaults to None.
- caption (Optional[str], optional): Optional caption for the plot. Defaults to None.
- show (bool, optional): Whether to display the plot. Defaults to False.
Returns:
- plt.Figure: The generated plot figure.
Levenes_Test_with_residuals(levene_results_raw, levene_results_resid=None, feature_names=None, *, alpha=0.05, show=False, rep=None)
Plot raw and residualised Levene's test results side-by-side.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| levene_results_raw | dict | dict of raw Levene outputs. | required |
| levene_results_resid | dict \| None | dict of Levene outputs after covariate residualisation. | None |
| feature_names | list \| None | optional list of feature names. | None |
| alpha | float | significance threshold. | 0.05 |
Levenes_Test(levene_results, feature_names=None, *, alpha=0.05, show=False, rep=None)
Plot Levene's test results produced by DiagnosticFunctions.Levene_Test.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| levene_results | dict | dict keyed by comparison tuple (a,b) with values containing at least 'stat' and 'pvalue' arrays (per-feature). | required |
| feature_names | list \| None | optional list of feature names (length = n_features). | None |
| alpha | float | significance threshold to highlight features. | 0.05 |
| rep | | optional report object used by the wrapper. | None |
Returns:
| Type | Description |
|---|---|
| list[tuple[str, Figure]] | list of (caption, Figure) tuples. |
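To make the expected shape of `levene_results` concrete, the following sketch builds a dictionary in the documented layout and lists the features below the threshold for each comparison. The numbers are illustrative only, `flagged_features` is a hypothetical helper, and treating "significant" as `p < alpha` is an assumption.

```python
# Illustrative values only; they are not output from a real analysis.
levene_results = {
    ("site_A", "site_B"): {"stat": [0.4, 6.1, 2.2], "pvalue": [0.53, 0.014, 0.14]},
    ("site_A", "site_C"): {"stat": [5.0, 0.1, 9.3], "pvalue": [0.026, 0.75, 0.003]},
}
feature_names = ["thickness", "volume", "area"]  # length = n_features


def flagged_features(results, names, alpha=0.05):
    """Return, per comparison, the features whose p-value is below alpha.

    Assumption: "significant" means p < alpha (strict inequality).
    """
    flagged = {}
    for pair, values in results.items():
        if len(values["pvalue"]) != len(names):
            raise ValueError(f"{pair}: expected {len(names)} p-values")
        flagged[pair] = [n for n, p in zip(names, values["pvalue"]) if p < alpha]
    return flagged


print(flagged_features(levene_results, feature_names))
```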
variance_ratio_plot(variance_ratios, pair_labels, df=None, rep=None, show=False, caption=None)
Plots the explained variance ratio for each principal component as a bar plot.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| variance_ratios | Sequence[float] | A sequence of explained variance ratios for each principal component. | required |
| pair_labels | list | List of labels for each pair of batches corresponding to rows in variance_ratios. | required |
| df | None | Placeholder for potential future use. Defaults to None. | None |
| rep | optional | Optional StatsReporter instance. Defaults to None. | None |
| caption | Optional[str] | Optional caption for the plot. Defaults to None. | None |
| show | bool | Whether to display the plot. Defaults to False. | False |
Returns:
None: Displays a plot of the variance ratio per feature and a histogram of the values on separate axes.
Raises:
ValueError: If variance_ratios is not a sequence of numbers.
PC_corr_plot(PrincipleComponents, batch, covariates=None, variable_names=None, PC_correlations=False, *, show=False, cluster_batches=False)
Generate PCA diagnostic plots and return a list of (caption, fig).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| PrincipleComponents | ndarray | 2D array of shape (n_samples, n_components) containing PCA scores. | required |
| batch | ndarray or list | 1D array or list of batch labels corresponding to each sample. | required |
| covariates | optional | Optional covariate data. Can be a DataFrame, structured array, or 2D array. Defaults to None. | None |
| variable_names | optional | Optional list of variable names for covariates and batch. If supplied with covariates, the length should match the number of covariate columns. If the first element is | None |
| PC_correlations | optional | Optional output from | False |
| show | bool | Whether to display the figures interactively. | False |
| cluster_batches | bool | Whether to add batch-clustering overlays to the PCA plots. | False |
Returns:
| Type | Description |
|---|---|
| list[tuple[str, plt.Figure]] | Caption and figure pairs for the PCA diagnostic plots. |
clustering_analysis_PCA(PrincipleComponents, batch, covariates=None, variable_names=None, PC_correlations=False, *, show=False, cluster_batches=False, UMAP_embedding=False, data=None)
Perform clustering analysis on PCA results and generate diagnostic plots.
Args:
- PrincipleComponents (np.ndarray): 2D array of shape (n_samples, n_components) containing PCA scores.
- batch (np.ndarray or list): 1D array or list of batch labels corresponding to each sample.
- covariates (optional): Optional covariate data. Can be a DataFrame, structured array, or 2D array. Defaults to None.
- variable_names (optional): Optional list of variable names for covariates and batch. If covariates are provided, the length should match the number of covariate columns. If the first element is 'batch', it will be used as the batch column name. Defaults to None.
Returns:
- List[Tuple[str, plt.Figure]]: A list of tuples containing captions and corresponding figures for the PCA diagnostic plots.
clustering_analysis_all(PrincipleComponents, data, batch, covariates=None, variable_names=None, show=False, UMAP_embedding=True, UMAP_neighbors=15, UMAP_min_dist=0.1, UMAP_metric='euclidean', UMAP_tuning='auto')
Perform clustering diagnostics in PCA and optional UMAP space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| PrincipleComponents | ndarray | PCA score matrix with shape | required |
| data | ndarray | Original feature matrix used for the optional UMAP embedding. | required |
| batch | ndarray or list | Batch labels for each sample. | required |
| covariates | optional | Optional covariate data to color the plots. | None |
| variable_names | optional | Optional names for the covariates and batch variables. | None |
| show | bool | Whether to display the figures interactively. | False |
| UMAP_embedding | bool | Whether to compute and plot a UMAP embedding of the raw data. | True |
| UMAP_neighbors | int | Number of neighbors for UMAP. | 15 |
| UMAP_min_dist | float | Minimum distance parameter for UMAP. | 0.1 |
| UMAP_metric | str | Distance metric passed to UMAP. | 'euclidean' |
Returns:
| Type | Description |
|---|---|
| list[tuple[str, plt.Figure]] | Caption and figure pairs for the generated PCA and optional UMAP plots. |
clustering_analysis_UMAP(data, batch, covariates=None, variable_names=None, rep=None)
Perform UMAP dimensionality reduction and plot the embedding colored by batch and covariates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | | 2D array-like (n_samples x n_features) input data for UMAP | required |
| batch | | 1D array-like (n_samples,) batch labels for each sample. | required |
| covariates | | Optional 2D array-like (n_samples x n_covariates) | None |
| variable_names | | Optional list of covariate names (if covariates provided). | None |
| rep | | Optional report object with rep.log_plot(plt, caption=...) method for logging | None |
plot_eigen_spectra_and_cumulative(score, batch, rep, max_components=50, caption_prefix='PC spectrum')
Compute per-batch variance along PCs (scree / cumulative) and log plots to the report.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| score | ndarray | (n_samples, n_pcs) PCA score matrix returned by your PCA routine. | required |
| batch | ndarray | (n_samples,) batch labels (numeric or strings). | required |
| rep | | report object that has rep.log_plot(plt, caption=...) and rep.log_text(...) | required |
| max_components | int | maximum number of PCs to visualise (keeps plots cheap). | 50 |
| caption_prefix | str | prefix for plot captions. | 'PC spectrum' |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | A dictionary containing per-batch variance curves, per-batch fraction-of-variance curves, and the number of principal components used. |
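A rough picture of the per-batch variance and cumulative curves, using only the standard library. The function `per_batch_variance_curves` and the dictionary keys it returns are hypothetical; the real function's key names are not listed in this reference. Each batch needs at least two samples for a sample variance to be defined.

```python
from statistics import variance


def per_batch_variance_curves(score, batch, max_components=50):
    """Per-batch variance along each PC plus the cumulative fraction of variance."""
    n_used = min(max_components, len(score[0]))
    groups = {}
    for row, label in zip(score, batch):
        groups.setdefault(label, []).append(row[:n_used])

    variances, cumulative_fraction = {}, {}
    for label, rows in groups.items():
        # Sample variance of this batch's scores along each retained component.
        var = [variance([r[k] for r in rows]) for k in range(n_used)]
        total = sum(var)
        running, cum = 0.0, []
        for v in var:
            running += v
            cum.append(running / total if total else 0.0)
        variances[label] = var
        cumulative_fraction[label] = cum

    # Key names are assumptions made for this sketch.
    return {
        "variance": variances,
        "cumulative_fraction": cumulative_fraction,
        "n_components": n_used,
    }


score = [[2.0, 1.0], [-2.0, -1.0], [1.0, 1.0], [-1.0, -1.0]]
batch = ["A", "A", "B", "B"]
curves = per_batch_variance_curves(score, batch)
print(curves["cumulative_fraction"])
```

Batches whose cumulative curves rise at different rates concentrate their variance in different components, which is the pattern the scree and cumulative plots are meant to expose.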
plot_covariance_frobenius(data, batch, rep, max_components=50, normalize=True, caption_prefix='Covariance comparison (PC space)')
Compute pairwise Frobenius norms of covariance differences between batches.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | | (n_samples, n_pcs) whole data matrix in real space | required |
| batch | ndarray | (n_samples,) batch labels. | required |
| rep | | report object (must support rep.log_plot and rep.log_text). | required |
| normalize | bool | if True, divide pairwise norms by Frobenius norm of pooled covariance. | True |
| caption_prefix | str | prefix for plot captions. | 'Covariance comparison (PC space)' |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | A dictionary containing the batch covariance matrices, pairwise Frobenius distances, optional normalized distances, and the number of principal components used. |
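The quantity being compared can be written out in a few lines of plain Python. This sketch is not the library's code: the helper names are hypothetical, it works directly on the columns it is given (no PC projection or `max_components` truncation), and it assumes that "pooled covariance" means the covariance of all samples taken together.

```python
import math
from itertools import combinations


def covariance(rows):
    """Sample covariance matrix (ddof=1) of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]


def frobenius(a, b=None):
    """Frobenius norm of a, or of a - b when b is given."""
    total = 0.0
    for i, row in enumerate(a):
        for j, x in enumerate(row):
            diff = x - (b[i][j] if b is not None else 0.0)
            total += diff * diff
    return math.sqrt(total)


def pairwise_covariance_distances(data, batch, normalize=True):
    """Frobenius distance between the covariance matrices of every batch pair."""
    groups = {}
    for row, label in zip(data, batch):
        groups.setdefault(label, []).append(row)
    covs = {label: covariance(rows) for label, rows in groups.items()}
    # Assumption: the pooled covariance is the covariance of all samples together.
    scale = frobenius(covariance(data)) if normalize else 1.0
    return {
        (a, b): frobenius(covs[a], covs[b]) / scale
        for a, b in combinations(sorted(covs), 2)
    }


# Batch A is positively correlated, batch B negatively correlated.
data = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [0.0, 2.0], [1.0, 1.0], [2.0, 0.0]]
batch = ["A", "A", "A", "B", "B", "B"]
print(pairwise_covariance_distances(data, batch))
```

Dividing by the norm of the pooled covariance makes the distances unitless, so they can be compared across datasets with different feature scales.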
mahalanobis_distance_plot(results, rep=None, annotate=True, figsize=(14, 5), cmap='viridis', show=False)
Plot Mahalanobis distances from (...) all on ONE figure:
- Heatmap of pairwise RAW distances
- Heatmap of pairwise RESIDUAL distances (if available)
- Bar chart of centroid-to-global distances (raw vs residual)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| results | dict | Output from MahalanobisDistance(...) | required |
| annotate | bool | Write numeric values inside heatmap cells/bars. | True |
| figsize | tuple | Matplotlib figure size. | (14, 5) |
| cmap | str | Colormap for heatmaps. | 'viridis' |
| show | bool | If True, plt.show(); otherwise just return (fig, axes). | False |
Returns:
| Type | Description |
|---|---|
| (fig, axes) | The matplotlib Figure and dict of axes. |
KS_plot(ks_results, feature_names=None, rep=None, caption=None, show=False)
Plot detailed KS-test results for each comparison.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ks_results | dict | Output from | required |
| feature_names | list | Optional feature names for plot labels. | None |
| rep | optional | Report object used to log plots instead of returning them. | None |
| caption | str | Optional caption prefix. | None |
| show | bool | Whether to display the generated figures. | False |
Returns:
| Name | Type | Description |
|---|---|---|
| | Any | The report object if |
apply_plot_theme()
Apply the shared visual theme to all subsequent Matplotlib/Seaborn plots.
build_style_registry(subjects, idps, subject_palette='tab10', idp_palette='Set2', subject_markers=None, idp_markers=None)
Build colour and marker registries for subjects and IDPs.
Returns
subject_style, idp_style : dicts mapping each identifier to (color, marker).
plot_RawIDPBoxplotsAcrossSites(df, batch_col='batch', subject_col='subject', idp_cols=None, site_order=None, ncols=2, figsize_per_panel=(6.0, 4.5), show_points=True, point_size=2.2, point_alpha=0.18, point_jitter=0.22, savepath=None, rep=None, show=False, site_threshold_for_horizontal=12, feature_display_limit=10, add_pca_summary=True)
Raw IDP distributions across sites/batches.
Behaviour
- If the number of sites is small, boxplots are shown vertically.
- If the number of sites is large, boxplots are shown horizontally.
- If there are many features, only the top features by site dispersion are shown.
- Optionally adds a PCA summary panel for the full feature set when truncated.
Notes
The batch/site order is deterministic:
- if site_order is provided, it is used as-is
- otherwise sites are ordered alphabetically
This keeps colors and tick-label order consistent across raw and harmonised runs.
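The ordering rule, and why it keeps colours stable, can be shown with a short sketch. `resolve_site_order`, `assign_colors` and the palette values are hypothetical and stand in for whatever the plotting function does internally.

```python
def resolve_site_order(batch_labels, site_order=None):
    """Use site_order as-is when given; otherwise sort the unique sites alphabetically."""
    if site_order is not None:
        return list(site_order)
    return sorted(set(batch_labels), key=str)


def assign_colors(sites, palette):
    """Map each site to a colour by position, cycling if the palette is short."""
    return {site: palette[i % len(palette)] for i, site in enumerate(sites)}


palette = ["#1f77b4", "#ff7f0e", "#2ca02c"]  # hypothetical colours
raw_run = ["Leeds", "Cardiff", "Bath", "Cardiff"]
harmonised_run = ["Bath", "Leeds", "Cardiff"]  # same sites, different row order

raw_colors = assign_colors(resolve_site_order(raw_run), palette)
harmonised_colors = assign_colors(resolve_site_order(harmonised_run), palette)
print(raw_colors == harmonised_colors)  # True
```

Because the order depends only on the set of site names, not on the order of rows in the DataFrame, the same site receives the same colour in both runs.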
plot_SubjectOrder(df, idp_col='IDP', time_a_col='TimeA', time_b_col='TimeB', rho_col='SpearmanRho', p_col='pValue', times_order=None, significance=0.05, ncols=2, figsize_per_plot=(5, 5), cmap=STYLE.HEATMAP_CMAP, fmt='.1f', center=0, limit_idps=None, sample_method='first', random_state=None, rep=None, show=False, combine_method='stouffer', p_correction='fdr_bh')
Subject order consistency heatmaps with combined p-values.
Per-feature heatmaps show raw permutation p-values. Summary heatmaps show combined p-values corrected by p_correction.
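For readers unfamiliar with the defaults `combine_method='stouffer'` and `p_correction='fdr_bh'`, the sketch below re-implements both ideas with the standard library: Stouffer's unweighted Z method for combining p-values, and the Benjamini-Hochberg adjustment. It is an illustration of the techniques, not the library's code. The function names, the clipping constant `eps` and the example p-values are assumptions.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def stouffer_combine(p_values, eps=1e-12):
    """Combine one-sided p-values with Stouffer's unweighted Z method."""
    # Clip away exact 0 and 1 so the inverse normal CDF stays finite.
    clipped = [min(max(p, eps), 1.0 - eps) for p in p_values]
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in clipped]
    z_combined = sum(z_scores) / math.sqrt(len(z_scores))
    return 1.0 - _STD_NORMAL.cdf(z_combined)


def fdr_bh(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative raw p-values per feature across several time-point pairs.
raw_p = {"IDP_1": [0.01, 0.01], "IDP_2": [0.5, 0.5], "IDP_3": [0.2, 0.04]}
combined = {idp: stouffer_combine(ps) for idp, ps in raw_p.items()}
corrected = dict(zip(combined, fdr_bh(list(combined.values()))))
print(corrected)
```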
plot_WithinSubjVar(df, subject_col='subject', idp_cols=None, subject_style=None, idp_style=None, limit_subjects=30, limit_idps_for_legend=30, figsize=(16, 13), savepath=None, rep=None, show=False, debug=False)
Plot within-subject variability summary across IDPs.
Panels
A: distribution across subjects for each IDP
- few subjects: colored subject markers + subject legend
- many subjects: black dots only, no subject legend
B: mean variability per IDP
- few IDPs: colored IDP markers + IDP legend
- many IDPs: black dots only, no IDP legend
C: top 10 subjects with highest average variability
plot_MultivariateBatchDifference(df, batch_col='batch', value_col='mdval', avg_label='average_batch', figsize=(8, 6), sort_by_value=True, sort_rest_desc=True, value_format='{:.1f}', savepath=None, rep=None, show=False)
Horizontal bar chart of Mahalanobis distances per batch. The average batch is pinned to the top; remaining batches are optionally sorted.
plot_MixedEffectsPart1(df, idp_col='IDP', metrics=None, plot_type='bar', idp_style=None, limit_idps=10, figsize=(4, 6), seed=None, savepath=None, rep=None, show=False, display='subplots', metric=None, show_pairwise_heatmap=True)
Plot ICC, residual batch significance counts, and WCV per IDP.
The display parameter accepts:
- 'subplots' (all metrics in one figure),
- 'separate' (one figure per metric),
- 'single' (one figure for metric).
plot_PairwiseSiteDifferencesHeatmap(df, idp_col='IDP', records_col='pairwise_site_tests', sig_key='sig_bonf', title='Pairwise site differences by feature', p_thr=0.05, figsize=(10, 8), top_n=20, max_full_pairs=50, rep=None, show=False)
Discrete binary heatmap:
- grey = not significant
- red = significant after Bonferroni correction
plot_MixedEffectsPart2(df, idp_col='IDP', fix_eff=('age', 'sex'), p_thr=0.05, effect_style=None, idp_order=None, marker_size=160, figsize=(9.0, 3.2), cap_width=0.03, linewidth=2.4, xtick_rotation=25, highlight_color='red', savepath=None, rep=None, show=False)
Fixed-effect estimates and 95% confidence intervals per IDP.
Significant estimates (p < p_thr) are filled red; non-significant are open. One covariate per row, compact layout.
plot_AddMultEffects(dfs, feature_col='Feature', p_col='p-value', labels=None, p_thr=0.05, cmap='viridis', annot_fmt='{:.1g}', vmax_logp=10, figsize=(10, 8), show_colorbar=True, savepath=None, value_scale='p', rep=None, show=False, annot_fontsize=STYLE.ANNOT_SIZE, tick_fontsize=STYLE.TICK_SIZE, cbar_shrink=0.2, linewidths=STYLE.HEATMAP_LINEWIDTHS, square=False)
Feature significance heatmap (one or multiple DataFrames).
The value_scale parameter accepts:
- 'p': raw p-values in [0, 1]
- 'logp': -log10(p) clipped at vmax_logp
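The two scales can be expressed as a small conversion function. `to_display_value` is a hypothetical helper; mapping `p = 0` to the cap is an assumption made here because `-log10(0)` is infinite.

```python
import math


def to_display_value(p, value_scale="p", vmax_logp=10):
    """Convert a p-value to the number shown in the heatmap cell."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-values must lie in [0, 1]")
    if value_scale == "p":
        return p
    if value_scale == "logp":
        if p == 0.0:
            return float(vmax_logp)  # -log10(0) is infinite, so clip to the cap
        return min(-math.log10(p), float(vmax_logp))
    raise ValueError(f"unknown value_scale: {value_scale!r}")


for p in (0.05, 0.001, 1e-20):
    print(p, to_display_value(p, "logp"))
```

Clipping keeps a handful of extremely small p-values from stretching the colour scale so far that moderate effects become indistinguishable.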