PlotDiagnosticResults
A set of complementary plotting functions for the diagnostic tests. These can be used outside of the reports if you only want one specific test. Most functions here store their figures in a list, which makes them easy to add to the HTML reports and to save as local PNG images.
LMM_Diagnostics_Plot(results_df, feature_order='original', max_labels=50, include_delta_r2=True, include_status_summary=True)
Plot LMM diagnostics from Run_LMM_cross_sectional output.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| results_df | DataFrame | Output dataframe from Run_LMM_cross_sectional. | required |
| feature_order | str | 'original' to preserve input order, 'sorted_icc' to sort by ICC. | 'original' |
| max_labels | int | Maximum number of x-axis labels to show before thinning them. | 50 |
| include_delta_r2 | bool | If True, add a delta_R2 plot. | True |
| include_status_summary | bool | If True, add a status/notes summary plot. | True |
Returns:
| Type | Description |
|---|---|
| list[tuple[str, matplotlib.figure.Figure]] | Caption and figure pairs for the generated diagnostic plots. |
rep_plot_wrapper(func)
Decorator that
- optionally forces show=False (if the wrapped function supports it),
- intercepts and removes wrapper-only kwargs (rep, log_func, caption),
- logs returned figure(s) into rep via rep.log_plot(fig, caption) if rep provided,
- closes figures after logging to free memory.
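The behaviour listed for rep_plot_wrapper can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the names rep_plot_wrapper_sketch, close_fig, DemoReport and make_figures are hypothetical, strings stand in for Matplotlib figures, and returning the report after logging is an assumption.

```python
import functools
import inspect


def rep_plot_wrapper_sketch(func, close_fig=None):
    """Illustrative stand-in for rep_plot_wrapper; not the library's code.

    close_fig is a hypothetical hook (e.g. matplotlib.pyplot.close) so the
    sketch runs without Matplotlib installed.
    """
    accepts_show = "show" in inspect.signature(func).parameters

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Remove wrapper-only kwargs so the wrapped function never sees them.
        rep = kwargs.pop("rep", None)
        kwargs.pop("log_func", None)  # accepted but unused in this sketch
        caption = kwargs.pop("caption", None)

        # Assumption: show=False is forced only when a report is supplied.
        if rep is not None and accepts_show:
            kwargs["show"] = False

        result = func(*args, **kwargs)
        if rep is None:
            return result

        figs = result if isinstance(result, list) else [result]
        for fig in figs:
            rep.log_plot(fig, caption)
            if close_fig is not None:
                close_fig(fig)  # free memory once the figure is logged
        return rep  # assumption: hand back the report after logging

    return wrapper


class DemoReport:
    """Hypothetical reporter exposing only log_plot(fig, caption)."""

    def __init__(self):
        self.logged = []

    def log_plot(self, fig, caption):
        self.logged.append((fig, caption))


def make_figures(n, show=True):
    # Strings stand in for Matplotlib figures to keep the demo dependency-free.
    return [f"fig{i}" for i in range(n)]


closed = []
wrapped = rep_plot_wrapper_sketch(make_figures, close_fig=closed.append)
rep = DemoReport()
returned = wrapped(2, rep=rep, caption="demo")
```

In real use, close_fig would typically be matplotlib.pyplot.close, and the wrapped function would return Matplotlib figures rather than strings.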
Z_Score_Plot(data, batch, probablity_distribution=False, draw_PDF=True)
Plots the median-centered Z-score data as a heatmap and as a histogram of all scores. Samples are re-ordered by batch for better visualisation in the heatmap, and batch separators are drawn on the heatmap.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | np.ndarray | 2D array of Z-scored data (samples x features). | required |
Returns:
| Type | Description |
|---|---|
| None | Displays a plot of the Z-scored data and a histogram of the values on different axes. |
Cohens_D_plot(cohens_d, pair_labels, df=None, *, rep=None, caption=None, show=False)
Plots Cohen's d effect sizes as a bar plot with histograms of the values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cohens_d | np.ndarray | 2D array of Cohen's d values (num_pairs x num_features). | required |
| pair_labels | list | List of labels for each pair of batches corresponding to rows in cohens_d. | required |
| df | Optional[pd.DataFrame] | Optional DataFrame containing additional information. Defaults to None. | None |
| rep | optional | Optional StatsReporter instance. Defaults to None. | None |
| caption | Optional[str] | Optional caption for the plot. Defaults to None. | None |
| show | bool | Whether to display the plot. Defaults to False. | False |
Returns:
| Type | Description |
|---|---|
| plt.Figure | The generated plot figure. |
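To illustrate the input shapes Cohens_D_plot expects, the sketch below builds a (num_pairs x num_features) array and matching pair labels from raw data. It assumes the common pooled-standard-deviation definition of Cohen's d; the helper name cohens_d_by_pair is hypothetical, and the library may compute the statistic differently.

```python
from itertools import combinations

import numpy as np


def cohens_d_by_pair(data, batch):
    """Hypothetical helper: pooled-SD Cohen's d for every pair of batches."""
    data = np.asarray(data, dtype=float)
    batch = np.asarray(batch)
    labels = np.unique(batch).tolist()

    rows, pair_labels = [], []
    for a, b in combinations(labels, 2):
        xa, xb = data[batch == a], data[batch == b]
        na, nb = len(xa), len(xb)
        # Pooled variance per feature, using unbiased (ddof=1) estimates.
        pooled_var = (
            (na - 1) * xa.var(axis=0, ddof=1) + (nb - 1) * xb.var(axis=0, ddof=1)
        ) / (na + nb - 2)
        rows.append((xa.mean(axis=0) - xb.mean(axis=0)) / np.sqrt(pooled_var))
        pair_labels.append(f"{a} vs {b}")
    return np.vstack(rows), pair_labels


# Two batches, two samples each, two features.
demo_data = [[1.0, 0.0], [3.0, 2.0], [5.0, 0.0], [7.0, 2.0]]
demo_batch = ["A", "A", "B", "B"]
d_values, d_labels = cohens_d_by_pair(demo_data, demo_batch)
```

The resulting d_values and d_labels have the shapes described for the cohens_d and pair_labels arguments.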
variance_ratio_plot(variance_ratios, pair_labels, df=None, rep=None, show=False, caption=None)
Plots the explained variance ratio for each principal component as a bar plot.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| variance_ratios | Sequence[float] | A sequence of explained variance ratios for each principal component. | required |
| pair_labels | list | List of labels for each pair of batches corresponding to rows in variance_ratios. | required |
| df | None | Placeholder for potential future use. Defaults to None. | None |
| rep | optional | Optional StatsReporter instance. Defaults to None. | None |
| caption | Optional[str] | Optional caption for the plot. Defaults to None. | None |
| show | bool | Whether to display the plot. Defaults to False. | False |
Returns:
None: Displays a plot of the variance ratio per feature and a histogram of the values on different axes.
Raises: ValueError: If variance_ratios is not a sequence of numbers.
PC_corr_plot(PrincipleComponents, batch, covariates=None, variable_names=None, PC_correlations=False, *, show=False, cluster_batches=False)
Generate PCA diagnostic plots and return a list of (caption, fig).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| PrincipleComponents | ndarray | 2D array of shape (n_samples, n_components) containing PCA scores. | required |
| batch | ndarray or list | 1D array or list of batch labels corresponding to each sample. | required |
| covariates | optional | Optional covariate data. Can be a DataFrame, structured array, or 2D array. Defaults to None. | None |
| variable_names | optional | Optional list of variable names for covariates and batch. If supplied with covariates, the length should match the number of covariate columns. If the first element is | None |
| PC_correlations | optional | Optional output from | False |
| show | bool | Whether to display the figures interactively. | False |
| cluster_batches | bool | Whether to add batch-clustering overlays to the PCA plots. | False |
Returns:
| Type | Description |
|---|---|
| list[tuple[str, plt.Figure]] | Caption and figure pairs for the PCA diagnostic plots. |
clustering_analysis_PCA(PrincipleComponents, batch, covariates=None, variable_names=None, PC_correlations=False, *, show=False, cluster_batches=False, UMAP_embedding=False, data=None)
Perform clustering analysis on PCA results and generate diagnostic plots.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| PrincipleComponents | np.ndarray | 2D array of shape (n_samples, n_components) containing PCA scores. | required |
| batch | np.ndarray or list | 1D array or list of batch labels corresponding to each sample. | required |
| covariates | optional | Optional covariate data. Can be a DataFrame, structured array, or 2D array. Defaults to None. | None |
| variable_names | optional | Optional list of variable names for covariates and batch. If covariates are provided, the length should match the number of covariate columns. If the first element is 'batch', it is used as the batch column name. Defaults to None. | None |
Returns:
| Type | Description |
|---|---|
| List[Tuple[str, plt.Figure]] | A list of tuples containing captions and corresponding figures for the PCA diagnostic plots. |
clustering_analysis_all(PrincipleComponents, data, batch, covariates=None, variable_names=None, show=False, UMAP_embedding=True, UMAP_neighbors=15, UMAP_min_dist=0.1, UMAP_metric='euclidean', UMAP_tuning='auto')
Perform clustering diagnostics in PCA and optional UMAP space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| PrincipleComponents | ndarray | PCA score matrix with shape | required |
| data | ndarray | Original feature matrix used for the optional UMAP embedding. | required |
| batch | ndarray or list | Batch labels for each sample. | required |
| covariates | optional | Optional covariate data to color the plots. | None |
| variable_names | optional | Optional names for the covariates and batch variables. | None |
| show | bool | Whether to display the figures interactively. | False |
| UMAP_embedding | bool | Whether to compute and plot a UMAP embedding of the raw data. | True |
| UMAP_neighbors | int | Number of neighbors for UMAP. | 15 |
| UMAP_min_dist | float | Minimum distance parameter for UMAP. | 0.1 |
| UMAP_metric | str | Distance metric passed to UMAP. | 'euclidean' |
Returns:
| Type | Description |
|---|---|
| list[tuple[str, plt.Figure]] | Caption and figure pairs for the generated PCA and optional UMAP plots. |
clustering_analysis_UMAP(data, batch, covariates=None, variable_names=None, rep=None)
Perform UMAP dimensionality reduction and plot the embedding colored by batch and covariates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | | 2D array-like (n_samples x n_features) input data for U | required |
| batch | | 1D array-like (n_samples,) batch labels for each sample. | required |
| covariates | | Optional 2D array-like (n_samples x n_covariates | None |
| variable_names | | Optional list of covariate names (if covariates provided). | None |
| rep | | Optional report object with rep.log_plot(plt, caption=...) method for logging | None |
plot_eigen_spectra_and_cumulative(score, batch, rep, max_components=50, caption_prefix='PC spectrum')
Compute per-batch variance along PCs (scree / cumulative) and log plots to the report.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| score | ndarray | (n_samples, n_pcs) PCA score matrix returned by your PCA routine. | required |
| batch | ndarray | (n_samples,) batch labels (numeric or strings). | required |
| rep | | Report object that has rep.log_plot(plt, caption=...) and rep.log_text(...). | required |
| max_components | int | Maximum number of PCs to visualise (keeps plots cheap). | 50 |
| caption_prefix | str | Prefix for plot captions. | 'PC spectrum' |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | A dictionary containing per-batch variance curves, per-batch fraction-of-variance curves, and the number of principal components used. |
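The per-batch quantities described for plot_eigen_spectra_and_cumulative can be approximated with a few lines of NumPy. This is a sketch under assumptions: the function name per_batch_pc_variance and the dictionary keys are hypothetical, and the cumulative normalisation shown here may differ from what the library reports.

```python
import numpy as np


def per_batch_pc_variance(score, batch, max_components=50):
    """Hypothetical helper: variance along each PC, computed per batch."""
    score = np.asarray(score, dtype=float)
    batch = np.asarray(batch)
    k = min(max_components, score.shape[1])

    variances, cumulative = {}, {}
    for label in np.unique(batch).tolist():
        # Sample variance of each of the first k PC scores within this batch.
        var = score[batch == label, :k].var(axis=0, ddof=1)
        variances[label] = var
        # Cumulative fraction of this batch's variance explained by the PCs.
        cumulative[label] = np.cumsum(var) / var.sum()
    return {
        "variance": variances,
        "cumulative_fraction": cumulative,
        "n_components": k,
    }


demo_score = [[1.0, 0.0], [-1.0, 0.0], [2.0, 1.0], [-2.0, -1.0]]
demo_batch = ["A", "A", "B", "B"]
spectra = per_batch_pc_variance(demo_score, demo_batch)
```

Plotting variances[label] against the component index gives a per-batch scree curve, and cumulative[label] gives the matching cumulative curve.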
plot_covariance_frobenius(score, batch, rep, max_components=50, normalize=True, caption_prefix='Covariance comparison (PC space)')
Compute pairwise Frobenius norms of covariance differences between batches.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| score | ndarray | (n_samples, n_pcs) whole data matrix in real space | required |
| batch | ndarray | (n_samples,) batch labels. | required |
| rep | | Report object (must support rep.log_plot and rep.log_text). | required |
| normalize | bool | If True, divide pairwise norms by the Frobenius norm of the pooled covariance. | True |
| caption_prefix | str | Prefix for plot captions. | 'Covariance comparison (PC space)' |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | A dictionary containing the batch covariance matrices, pairwise Frobenius distances, optional normalized distances, and the number of principal components used. |
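The comparison performed by plot_covariance_frobenius can be sketched as below. The helper name pairwise_cov_frobenius is hypothetical, and the covariance of all samples taken together stands in for the pooled covariance, which is an assumption about the normalisation rather than a statement of what the library does.

```python
from itertools import combinations

import numpy as np


def pairwise_cov_frobenius(score, batch, max_components=50, normalize=True):
    """Hypothetical helper: Frobenius distance between batch covariances."""
    score = np.asarray(score, dtype=float)
    batch = np.asarray(batch)
    k = min(max_components, score.shape[1])
    x = score[:, :k]
    labels = np.unique(batch).tolist()

    # One covariance matrix per batch (columns are variables).
    covs = {
        lab: np.atleast_2d(np.cov(x[batch == lab], rowvar=False)) for lab in labels
    }
    # Assumption: covariance of all samples is used as the normaliser.
    pooled_norm = np.linalg.norm(np.atleast_2d(np.cov(x, rowvar=False)), "fro")

    distances = {}
    for a, b in combinations(labels, 2):
        d = np.linalg.norm(covs[a] - covs[b], "fro")
        distances[(a, b)] = d / pooled_norm if normalize else d
    return covs, distances


demo_score = [[1.0, 0.0], [-1.0, 0.0], [2.0, 1.0], [-2.0, -1.0]]
demo_batch = ["A", "A", "B", "B"]
_, raw_dist = pairwise_cov_frobenius(demo_score, demo_batch, normalize=False)
_, norm_dist = pairwise_cov_frobenius(demo_score, demo_batch, normalize=True)
```

With normalize=True the distances become unitless, which makes them easier to compare across datasets with different overall variance.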
mahalanobis_distance_plot(results, rep=None, annotate=True, figsize=(14, 5), cmap='viridis', show=False)
Plot Mahalanobis distances from (...) all on ONE figure:
- Heatmap of pairwise RAW distances
- Heatmap of pairwise RESIDUAL distances (if available)
- Bar chart of centroid-to-global distances (raw vs residual)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| results | dict | Output from MahalanobisDistance(...) | required |
| annotate | bool | Write numeric values inside heatmap cells/bars. | True |
| figsize | tuple | Matplotlib figure size. | (14, 5) |
| cmap | str | Colormap for heatmaps. | 'viridis' |
| show | bool | If True, plt.show(); otherwise just return (fig, axes). | False |
Returns:
| Type | Description |
|---|---|
| (fig, axes) | The matplotlib Figure and dict of axes. |
KS_plot(ks_results, feature_names=None, rep=None, caption=None, show=False)
Plot detailed KS-test results for each comparison.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ks_results | dict | Output from | required |
| feature_names | list | Optional feature names for plot labels. | None |
| rep | optional | Report object used to log plots instead of returning them. | None |
| caption | str | Optional caption prefix. | None |
| show | bool | Whether to display the generated figures. | False |
Returns:
| Type | Description |
|---|---|
| Any | The report object if |
plot_SubjectOrder(df, idp_col='IDP', time_a_col='TimeA', time_b_col='TimeB', rho_col='SpearmanRho', p_col='pValue', times_order=None, significance=0.05, ncols=2, figsize_per_plot=(4, 4), cmap='icefire', fmt='.2f', center=0, vmax_abs=None, limit_idps=None, sample_method='first', random_state=None, rep=None, show=False, combine_method='stouffer', p_correction='fdr_bh')
Combines p-values across IDPs (for each time-pair) and across time-pairs (for each IDP) using either Stouffer's method (signed) or Fisher's method, and optionally applies multiple-testing correction (Benjamini-Hochberg or Bonferroni).
Notes
- combine_method='stouffer' uses the sign from rho (mean rho across IDPs for that cell) to create signed z-scores from two-sided p-values.
- combine_method='fisher' uses scipy.stats.combine_pvalues(method='fisher') and ignores sign.
- p_correction operates separately for the time×time summary matrix and for per-IDP combined p's.
- For best statistical rigor with permutation-based tests, combining at the permutation-level (i.e. combining test stats per permutation and building an empirical null) is preferable.
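The combination rules in the notes above can be illustrated with the standard library alone. This is a sketch, not the code behind plot_SubjectOrder: the function names are hypothetical, each test's own rho supplies the sign (a simplification of the mean-rho rule described above), Fisher's method uses the closed-form chi-square tail for even degrees of freedom, and p-values are assumed to lie in (0, 1].

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()


def stouffer_signed(pvalues, rhos):
    """Combine two-sided p-values into one two-sided p via signed z-scores."""
    # |z| from the two-sided p-value, sign taken from the matching rho.
    z = [math.copysign(-_NORMAL.inv_cdf(p / 2), r) for p, r in zip(pvalues, rhos)]
    z_comb = sum(z) / math.sqrt(len(z))
    return 2 * (1 - _NORMAL.cdf(abs(z_comb)))


def fisher_combined(pvalues):
    """Fisher's method: chi-square tail with 2k degrees of freedom."""
    half_stat = -sum(math.log(p) for p in pvalues)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, len(pvalues)):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (the 'fdr_bh' style correction)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_same = stouffer_signed([0.05, 0.05], [0.5, 0.3])       # agreeing signs
p_opposite = stouffer_signed([0.05, 0.05], [0.5, -0.3])  # signs cancel
p_fisher = fisher_combined([0.05, 0.05])                 # sign is ignored
p_adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
```

The contrast between p_same and p_opposite shows why the signed variant matters: two equally significant effects in opposite directions cancel under Stouffer but still reinforce each other under Fisher.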
build_style_registry(subjects, idps, subject_palette='tab10', idp_palette='Set2', subject_markers=None, idp_markers=None)
Build color and marker registries for subjects and IDPs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| subjects | | Subject identifiers that need plotting styles. | required |
| idps | | IDP identifiers that need plotting styles. | required |
| subject_palette | str | Matplotlib or seaborn palette name for subjects. | 'tab10' |
| idp_palette | str | Matplotlib or seaborn palette name for IDPs. | 'Set2' |
| subject_markers | list | Marker cycle for subjects. | None |
| idp_markers | list | Marker cycle for IDPs. | None |
Returns:
| Type | Description |
|---|---|
| tuple[dict[Any, tuple[Any, str]], dict[Any, tuple[Any, str]]] | Subject and IDP style dictionaries mapping each identifier to a `(color, marker)` pair. |
plot_WithinSubjVar(df, subject_col='subject', idp_cols=None, subject_style=None, idp_style=None, limit_subjects=10, limit_idps_for_legend=10, figsize=(14, 6), point_size=60, jitter=0.08, savepath=None, rep=None, show=False)
Plots within-subject variability across IDPs, showing:
- A) Per-IDP distribution across subjects (boxplot + jittered points)
- B) Per-IDP mean across subjects (boxplot + points)
- C) Per-subject mean across IDPs (boxplot + points)
plot_MultivariateBatchDifference(df, batch_col='batch', value_col='mdval', avg_label='average_batch', figsize=(8, 6), sort_by_value=True, sort_rest_desc=True, value_format='{:.1f}', savepath=None, rep=None, show=False)
plot_MixedEffectsPart1(df, idp_col='IDP', metrics=None, plot_type='bar', idp_style=None, limit_idps=10, figsize=(14, 4), value_format_float='{:.1f}', value_format_int='{:d}', seed=None, savepath=None, rep=None, show=False, display='subplots', metric=None)
Plots one figure per metric (subplots) OR separate figures per metric.
- display='subplots' : original behaviour, 1 fig with n_metrics subplots (returns Figure).
- display='separate' : create one Figure per metric, return dict {metric: Figure}.
- display='single' : create a single Figure for the metric named in metric, return Figure.
Defaults: metrics excludes 'anova_batches' by design (use AdditiveEffect_long for omnibus).
plot_MixedEffectsPart2(df, idp_col='IDP', fix_eff=('age', 'sex'), p_thr=0.05, effect_style=None, idp_order=None, figsize=(10, 4), marker_size=80, cap_width=0.03, linewidth=2.0, xtick_rotation=45, highlight_color='red', title=None, savepath=None, rep=None, show=False)
Plot fixed-effect estimates and confidence intervals across IDPs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | | DataFrame containing | required |
| idp_col | | Column containing IDP names. | 'IDP' |
| fix_eff | | Iterable of fixed-effect names to plot. | ('age', 'sex') |
| p_thr | | P-value threshold used to highlight significant estimates. | 0.05 |
| effect_style | | Optional mapping from effect name to | None |
| idp_order | | Optional IDP display order. If | None |
| figsize | | Figure size passed to Matplotlib. | (10, 4) |
| marker_size | | Marker size for effect estimates. | 80 |
| cap_width | | Horizontal half-width of the confidence interval caps. | 0.03 |
| linewidth | | Line width for confidence intervals. | 2.0 |
| xtick_rotation | | Rotation angle for x-axis labels. | 45 |
| highlight_color | | Fill color used when | 'red' |
| title | | Optional figure title. | None |
| savepath | | Optional file path for saving the figure. | None |
| rep | | Optional report object used to log the plot. | None |
| show | bool | Whether to display the figure interactively. | False |
Returns:
| Type | Description |
|---|---|
| tuple[plt.Figure, list[plt.Axes]] | The Matplotlib figure and the list of subplot axes in |
plot_AddMultEffects(dfs, feature_col='Feature', p_col='p-value', labels=None, p_thr=0.05, cmap='Reds', annot_fmt='{:.3g}', vmax_logp=10, figsize=(4, 8), show_colorbar=True, savepath=None, value_scale='p', rep=None, show=False, annot_fontsize=7.0, tick_fontsize=7.0, cbar_shrink=0.8, linewidths=0.5, square=False)
Plot matrix of p-values for features across one or more dfs.
value_scale
- 'p' : heatmap colors and annotations show raw p-values (range 0..1).
- 'logp' : heatmap colors and annotations show -log10(p) (base 10); the color scale is clipped at vmax_logp.
Other layout params control fonts / size.
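The two value_scale modes amount to a small transformation of the p-values before they are colored. A minimal sketch follows; the helper name to_color_values is hypothetical, and mapping non-positive p-values straight to vmax_logp is an assumption made here to avoid taking the logarithm of zero.

```python
import math


def to_color_values(pvalues, value_scale="p", vmax_logp=10):
    """Hypothetical helper: values that would drive the heatmap color scale."""
    if value_scale == "p":
        return list(pvalues)  # raw p-values in the range 0..1
    if value_scale == "logp":
        # -log10(p), clipped so tiny p-values do not dominate the color range.
        return [
            vmax_logp if p <= 0 else min(-math.log10(p), vmax_logp)
            for p in pvalues
        ]
    raise ValueError("value_scale must be 'p' or 'logp'")


raw_values = to_color_values([0.05, 1e-12, 1.0], value_scale="p")
log_values = to_color_values([0.05, 1e-12, 1.0], value_scale="logp", vmax_logp=10)
```

On the 'logp' scale larger values mean stronger evidence, so a sequential colormap such as the default 'Reds' darkens with significance.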