DiagnosticReport
This module provides the library's main report functions: the standard single-dataset cross-sectional report, a multi-method comparison report, and a longitudinal report.
CrossSectionalDiagnosticResult
dataclass
Container for one method's diagnostics in a comparison report.
The comparison workflow fills this structure incrementally as each test succeeds or fails, then uses the collected fields to build the scorecard, summary advice, and per-method export files.
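As a rough illustration of the fill-as-you-go pattern described above, the sketch below uses a hypothetical `MethodDiagnosticsSketch` dataclass. The field names (`metrics`, `errors`) and the `record` helper are assumptions made for this example and are not the actual fields of `CrossSectionalDiagnosticResult`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class MethodDiagnosticsSketch:
    """Hypothetical stand-in for a per-method result container."""

    method_name: str
    metrics: Dict[str, Any] = field(default_factory=dict)  # results of tests that succeeded
    errors: Dict[str, str] = field(default_factory=dict)   # messages from tests that failed

    def record(self, test_name: str, test_fn: Callable[[], Any]) -> None:
        """Run one test and store either its result or its error message."""
        try:
            self.metrics[test_name] = test_fn()
        except Exception as exc:  # keep going so one failing test does not stop the rest
            self.errors[test_name] = str(exc)


result = MethodDiagnosticsSketch("Raw")
result.record("mean_difference", lambda: 0.42)  # succeeds
result.record("variance_ratio", lambda: 1 / 0)  # fails with ZeroDivisionError
print(result.metrics)
print(result.errors)
```

Keeping failures next to successes lets a later scorecard step skip missing metrics instead of aborting the whole comparison.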
covariate_to_numeric(covariates)
Convert categorical covariates to numeric codes for downstream analyses.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| covariates | ndarray or DataFrame | Covariate matrix with categorical variables. | required |
Returns:
| Type | Description |
|---|---|
| ndarray or None | Covariates converted to a numeric array, or None if no covariates were provided. |
Notes
Whether covariates is a DataFrame or a NumPy array, each categorical column is factorized independently.
Numeric columns are left unchanged.
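The per-column behaviour in these notes can be sketched with `pandas.factorize`. The function below, `factorize_covariates_sketch`, is a hypothetical stand-in written for this example. It assumes a column counts as numeric when every entry parses as a number, which may differ from how `covariate_to_numeric` actually decides.

```python
import numpy as np
import pandas as pd


def factorize_covariates_sketch(covariates):
    """Illustrative only: encode non-numeric columns as integer codes."""
    if covariates is None:
        return None
    frame = pd.DataFrame(covariates)  # accepts a DataFrame or a 2D array
    encoded = []
    for column in frame.columns:
        values = frame[column]
        numeric = pd.to_numeric(values, errors="coerce")
        if numeric.notna().all():
            # Every entry parses as a number: keep the column numeric.
            encoded.append(numeric.to_numpy(dtype=float))
        else:
            # Otherwise treat the column as categorical and factorize it on its own.
            # Codes follow order of first appearance.
            codes, _ = pd.factorize(values)
            encoded.append(codes.astype(float))
    return np.column_stack(encoded)


covs = pd.DataFrame({"age": [31, 45, 52, 38], "sex": ["F", "M", "F", "M"]})
print(factorize_covariates_sketch(covs))
```

Because each column is factorized independently, the same integer code in two different columns carries no shared meaning.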
validate_comparison_datasets(datasets, batch, covariates=None, feature_names=None)
Validate and normalize the datasets used by the comparison report.
The function enforces a non-empty mapping of method name to 2D data array, checks that every method has the same shape, and validates that batch, covariate, and feature-name dimensions are compatible with the data.
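A minimal sketch of the kind of checks described above, assuming NumPy inputs. The function name `validate_datasets_sketch`, the error messages, and the return value are all assumptions for illustration. The real `validate_comparison_datasets` may normalize and return its inputs differently.

```python
import numpy as np


def validate_datasets_sketch(datasets, batch, covariates=None, feature_names=None):
    """Illustrative checks only; messages and return value are assumptions."""
    if not isinstance(datasets, dict) or not datasets:
        raise ValueError("datasets must be a non-empty mapping of method name to array")
    arrays = {name: np.asarray(values) for name, values in datasets.items()}
    shapes = {name: arr.shape for name, arr in arrays.items()}
    if any(len(shape) != 2 for shape in shapes.values()):
        raise ValueError(f"every dataset must be 2D, got shapes {shapes}")
    if len(set(shapes.values())) != 1:
        raise ValueError(f"all methods must share one shape, got {shapes}")
    n_samples, n_features = next(iter(shapes.values()))
    batch = np.asarray(batch)
    if batch.shape[0] != n_samples:
        raise ValueError("batch needs one label per sample")
    if covariates is not None and np.asarray(covariates).shape[0] != n_samples:
        raise ValueError("covariates need one row per sample")
    if feature_names is not None and len(feature_names) != n_features:
        raise ValueError("feature_names need one entry per feature")
    return arrays, batch


raw = np.random.default_rng(0).normal(size=(6, 3))
arrays, batch = validate_datasets_sketch(
    {"Raw": raw, "MethodA": raw - raw.mean(axis=0)},  # "MethodA" is a made-up label
    batch=["s1", "s1", "s1", "s2", "s2", "s2"],
    feature_names=["f1", "f2", "f3"],
)
print({name: arr.shape for name, arr in arrays.items()})
```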
summarise_method_performance(results, scoring_config=None)
Turn per-method diagnostics into a comparable scorecard.
The summary combines the extracted metrics into category-level scores for additive, multiplicative, linear-modelling, distributional, and PCA behaviour. Optional scoring configuration can reweight the metrics or mark specific metrics as higher-is-better.
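One possible way to combine metrics into category scores is sketched below. The metric names, the category mapping, the `weights` and `higher_is_better` keys, and the min-max rescaling are all assumptions chosen for this example. The actual structure of `scoring_config` and the scoring formula are not specified here.

```python
import pandas as pd

# Hypothetical per-method metrics; names and values are made up for illustration.
metrics = pd.DataFrame(
    {
        "mean_effect": [0.80, 0.20, 0.30],         # lower is better
        "variance_ratio_dev": [0.50, 0.10, 0.25],  # lower is better
        "covariate_r2": [0.40, 0.35, 0.45],        # higher is better
    },
    index=["Raw", "MethodA", "MethodB"],
)
categories = {
    "mean_effect": "additive",
    "variance_ratio_dev": "multiplicative",
    "covariate_r2": "linear_modelling",
}
config = {"weights": {"mean_effect": 2.0}, "higher_is_better": {"covariate_r2"}}


def scorecard_sketch(metrics, categories, config=None):
    """Return methods x categories scores in [0, 1], where 1 is best."""
    config = config or {}
    weights = config.get("weights", {})
    higher = config.get("higher_is_better", set())
    # Rescale each metric to [0, 1] across methods.
    span = (metrics.max() - metrics.min()).replace(0, 1.0)
    scaled = (metrics - metrics.min()) / span
    for name in scaled.columns:
        if name not in higher:
            scaled[name] = 1.0 - scaled[name]  # flip so that 1 is always best
    w = pd.Series({name: weights.get(name, 1.0) for name in scaled.columns})
    groups = pd.Series(categories)
    # Weighted mean of the scaled metrics inside each category.
    weighted = (scaled * w).T.groupby(groups).sum().T
    scores = weighted / w.groupby(groups).sum()
    scores["overall"] = (scaled * w).sum(axis=1) / w.sum()
    return scores


card = scorecard_sketch(metrics, categories, config)
print(card.round(3))
```

With these made-up numbers the overall winner differs from the winner on the additive category, which is the situation a recommendation step needs to flag.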
generate_comparison_advice(summary_df)
Generate a short natural-language recommendation from the scorecard.
The advice selects the best overall method, identifies the strongest method for each diagnostic theme, and adds a short note when the diagnostics favor different methods in different domains.
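The selection logic described above can be approximated in a few lines of pandas. This sketch assumes a scorecard whose rows are methods, whose columns are theme scores where higher is better, and which has an `overall` column. Those column names and the wording of the output are hypothetical and are not taken from `generate_comparison_advice`.

```python
import pandas as pd

# Hypothetical scorecard: rows are methods, columns are theme scores (higher is better).
summary_df = pd.DataFrame(
    {
        "additive": [0.0, 1.0, 0.8],
        "multiplicative": [0.0, 1.0, 0.6],
        "pca": [0.5, 0.2, 1.0],
        "overall": [0.10, 0.80, 0.85],
    },
    index=["Raw", "MethodA", "MethodB"],
)


def advice_sketch(summary_df, overall_column="overall"):
    """Build a short recommendation string from a scorecard."""
    best_overall = summary_df[overall_column].idxmax()
    themes = summary_df.drop(columns=[overall_column])
    best_per_theme = themes.idxmax()  # strongest method for each theme
    lines = [f"Best overall method: {best_overall}."]
    for theme, method in best_per_theme.items():
        lines.append(f"Strongest on {theme}: {method}.")
    if best_per_theme.nunique() > 1:
        # Different themes point to different methods, so say so explicitly.
        lines.append("Note: different themes favor different methods.")
    return " ".join(lines)


print(advice_sketch(summary_df))
```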
CrossSectionalReportMin(data, batch, covariates=None, covariate_names=None, save_data=True, save_data_name=None, save_dir=None, feature_names=None, report_name=None, SaveArtifacts=False, rep=None, show=False, timestamped_reports=True, covariate_types=None, ratio_type='rest')
Create a minimal cross-sectional diagnostic report for quick checks.
This version keeps the report lightweight by running a reduced subset of
diagnostics and visualizations. For a more comprehensive analysis, use
CrossSectionalReport.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | ndarray | Data matrix (samples x features). | required |
| batch | list or ndarray | Batch labels for each sample. | required |
| covariates | ndarray | Covariate matrix (samples x covariates). | None |
| covariate_names | list of str | Names of covariates. | None |
| save_data | bool | Whether to save input data and results. | True |
| save_data_name | str | Filename for saved data. | None |
| save_dir | str or PathLike | Directory to save report and data. | None |
| feature_names | list | Names of features. | None |
| report_name | str | Name of the report file. | None |
| SaveArtifacts | bool | Whether to save intermediate artifacts. | False |
| rep | StatsReporter | Existing report object to use. | None |
| show | bool | Whether to display plots interactively. | False |
| timestamped_reports | bool | Whether to append a timestamp to the report filename. | True |
| covariate_types | list | Types of covariates (e.g., 'categorical', 'numeric'). | None |
| ratio_type | str | Variance-ratio comparison mode passed to … | 'rest' |
Returns:
| Name | Type | Description |
|---|---|---|
| StatsReporter | StatsReporter | The report object containing the generated figures, text, and saved artifact references. |
CrossSectionalReport(data, batch, covariates=None, covariate_names=None, save_data=True, save_data_name=None, save_dir=None, feature_names=None, report_name=None, SaveArtifacts=False, rep=None, show=False, timestamped_reports=True, covariate_types=None, ratio_type='rest', UMAP_embedding=True, UMAP_tuning='auto', Random_state=None)
Create a full cross-sectional diagnostic report for batch effects.
The report combines summary text, statistical tests, and visualizations for mean, variance, covariance, clustering, and distributional differences across batches.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | ndarray | Data matrix (samples x features). | required |
| batch | list or ndarray | Batch labels for each sample. | required |
| covariates | ndarray | Covariate matrix (samples x covariates). | None |
| covariate_names | list of str | Names of covariates. | None |
| save_data | bool | Whether to save input data and results. | True |
| save_data_name | str | Filename for saved data. | None |
| save_dir | str or PathLike | Directory to save report and data. | None |
| feature_names | list | Names of features. | None |
| report_name | str | Name of the report file. | None |
| SaveArtifacts | bool | Whether to save intermediate artifacts. | False |
| rep | StatsReporter | Existing report object to use. | None |
| show | bool | Whether to display plots interactively. | False |
| timestamped_reports | bool | Whether to append a timestamp to the report filename. | True |
| covariate_types | list | Types of covariates used by the report's numeric and categorical workflows. | None |
| ratio_type | str | Variance-ratio comparison mode passed to … | 'rest' |
Returns:
| Name | Type | Description |
|---|---|---|
| StatsReporter | StatsReporter | The report object containing the generated narrative, figures, and saved outputs. |
Notes
covariate_types should align with covariate_names so the report can
decide when to factorize categorical covariates and when to keep numeric
covariates unchanged.
If covariate_types is not provided, the function infers categorical
versus numeric handling from the supplied data.
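To make the alignment between `covariate_names` and `covariate_types` concrete, here is a small sketch that builds synthetic inputs for `CrossSectionalReport`. The type labels `'numeric'` and `'categorical'` follow the example values given for `CrossSectionalReportMin`. The import path in the final comment is an assumption, so the call itself is left commented out.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 60, 5

data = rng.normal(size=(n_samples, n_features))     # samples x features
batch = np.repeat(["siteA", "siteB", "siteC"], 20)  # one label per sample

age = rng.uniform(20, 80, size=n_samples)
sex = rng.integers(0, 2, size=n_samples)            # already coded as 0/1
covariates = np.column_stack([age, sex])            # samples x covariates

covariate_names = ["age", "sex"]
covariate_types = ["numeric", "categorical"]        # same order as covariate_names

report_kwargs = dict(
    data=data,
    batch=batch,
    covariates=covariates,
    covariate_names=covariate_names,
    covariate_types=covariate_types,
    feature_names=[f"feature_{i}" for i in range(n_features)],
    save_data=False,
    show=False,
)

# Assumed import path; adjust it to wherever this module lives in your installation.
# from DiagnosticReport import CrossSectionalReport
# rep = CrossSectionalReport(**report_kwargs)
```

Entry `i` of `covariate_types` describes column `i` of `covariates`, so the two lists and the matrix columns must share one ordering.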
CrossSectionalComparisonReport(datasets, batch, covariates=None, covariate_names=None, save_data=True, save_data_name=None, feature_names=None, save_dir=None, report_name=None, include_raw=True, raw_name='Raw', scoring_config=None, rep=None, SaveArtifacts=False, show=False, timestamped_reports=True, covariate_types=None, ratio_type='rest', UMAP_embedding=True, UMAP_tuning='auto', plot_covariate_embeddings=True, allow_many_covariate_embeddings=False)
Create a comparative diagnostic report for multiple harmonisation methods.
The comparison report runs the same diagnostic suite on each candidate dataset, then aggregates the resulting metrics into a method scorecard and a short recommendation summary. It is intended for side-by-side evaluation of raw and harmonised outputs that share the same sample order, batch labels, and optional covariates.
The report reuses the same per-method diagnostic pipeline as the single cross-sectional workflow through the following helpers:
- validate_comparison_datasets: checks that all methods are compatible.
- _run_single_method_diagnostics: runs the full diagnostic suite.
- summarise_method_performance: builds the comparison scorecard.
- generate_comparison_advice: turns the scorecard into a recommendation.
- _save_comparison_results: exports per-method CSV artifacts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| datasets | dict[str, ndarray] | Mapping of method name to data matrix. | required |
| batch | | Batch vector of length … | required |
| covariates | | Optional covariate matrix. | None |
| covariate_names | | Optional covariate names. | None |
| save_data | bool | Whether to save per-method per-test CSV outputs. | True |
| save_data_name | str or None | Optional prefix to include in per-method saved CSV names. | None |
| feature_names | | Optional feature names. | None |
| save_dir | str, PathLike, or None | Directory for report and CSV outputs. | None |
| report_name | str or None | HTML report name. | None |
| scoring_config | dict or None | Optional scoring configuration. | None |
| rep | | Optional existing … | None |
| SaveArtifacts | bool | Whether to save report artifacts. | False |
| show | bool | Whether to show plots interactively. | False |
| timestamped_reports | bool | Whether to timestamp the report filename. | True |
| covariate_types | | Optional covariate type codes. | None |
| ratio_type | str | Variance-ratio mode. | 'rest' |
Returns:
| Name | Type | Description |
|---|---|---|
| StatsReporter | StatsReporter | Report object containing method-wise diagnostics, side-by-side plots, scorecard, and advice. |
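The sketch below shows one way to assemble the `datasets` mapping for a comparison run. The per-batch mean centring is a toy stand-in for a harmonisation method and is not something this library provides. The dictionary keys are arbitrary labels, and the import path in the final comment is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
batch = np.repeat(["siteA", "siteB"], 15)

# Synthetic data with an additive shift in siteB.
raw = rng.normal(size=(30, 4)) + (batch == "siteB")[:, None] * 1.5

# Toy "harmonisation": remove each batch's per-feature mean (illustration only).
centred = raw.copy()
for label in np.unique(batch):
    rows = batch == label
    centred[rows] -= centred[rows].mean(axis=0)

# Every entry must describe the same samples, in the same order, with the same shape.
datasets = {"Raw": raw, "BatchCentred": centred}

comparison_kwargs = dict(datasets=datasets, batch=batch, save_data=False, show=False)

# Assumed import path; adjust it to your installation.
# from DiagnosticReport import CrossSectionalComparisonReport
# rep = CrossSectionalComparisonReport(**comparison_kwargs)
```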
LongitudinalReport(data, batch, subject_ids, timepoints, covariates=None, covariate_names=None, features=None, save_data=False, save_data_name=None, save_dir=None, report_name=None, SaveArtifacts=False, rep=None, show=False, timestamped_reports=True)
Create a diagnostic report for dataset differences across batches in longitudinal data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | ndarray | Data matrix (samples x features). | required |
| batch | list or ndarray | Batch labels for each sample. | required |
| subject_ids | list or ndarray | Subject IDs for each sample. | required |
| covariates | ndarray | Covariate matrix (samples x covariates). | None |
| covariate_names | list of str | Names of covariates. | None |
| save_data | bool | Whether to save input data and results. | False |
| save_data_name | str | Filename for saved data. | None |
| save_dir | str or PathLike | Directory to save report and data. | None |
| report_name | str | Name of the report file. | None |
| SaveArtifacts | bool | Whether to save intermediate artifacts. | False |
| rep | StatsReporter | Existing report object to use. | None |
| show | bool | Whether to display plots interactively. | False |
Outputs
Generates an HTML report with diagnostic plots and statistics for longitudinal data.
If save_data is True, also returns a dictionary and a CSV file with the input data and results.
If SaveArtifacts is True, saves intermediate plots to save_dir.
Note: This function is designed for repeated-measures data where no longitudinal trend over time is expected. If the need arises, an additional function will be added for data where a longitudinal trend is expected and should be tested explicitly.
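For orientation, the sketch below builds synthetic repeated-measures inputs of the shape `LongitudinalReport` expects: one row per subject visit, with aligned `batch`, `subject_ids`, and `timepoints` vectors. The integer visit numbers, the one-scanner-per-visit layout, and the commented import path are assumptions for this example. The accepted format of `timepoints` is not documented in the table above.

```python
import numpy as np

rng = np.random.default_rng(2)
n_subjects, n_visits, n_features = 8, 3, 4

subject_ids = np.repeat([f"sub-{i:02d}" for i in range(n_subjects)], n_visits)
timepoints = np.tile(np.arange(1, n_visits + 1), n_subjects)
# Assumption for the example: each visit is acquired on a different scanner.
batch = np.tile(["scannerA", "scannerB", "scannerC"], n_subjects)

# Stable subject-specific signal plus noise, i.e. no trend over time.
subject_signal = rng.normal(size=(n_subjects, n_features))
noise = 0.1 * rng.normal(size=(n_subjects * n_visits, n_features))
data = np.repeat(subject_signal, n_visits, axis=0) + noise

longitudinal_kwargs = dict(
    data=data,
    batch=batch,
    subject_ids=subject_ids,
    timepoints=timepoints,
    save_data=False,
    show=False,
)

# Assumed import path; adjust it to your installation.
# from DiagnosticReport import LongitudinalReport
# rep = LongitudinalReport(**longitudinal_kwargs)
```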