DiagnosticReport

This module provides the library's main report functions: the standard single-dataset cross-sectional report (in full and minimal variants), a multi-method comparison report, and a longitudinal report.

CrossSectionalDiagnosticResult dataclass

Container for one method's diagnostics in a comparison report.

The comparison workflow fills this structure incrementally as each test succeeds or fails, then uses the collected fields to build the scorecard, summary advice, and per-method export files.

covariate_to_numeric(covariates)

Convert categorical covariates to numeric codes for downstream analyses.

Parameters:

Name Type Description Default
covariates ndarray or DataFrame

Covariate matrix with categorical variables.

required

Returns:

Type Description
ndarray | None

Covariates converted to a numeric array, or None if no covariates were provided.

Notes

Whether covariates is a DataFrame or a NumPy array, each categorical column is factorized independently. Numeric columns are left unchanged.
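The factorization rule in the note can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name covariates_to_codes is hypothetical, and it assumes a column counts as numeric when every value parses as a number.

```python
import numpy as np
import pandas as pd


def covariates_to_codes(covariates):
    """Illustrative stand-in, not the library's covariate_to_numeric."""
    if covariates is None:
        return None
    # Handle DataFrame and ndarray input uniformly by going through a DataFrame.
    frame = pd.DataFrame(covariates)
    columns = []
    for name in frame.columns:
        column = frame[name]
        numeric = pd.to_numeric(column, errors="coerce")
        if numeric.notna().all():
            # Numeric column: keep the values unchanged.
            columns.append(numeric.to_numpy(dtype=float))
        else:
            # Categorical column: map each distinct label to an integer code,
            # independently of every other column.
            codes, _ = pd.factorize(column)
            columns.append(codes.astype(float))
    return np.column_stack(columns)


covs = pd.DataFrame({"sex": ["F", "M", "F"], "age": [31, 45, 52]})
numeric_covs = covariates_to_codes(covs)
print(numeric_covs)
```

Because each column is factorized on its own, the integer codes are only meaningful within a column; code 0 in one covariate has no relation to code 0 in another.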

validate_comparison_datasets(datasets, batch, covariates=None, feature_names=None)

Validate and normalize the datasets used by the comparison report.

The function enforces a non-empty mapping of method name to 2D data array, checks that every method has the same shape, and validates that batch, covariate, and feature-name dimensions are compatible with the data.
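The checks described above might look roughly like this. The function name check_comparison_inputs, the error messages, and the return value are assumptions made for illustration; the library's own validation may differ in order and detail.

```python
import numpy as np


def check_comparison_inputs(datasets, batch, covariates=None, feature_names=None):
    """Illustrative checks only; not the library's validate_comparison_datasets."""
    if not isinstance(datasets, dict) or not datasets:
        raise ValueError("datasets must be a non-empty mapping of method name to array")

    arrays = {name: np.asarray(values) for name, values in datasets.items()}
    shapes = {name: arr.shape for name, arr in arrays.items()}

    # Every method must supply a 2D (samples x features) array.
    for name, shape in shapes.items():
        if len(shape) != 2:
            raise ValueError(f"{name!r} must be 2D, got shape {shape}")

    # All methods must describe the same samples and features.
    if len(set(shapes.values())) != 1:
        raise ValueError(f"all methods must share one shape, got {shapes}")

    n_samples, n_features = next(iter(shapes.values()))
    if len(batch) != n_samples:
        raise ValueError("batch must have one label per sample")
    if covariates is not None and np.asarray(covariates).shape[0] != n_samples:
        raise ValueError("covariates must have one row per sample")
    if feature_names is not None and len(feature_names) != n_features:
        raise ValueError("feature_names must have one name per feature")
    return arrays, np.asarray(batch)


# "Raw" and "MethodA" are placeholder method names.
datasets = {"Raw": np.zeros((4, 3)), "MethodA": np.ones((4, 3))}
arrays, batch = check_comparison_inputs(datasets, ["a", "a", "b", "b"])
```

Failing early on a shape mismatch keeps a long comparison run from aborting partway through the per-method diagnostics.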

summarise_method_performance(results, scoring_config=None)

Turn per-method diagnostics into a comparable scorecard.

The summary combines the extracted metrics into category-level scores for additive, multiplicative, linear-modelling, distributional, and PCA behaviour. Optional scoring configuration can reweight the metrics or mark specific metrics as higher-is-better.
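One way such a scorecard could be assembled is sketched below. Everything specific here is an assumption: the metric names, the numbers, the shape of the config dictionary, and the min-max rescaling are illustrative choices, not the library's actual scoring scheme.

```python
import pandas as pd

# Hypothetical metric values per method; names and numbers are made up.
metrics = pd.DataFrame(
    {"mean_shift": [0.80, 0.20, 0.30], "silhouette_gap": [0.10, 0.40, 0.35]},
    index=["Raw", "MethodA", "MethodB"],
)

# Assumed config shape: per-metric weights plus the set of higher-is-better metrics.
config = {
    "weights": {"mean_shift": 2.0, "silhouette_gap": 1.0},
    "higher_is_better": {"silhouette_gap"},
}


def score_methods(metrics, config):
    # Rescale each metric to [0, 1] across methods so they are comparable.
    span = (metrics.max() - metrics.min()).replace(0, 1.0)
    scaled = (metrics - metrics.min()) / span
    # Flip lower-is-better metrics so that 1.0 always means "best".
    for name in scaled.columns:
        if name not in config["higher_is_better"]:
            scaled[name] = 1.0 - scaled[name]
    # Weighted mean across metrics; unlisted metrics default to weight 1.
    weights = pd.Series(config["weights"]).reindex(scaled.columns).fillna(1.0)
    overall = (scaled * weights).sum(axis=1) / weights.sum()
    return scaled.assign(overall=overall).sort_values("overall", ascending=False)


scorecard = score_methods(metrics, config)
print(scorecard)
```

Putting every metric on a common "higher is better" scale before weighting is what makes the direction flag and the weights composable.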

generate_comparison_advice(summary_df)

Generate a short natural-language recommendation from the scorecard.

The advice selects the best overall method, identifies the strongest method for each diagnostic theme, and adds a short note when the diagnostics favor different methods in different domains.
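The selection logic can be illustrated with a small sketch. The column names (additive, pca, overall), the wording of the output, and the helper draft_advice are hypothetical; the real scorecard layout is not shown on this page.

```python
import pandas as pd

# Hypothetical scorecard: one row per method, higher scores are better.
summary = pd.DataFrame(
    {"additive": [0.9, 0.6], "pca": [0.5, 0.8], "overall": [0.75, 0.70]},
    index=["MethodA", "MethodB"],
)


def draft_advice(summary):
    # Best overall method is the row with the highest overall score.
    best_overall = summary["overall"].idxmax()
    # Strongest method per theme: column-wise argmax over the remaining columns.
    themes = summary.drop(columns="overall").idxmax()
    lines = [f"Best overall: {best_overall}."]
    for theme, method in themes.items():
        lines.append(f"Strongest for {theme}: {method}.")
    # Flag disagreement when no single method leads every theme.
    if themes.nunique() > 1:
        lines.append("Note: different methods lead in different domains.")
    return "\n".join(lines)


advice = draft_advice(summary)
print(advice)
```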

CrossSectionalReportMin(data, batch, covariates=None, covariate_names=None, save_data=True, save_data_name=None, save_dir=None, feature_names=None, report_name=None, SaveArtifacts=False, rep=None, show=False, timestamped_reports=True, covariate_types=None, ratio_type='rest')

Create a minimal cross-sectional diagnostic report for quick checks.

This version keeps the report lightweight by running a reduced subset of diagnostics and visualizations. For a more comprehensive analysis, use CrossSectionalReport.

Parameters:

Name Type Description Default
data ndarray

Data matrix (samples x features).

required
batch list or ndarray

Batch labels for each sample.

required
covariates ndarray

Covariate matrix (samples x covariates).

None
covariate_names list of str

Names of covariates.

None
save_data bool

Whether to save input data and results.

True
save_data_name str

Filename for saved data.

None
save_dir str or PathLike

Directory to save report and data.

None
feature_names list

Names of features.

None
report_name str

Name of the report file.

None
SaveArtifacts bool

Whether to save intermediate artifacts.

False
rep StatsReporter

Existing report object to use.

None
show bool

Whether to display plots interactively.

False
timestamped_reports bool

Whether to append a timestamp to the report filename.

True
covariate_types list

Types of covariates (e.g., 'categorical', 'numeric').

None
ratio_type str

Variance-ratio comparison mode passed to Variance_Ratios.

'rest'

Returns:

Name Type Description
StatsReporter StatsReporter

The report object containing the generated figures, text, and saved artifact references.

CrossSectionalReport(data, batch, covariates=None, covariate_names=None, save_data=True, save_data_name=None, save_dir=None, feature_names=None, report_name=None, SaveArtifacts=False, rep=None, show=False, timestamped_reports=True, covariate_types=None, ratio_type='rest', UMAP_embedding=True, UMAP_tuning='auto', Random_state=None)

Create a full cross-sectional diagnostic report for batch effects.

The report combines summary text, statistical tests, and visualizations for mean, variance, covariance, clustering, and distributional differences across batches.

Parameters:

Name Type Description Default
data ndarray

Data matrix (samples x features).

required
batch list or ndarray

Batch labels for each sample.

required
covariates ndarray

Covariate matrix (samples x covariates).

None
covariate_names list of str

Names of covariates.

None
save_data bool

Whether to save input data and results.

True
save_data_name str

Filename for saved data.

None
save_dir str or PathLike

Directory to save report and data.

None
feature_names list

Names of features.

None
report_name str

Name of the report file.

None
SaveArtifacts bool

Whether to save intermediate artifacts.

False
rep StatsReporter

Existing report object to use.

None
show bool

Whether to display plots interactively.

False
timestamped_reports bool

Whether to append a timestamp to the report filename.

True
covariate_types list

Types of covariates used by the report's numeric and categorical workflows.

None
ratio_type str

Variance-ratio comparison mode passed to Variance_Ratios.

'rest'

Returns:

Name Type Description
StatsReporter StatsReporter

The report object containing the generated narrative, figures, and saved outputs.

Notes

covariate_types should align with covariate_names so the report can decide when to factorize categorical covariates and when to keep numeric covariates unchanged. If covariate_types is not provided, the function infers categorical versus numeric handling from the supplied data.
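A sketch of inputs with aligned covariate_names and covariate_types is shown below. The data are synthetic, and the type strings 'numeric' and 'categorical' are taken from the example in the CrossSectionalReportMin parameter table; whether other values are accepted is not stated here. The import path for CrossSectionalReport is not given on this page, so the call is left as a comment.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features = 6, 4

# Synthetic data matrix (samples x features) and one batch label per sample.
data = rng.normal(size=(n_samples, n_features))
batch = np.array(["site1", "site1", "site1", "site2", "site2", "site2"])

# One name and one type per covariate column, in the same order.
covariate_names = ["age", "sex"]
covariate_types = ["numeric", "categorical"]
covariates = np.column_stack([
    np.array([34, 51, 47, 29, 62, 40], dtype=object),
    np.array(["F", "M", "F", "M", "F", "M"], dtype=object),
])

# Cheap sanity check before running the report.
assert len(covariate_names) == len(covariate_types) == covariates.shape[1]

# With the function imported from the library, the call would then be:
# report = CrossSectionalReport(
#     data, batch,
#     covariates=covariates,
#     covariate_names=covariate_names,
#     covariate_types=covariate_types,
#     save_data=False,
# )
```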

CrossSectionalComparisonReport(datasets, batch, covariates=None, covariate_names=None, save_data=True, save_data_name=None, feature_names=None, save_dir=None, report_name=None, include_raw=True, raw_name='Raw', scoring_config=None, rep=None, SaveArtifacts=False, show=False, timestamped_reports=True, covariate_types=None, ratio_type='rest', UMAP_embedding=True, UMAP_tuning='auto', plot_covariate_embeddings=True, allow_many_covariate_embeddings=False)

Create a comparative diagnostic report for multiple harmonisation methods.

The comparison report runs the same diagnostic suite on each candidate dataset, then aggregates the resulting metrics into a method scorecard and a short recommendation summary. It is intended for side-by-side evaluation of raw and harmonised outputs that share the same sample order, batch labels, and optional covariates.

The report reuses the same per-method diagnostic pipeline as the single cross-sectional workflow through the following helpers:

  • validate_comparison_datasets: checks that all methods are compatible.
  • _run_single_method_diagnostics: runs the full diagnostic suite.
  • summarise_method_performance: builds the comparison scorecard.
  • generate_comparison_advice: turns the scorecard into a recommendation.
  • _save_comparison_results: exports per-method CSV artifacts.

Parameters:

Name Type Description Default
datasets dict[str, ndarray]

Mapping of method name to data matrix (n_samples, n_features).

required
batch

Batch vector of length n_samples.

required
covariates

Optional covariate matrix (n_samples, n_covariates).

None
covariate_names

Optional covariate names.

None
save_data bool

Whether to save per-method per-test CSV outputs.

True
save_data_name str | None

Optional prefix to include in per-method saved CSV names.

None
feature_names

Optional feature names.

None
save_dir str | PathLike | None

Directory for report and CSV outputs.

None
report_name str | None

HTML report name.

None
scoring_config dict | None

Optional scoring configuration.

None
rep

Optional existing StatsReporter instance.

None
SaveArtifacts bool

Whether to save report artifacts.

False
show bool

Whether to show plots interactively.

False
timestamped_reports bool

Whether to timestamp the report filename.

True
covariate_types

Optional covariate type codes.

None
ratio_type str

Variance-ratio mode.

'rest'

Returns:

Name Type Description
StatsReporter StatsReporter

Report object containing method-wise diagnostics, side-by-side plots, scorecard, and advice.

LongitudinalReport(data, batch, subject_ids, timepoints, covariates=None, covariate_names=None, features=None, save_data=False, save_data_name=None, save_dir=None, report_name=None, SaveArtifacts=False, rep=None, show=False, timestamped_reports=True)

Create a diagnostic report for dataset differences across batches in longitudinal data.

Parameters:

Name Type Description Default
data ndarray

Data matrix (samples x features).

required
batch list or ndarray

Batch labels for each sample.

required
subject_ids list or ndarray

Subject IDs for each sample.

required
covariates ndarray

Covariate matrix (samples x covariates).

None
covariate_names list of str

Names of covariates.

None
save_data bool

Whether to save input data and results.

False
save_data_name str

Filename for saved data.

None
save_dir str or PathLike

Directory to save report and data.

None
report_name str

Name of the report file.

None
SaveArtifacts bool

Whether to save intermediate artifacts.

False
rep StatsReporter

Existing report object to use.

None
show bool

Whether to display plots interactively.

False
Outputs

Generates an HTML report with diagnostic plots and statistics for longitudinal data. If save_data is True, also returns a dictionary and a CSV file containing the input data and results. If SaveArtifacts is True, saves intermediate plots to save_dir.

Note: This function is designed for repeated-measures data in which we do not expect to see a longitudinal trend over time. If the need arises, we will add a function for data where a longitudinal trend is expected and should be tested for explicitly.
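The expected input layout, with one row per subject visit, can be sketched as follows. The subject labels, scanner names, and sizes are invented for illustration, and the LongitudinalReport call is left as a comment because the import path is not shown on this page.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_subjects, n_visits, n_features = 4, 3, 5

# One row per (subject, visit): subject IDs repeat, timepoints cycle.
subject_ids = np.repeat([f"sub-{i:02d}" for i in range(n_subjects)], n_visits)
timepoints = np.tile(np.arange(n_visits), n_subjects)

# Hypothetical batches: the last visit was acquired on a different scanner.
batch = np.where(timepoints == 2, "scanner_B", "scanner_A")

# Synthetic data matrix (samples x features) with no built-in time trend.
data = rng.normal(size=(n_subjects * n_visits, n_features))

# Confirm every subject has repeated measurements before running the report.
layout = pd.DataFrame({"subject": subject_ids, "timepoint": timepoints, "batch": batch})
visits_per_subject = layout.groupby("subject")["timepoint"].nunique()
print(visits_per_subject)

# report = LongitudinalReport(data, batch, subject_ids, timepoints)
```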