DiagnosticReport

This module contains the library's main report functions: the standard single-dataset cross-sectional report, a multi-method comparison report, and a longitudinal report.

`CrossSectionalDiagnosticResult` `dataclass`

Container for one method's diagnostics in a comparison report.

The comparison workflow fills this structure incrementally as each test succeeds or fails, then uses the collected fields to build the scorecard, summary advice, and per-method export files.
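The fields of `CrossSectionalDiagnosticResult` are not listed here. As a hedged sketch of the "fill incrementally as each test succeeds or fails" pattern, the hypothetical class below records either each test's output or its failure message. The class name, its fields, and the `record` helper are assumptions for illustration, not the library's API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class MethodDiagnosticsSketch:
    """Hypothetical stand-in for CrossSectionalDiagnosticResult (fields are assumed)."""

    method_name: str
    outputs: Dict[str, Any] = field(default_factory=dict)  # test name -> result
    errors: Dict[str, str] = field(default_factory=dict)   # test name -> error message

    def record(self, test_name: str, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Optional[Any]:
        """Run one diagnostic and store either its result or its failure message."""
        try:
            result = func(*args, **kwargs)
        except Exception as exc:  # one failing test should not abort the whole comparison
            self.errors[test_name] = f"{type(exc).__name__}: {exc}"
            return None
        self.outputs[test_name] = result
        return result


res = MethodDiagnosticsSketch("Raw")
res.record("mean_difference", lambda: 0.42)
res.record("broken_test", lambda: 1 / 0)
```

Keeping failures in a separate mapping lets a later scorecard step skip missing metrics instead of crashing.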

`covariate_to_numeric(covariates)`

Convert categorical covariates to numeric codes for downstream analyses.

Parameters:

Name	Type	Description	Default
`covariates`	`ndarray or DataFrame`	Covariate matrix with categorical variables.	required

Returns:

Type	Description
`ndarray \| None`	Covariates converted to a numeric array, or `None` if no covariates were provided.

Notes

Whether covariates is a DataFrame or a NumPy array, each categorical column is factorized independently. Numeric columns are left unchanged.
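The behaviour described in the notes can be approximated with `pandas.factorize`. This is a minimal sketch, not the library's implementation. The function name is hypothetical, and so is the rule used here to decide that a column is categorical (any value that cannot be parsed as a number).

```python
import numpy as np
import pandas as pd


def covariates_to_numeric_sketch(covariates):
    """Hypothetical re-implementation: factorize categorical columns, keep numeric ones."""
    if covariates is None:
        return None
    out = pd.DataFrame(covariates).copy()  # accepts a DataFrame or a 2D ndarray
    for col in out.columns:
        numeric = pd.to_numeric(out[col], errors="coerce")
        if numeric.isna().sum() == out[col].isna().sum():
            # Every non-missing value parsed as a number: keep the column numeric.
            out[col] = numeric
        else:
            # Categorical column: map each level to an integer code (order of first appearance).
            codes, _ = pd.factorize(out[col])
            out[col] = codes
    return out.to_numpy(dtype=float)


X = np.array([["M", 30], ["F", 41], ["M", 25]], dtype=object)
X_num = covariates_to_numeric_sketch(X)
```

Note that `pd.factorize` codes missing values as `-1`, so a real implementation may need to handle missing categories separately.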

`validate_comparison_datasets(datasets, batch, covariates=None, feature_names=None)`

Validate and normalize the datasets used by the comparison report.

The function enforces a non-empty mapping of method name to 2D data array, checks that every method has the same shape, and validates that batch, covariate, and feature-name dimensions are compatible with the data.
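The checks listed above can be sketched as follows. The function name, the exact error messages, and the reshaping of a 1D covariate vector into a single column are assumptions for illustration. The method name `"ComBat"` in the example is only a placeholder key.

```python
import numpy as np


def validate_datasets_sketch(datasets, batch, covariates=None, feature_names=None):
    """Hypothetical checks mirroring the description above; returns normalized inputs."""
    if not isinstance(datasets, dict) or not datasets:
        raise ValueError("datasets must be a non-empty dict of method name -> 2D array")
    arrays = {name: np.asarray(X, dtype=float) for name, X in datasets.items()}
    for name, X in arrays.items():
        if X.ndim != 2:
            raise ValueError(f"{name}: expected a 2D array, got ndim={X.ndim}")
    shapes = {name: X.shape for name, X in arrays.items()}
    if len(set(shapes.values())) != 1:
        raise ValueError(f"all methods must share one shape, got {shapes}")
    n_samples, n_features = next(iter(shapes.values()))
    batch = np.asarray(batch)
    if len(batch) != n_samples:
        raise ValueError("batch length must equal n_samples")
    if covariates is not None:
        covariates = np.asarray(covariates)
        if covariates.ndim == 1:
            covariates = covariates.reshape(-1, 1)  # treat a vector as one covariate
        if covariates.shape[0] != n_samples:
            raise ValueError("covariates must have n_samples rows")
    if feature_names is not None and len(feature_names) != n_features:
        raise ValueError("feature_names length must equal n_features")
    return arrays, batch, covariates


rng = np.random.default_rng(0)
ok = {"Raw": rng.normal(size=(6, 3)), "ComBat": rng.normal(size=(6, 3))}
arrays, b, cov = validate_datasets_sketch(ok, [0, 0, 0, 1, 1, 1], covariates=np.arange(6))
```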

`summarise_method_performance(results, scoring_config=None)`

Turn per-method diagnostics into a comparable scorecard.

The summary combines the extracted metrics into category-level scores for additive, multiplicative, linear-modelling, distributional, and PCA behaviour. Optional scoring configuration can reweight the metrics or mark specific metrics as higher-is-better.
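The scoring rules are not spelled out here. One plausible scheme is to rescale each metric across methods, flip higher-is-better metrics, and take a weighted average within each category. The function name, the `categories`, `weights`, and `higher_is_better` arguments, and the metric names below are all hypothetical. They only illustrate what a reweighting configuration could control. Lower scores are better in this sketch.

```python
import pandas as pd


def summarise_sketch(metrics, categories, weights=None, higher_is_better=()):
    """Hypothetical scorecard: rows are methods, lower scores are better."""
    weights = weights or {}
    scaled = metrics.astype(float).copy()
    for col in scaled.columns:
        if col in higher_is_better:
            scaled[col] = -scaled[col]  # flip so that smaller is always better
        lo, hi = scaled[col].min(), scaled[col].max()
        # Min-max rescale across methods; a constant metric carries no ranking information.
        scaled[col] = 0.0 if hi == lo else (scaled[col] - lo) / (hi - lo)
    summary = pd.DataFrame(index=metrics.index)
    for cat, cols in categories.items():
        w = pd.Series({c: weights.get(c, 1.0) for c in cols})
        summary[cat] = scaled[cols].mul(w, axis=1).sum(axis=1) / w.sum()
    summary["overall"] = summary[list(categories)].mean(axis=1)
    return summary.sort_values("overall")


metrics = pd.DataFrame(
    {"mean_diff": [2.0, 0.5, 1.0], "var_dev": [1.0, 0.2, 0.4], "ks_pvalue": [0.1, 0.9, 0.5]},
    index=["Raw", "MethodA", "MethodB"],
)
categories = {"additive": ["mean_diff"], "multiplicative": ["var_dev"], "distributional": ["ks_pvalue"]}
summary = summarise_sketch(metrics, categories, higher_is_better=("ks_pvalue",))
```

Min-max rescaling puts metrics with different units on one scale before averaging. The trade-off is that scores are only meaningful relative to the other methods in the same comparison.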

`generate_comparison_advice(summary_df)`

Generate a short natural-language recommendation from the scorecard.

The advice selects the best overall method, identifies the strongest method for each diagnostic theme, and adds a short note when the diagnostics favor different methods in different domains.
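A minimal sketch of that selection logic follows. It assumes a scorecard with one column per diagnostic theme plus an `overall` column, where lower scores are better. The function name, column layout, and wording of the output are hypothetical.

```python
import pandas as pd


def advice_sketch(summary, overall_col="overall"):
    """Hypothetical advice text: best overall method, best per theme, and a divergence note."""
    best = summary[overall_col].idxmin()
    themes = [c for c in summary.columns if c != overall_col]
    per_theme = {t: summary[t].idxmin() for t in themes}
    lines = [f"Best overall method: {best}."]
    for theme, method in per_theme.items():
        lines.append(f"Strongest on {theme}: {method}.")
    if len(set(per_theme.values())) > 1:
        lines.append("Note: the diagnostics favor different methods in different domains.")
    return "\n".join(lines)


scorecard = pd.DataFrame(
    {"additive": [0.0, 1.0], "pca": [1.0, 0.0], "overall": [0.4, 0.6]},
    index=["MethodA", "MethodB"],
)
text = advice_sketch(scorecard)
```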

`CrossSectionalReportMin(data, batch, covariates=None, covariate_names=None, save_data=True, save_data_name=None, save_dir=None, feature_names=None, report_name=None, SaveArtifacts=False, rep=None, show=False, timestamped_reports=True, covariate_types=None, ratio_type='rest')`

Create a minimal cross-sectional diagnostic report for quick checks.

This version keeps the report lightweight by running only a reduced subset of diagnostics and visualizations. For a more comprehensive analysis, use `CrossSectionalReport`.

Parameters:

Name	Type	Description	Default
`data`	`ndarray`	Data matrix (samples x features).	required
`batch`	`list or ndarray`	Batch labels for each sample.	required
`covariates`	`ndarray`	Covariate matrix (samples x covariates).	`None`
`covariate_names`	`list of str`	Names of covariates.	`None`
`save_data`	`bool`	Whether to save input data and results.	`True`
`save_data_name`	`str`	Filename for saved data.	`None`
`save_dir`	`str or PathLike`	Directory to save report and data.	`None`
`feature_names`	`list`	Names of features.	`None`
`report_name`	`str`	Name of the report file.	`None`
`SaveArtifacts`	`bool`	Whether to save intermediate artifacts.	`False`
`rep`	`StatsReporter`	Existing report object to use.	`None`
`show`	`bool`	Whether to display plots interactively.	`False`
`timestamped_reports`	`bool`	Whether to append a timestamp to the report filename.	`True`
`covariate_types`	`list`	Types of covariates (e.g., 'categorical', 'numeric').	`None`
`ratio_type`	`str`	Variance-ratio comparison mode passed to `Variance_Ratios`.	`'rest'`

Returns:

Name	Type	Description
`StatsReporter`	`StatsReporter`	The report object containing the generated figures, text, and saved artifact references.

`CrossSectionalReport(data, batch, covariates=None, covariate_names=None, save_data=True, save_data_name=None, save_dir=None, feature_names=None, report_name=None, SaveArtifacts=False, rep=None, show=False, timestamped_reports=True, covariate_types=None, ratio_type='rest', UMAP_embedding=True, UMAP_tuning='auto', Random_state=None)`

Create a full cross-sectional diagnostic report for batch effects.

The report combines summary text, statistical tests, and visualizations for mean, variance, covariance, clustering, and distributional differences across batches.

Parameters:

Name	Type	Description	Default
`data`	`ndarray`	Data matrix (samples x features).	required
`batch`	`list or ndarray`	Batch labels for each sample.	required
`covariates`	`ndarray`	Covariate matrix (samples x covariates).	`None`
`covariate_names`	`list of str`	Names of covariates.	`None`
`save_data`	`bool`	Whether to save input data and results.	`True`
`save_data_name`	`str`	Filename for saved data.	`None`
`save_dir`	`str or PathLike`	Directory to save report and data.	`None`
`feature_names`	`list`	Names of features.	`None`
`report_name`	`str`	Name of the report file.	`None`
`SaveArtifacts`	`bool`	Whether to save intermediate artifacts.	`False`
`rep`	`StatsReporter`	Existing report object to use.	`None`
`show`	`bool`	Whether to display plots interactively.	`False`
`timestamped_reports`	`bool`	Whether to append a timestamp to the report filename.	`True`
`covariate_types`	`list`	Types of covariates used by the report's numeric and categorical workflows.	`None`
`ratio_type`	`str`	Variance-ratio comparison mode passed to `Variance_Ratios`.	`'rest'`

Returns:

Name	Type	Description
`StatsReporter`	`StatsReporter`	The report object containing the generated narrative, figures, and saved outputs.

Notes

covariate_types should align with covariate_names so the report can decide when to factorize categorical covariates and when to keep numeric covariates unchanged. If covariate_types is not provided, the function infers categorical versus numeric handling from the supplied data.
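The inference rule the report uses is not described here. The sketch below shows one common heuristic, purely as an assumption: a column is treated as categorical if it is non-numeric or has at most `max_levels` distinct values. The function name, the `max_levels` threshold, and the `'categorical'`/`'numeric'` labels are hypothetical.

```python
import numpy as np
import pandas as pd


def infer_covariate_types_sketch(covariates, max_levels=5):
    """Hypothetical heuristic: non-numeric or low-cardinality columns are categorical."""
    df = pd.DataFrame(covariates)
    types = []
    for col in df.columns:
        numeric = pd.to_numeric(df[col], errors="coerce")
        is_numeric = numeric.notna().sum() == df[col].notna().sum()
        if not is_numeric or numeric.nunique() <= max_levels:
            types.append("categorical")
        else:
            types.append("numeric")
    return types


cov = pd.DataFrame({"sex": list("MFMFMFMFMF"), "age": np.arange(20, 30), "site": [0, 1] * 5})
types = infer_covariate_types_sketch(cov)
```

Any such heuristic can misclassify, for example an integer-coded numeric score with few levels. Passing `covariate_types` explicitly, aligned with `covariate_names`, avoids relying on inference.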

`CrossSectionalComparisonReport(datasets, batch, covariates=None, covariate_names=None, save_data=True, save_data_name=None, feature_names=None, save_dir=None, report_name=None, include_raw=True, raw_name='Raw', scoring_config=None, rep=None, SaveArtifacts=False, show=False, timestamped_reports=True, covariate_types=None, ratio_type='rest', UMAP_embedding=True, UMAP_tuning='auto', plot_covariate_embeddings=True, allow_many_covariate_embeddings=False)`

Create a comparative diagnostic report for multiple harmonisation methods.

The comparison report runs the same diagnostic suite on each candidate dataset, then aggregates the resulting metrics into a method scorecard and a short recommendation summary. It is intended for side-by-side evaluation of raw and harmonised outputs that share the same sample order, batch labels, and optional covariates.

The report reuses the same per-method diagnostic pipeline as the single cross-sectional workflow through the following helpers:

validate_comparison_datasets: checks that all methods are compatible.
_run_single_method_diagnostics: runs the full diagnostic suite.
summarise_method_performance: builds the comparison scorecard.
generate_comparison_advice: turns the scorecard into a recommendation.
_save_comparison_results: exports per-method CSV artifacts.

Parameters:

Name	Type	Description	Default
`datasets`	`dict[str, ndarray]`	Mapping of method name to data matrix `(n_samples, n_features)`.	required
`batch`		Batch vector of length `n_samples`.	required
`covariates`		Optional covariate matrix `(n_samples, n_covariates)`.	`None`
`covariate_names`		Optional covariate names.	`None`
`save_data`	`bool`	Whether to save per-method per-test CSV outputs.	`True`
`save_data_name`	`str \| None`	Optional prefix to include in per-method saved CSV names.	`None`
`feature_names`		Optional feature names.	`None`
`save_dir`	`str \| PathLike \| None`	Directory for report and CSV outputs.	`None`
`report_name`	`str \| None`	HTML report name.	`None`
`scoring_config`	`dict \| None`	Optional scoring configuration.	`None`
`rep`		Optional existing `StatsReporter` instance.	`None`
`SaveArtifacts`	`bool`	Whether to save report artifacts.	`False`
`show`	`bool`	Whether to show plots interactively.	`False`
`timestamped_reports`	`bool`	Whether to timestamp the report filename.	`True`
`covariate_types`		Optional covariate type codes.	`None`
`ratio_type`	`str`	Variance-ratio mode.	`'rest'`

Returns:

Name	Type	Description
`StatsReporter`	`StatsReporter`	Report object containing method-wise diagnostics, side-by-side plots, scorecard, and advice.

`LongitudinalReport(data, batch, subject_ids, timepoints, covariates=None, covariate_names=None, features=None, save_data=False, save_data_name=None, save_dir=None, report_name=None, SaveArtifacts=False, rep=None, show=False, timestamped_reports=True)`

Create a diagnostic report for dataset differences across batches in longitudinal data.

Parameters:

Name	Type	Description	Default
`data`	`ndarray`	Data matrix (samples x features).	required
`batch`	`list or ndarray`	Batch labels for each sample.	required
`subject_ids`	`list or ndarray`	Subject IDs for each sample.	required
`covariates`	`ndarray`	Covariate matrix (samples x covariates).	`None`
`covariate_names`	`list of str`	Names of covariates.	`None`
`save_data`	`bool`	Whether to save input data and results.	`False`
`save_data_name`	`str`	Filename for saved data.	`None`
`save_dir`	`str or PathLike`	Directory to save report and data.	`None`
`report_name`	`str`	Name of the report file.	`None`
`SaveArtifacts`	`bool`	Whether to save intermediate artifacts.	`False`
`rep`	`StatsReporter`	Existing report object to use.	`None`
`show`	`bool`	Whether to display plots interactively.	`False`

Outputs

Generates an HTML report with diagnostic plots and statistics for longitudinal data. If save_data is True, it also returns a dictionary and writes a CSV file containing the input data and results. If SaveArtifacts is True, intermediate plots are saved to save_dir.

Note: This function is designed for repeated-measures data in which no longitudinal trend over time is expected. If the need arises, we will add a separate function for data where a longitudinal trend is expected and should be tested for explicitly.
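Because no true change over time is assumed, within-subject differences between batches mainly reflect batch effects. The sketch below estimates a per-feature batch offset from subjects measured in two batches. It is an illustration of that paired reasoning, not one of the report's documented tests. The function name and its `batch_a`/`batch_b` arguments are hypothetical.

```python
import numpy as np
import pandas as pd


def paired_batch_offsets_sketch(data, batch, subject_ids, batch_a, batch_b):
    """Mean within-subject difference (batch_b - batch_a) for each feature."""
    df = pd.DataFrame(np.asarray(data, dtype=float))
    df["subject"] = np.asarray(subject_ids)
    df["batch"] = np.asarray(batch)
    # Average repeated measurements within each subject x batch cell.
    cell = df.groupby(["subject", "batch"]).mean()
    a = cell.xs(batch_a, level="batch")
    b = cell.xs(batch_b, level="batch")
    common = a.index.intersection(b.index)  # subjects observed in both batches
    return (b.loc[common] - a.loc[common]).mean(axis=0).to_numpy()


subjects = np.repeat(["s1", "s2", "s3"], 2)
batch = np.tile(["scanner1", "scanner2"], 3)
data = np.repeat(np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]), 2, axis=0)
data[batch == "scanner2"] += [0.5, -2.0]  # simulated additive batch effect
offsets = paired_batch_offsets_sketch(data, batch, subjects, "scanner1", "scanner2")
```

Pairing by subject removes between-subject variation, which is why repeated-measures designs can detect smaller batch offsets than unpaired comparisons.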
