
DiagnosticReport

These are the library's main functions. Each one produces an end-to-end report of every result in the analysis, with an explanation of how to interpret it.

covariate_to_numeric(covariates)

Convert categorical covariates to numeric codes for downstream analyses.

Parameters:

Name Type Description Default
covariates ndarray or DataFrame

Covariate matrix with categorical variables.

required

Returns:

Type Description
ndarray | None

Covariates converted to a numeric array, or None if no covariates were provided.

Notes

Whether covariates is a DataFrame or a NumPy array, each categorical column is factorized independently. Numeric columns are left unchanged.
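A minimal sketch of the behaviour described in the Notes, assuming pandas is available. The name covariates_to_numeric_sketch is hypothetical and this is not the library's implementation; it only illustrates per-column factorization with pandas.factorize while numeric columns pass through.

```python
import numpy as np
import pandas as pd


def covariates_to_numeric_sketch(covariates):
    """Hypothetical re-implementation: factorize categorical columns, keep numeric ones."""
    if covariates is None:
        # Mirrors the documented behaviour: no covariates -> None.
        return None
    # Wrap NumPy input in a DataFrame so both input types share one code path;
    # infer_objects() turns object columns holding only numbers into numeric dtypes.
    df = pd.DataFrame(covariates).infer_objects()
    out = np.empty(df.shape, dtype=float)
    for j, col in enumerate(df.columns):
        values = df[col]
        if pd.api.types.is_numeric_dtype(values):
            # Numeric columns pass through unchanged.
            out[:, j] = values.to_numpy(dtype=float)
        else:
            # Each categorical column gets its own integer codes (0, 1, 2, ...)
            # in order of first appearance; pandas codes missing values as -1.
            codes, _ = pd.factorize(values)
            out[:, j] = codes
    return out
```

Because each column is factorized on its own, the codes are arbitrary labels ordered by first appearance: they are suitable for grouping but should not be read as ordered quantities.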

CrossSectionalReportMin(data, batch, covariates=None, covariate_names=None, save_data=True, save_data_name=None, save_dir=None, feature_names=None, report_name=None, SaveArtifacts=False, rep=None, show=False, timestamped_reports=True, covariate_types=None, ratio_type='rest')

Create a minimal cross-sectional diagnostic report for quick checks.

This version keeps the report lightweight by running a reduced subset of diagnostics and visualizations. For a more comprehensive analysis, use CrossSectionalReport.

Parameters:

Name Type Description Default
data ndarray

Data matrix (samples x features).

required
batch list or ndarray

Batch labels for each sample.

required
covariates ndarray

Covariate matrix (samples x covariates).

None
covariate_names list of str

Names of covariates.

None
save_data bool

Whether to save input data and results.

True
save_data_name str

Filename for saved data.

None
save_dir str or PathLike

Directory to save report and data.

None
feature_names list

Names of features.

None
report_name str

Name of the report file.

None
SaveArtifacts bool

Whether to save intermediate artifacts.

False
rep StatsReporter

Existing report object to use.

None
show bool

Whether to display plots interactively.

False
timestamped_reports bool

Whether to append a timestamp to the report filename.

True
covariate_types list

Types of covariates (e.g., 'categorical', 'numeric').

None
ratio_type str

Variance-ratio comparison mode passed to Variance_Ratios.

'rest'

Returns:

Name Type Description
StatsReporter StatsReporter

The report object containing the generated figures, text, and saved artifact references.
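The ratio_type='rest' mode is not defined further in this section. One plausible reading, stated here as an assumption, is that each batch's per-feature variance is divided by the variance of all remaining samples pooled together. The hypothetical helper below sketches that reading with NumPy; it is not the library's Variance_Ratios.

```python
import numpy as np


def variance_ratios_rest_sketch(data, batch):
    """Per-feature variance ratio of each batch versus all other samples pooled.

    Assumption: one plausible reading of ratio_type='rest', not the
    library's Variance_Ratios implementation.
    """
    data = np.asarray(data, dtype=float)  # samples x features
    batch = np.asarray(batch)
    ratios = {}
    for b in np.unique(batch):
        in_batch = batch == b
        # Sample variance (ddof=1) per feature inside the batch and in the rest.
        var_in = data[in_batch].var(axis=0, ddof=1)
        var_rest = data[~in_batch].var(axis=0, ddof=1)
        # Ratio > 1: the batch is more variable than the remaining samples.
        ratios[b] = var_in / var_rest
    return ratios
```

With only two batches this reduces to a plain pairwise ratio; the "rest" comparison matters when there are three or more batches.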

CrossSectionalReport(data, batch, covariates=None, covariate_names=None, save_data=True, save_data_name=None, save_dir=None, feature_names=None, report_name=None, SaveArtifacts=False, rep=None, show=False, timestamped_reports=True, covariate_types=None, ratio_type='rest', UMAP_embedding=True, UMAP_tuning='auto')

Create a full cross-sectional diagnostic report for batch effects.

The report combines summary text, statistical tests, and visualizations for mean, variance, covariance, clustering, and distributional differences across batches.
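This section does not say which statistical tests the report runs. As a hedged illustration of a per-feature test for mean differences across batches, the sketch below computes a classical one-way ANOVA F statistic in plain NumPy; per_feature_anova_f is a hypothetical name, and the library may use a different test.

```python
import numpy as np


def per_feature_anova_f(data, batch):
    """One-way ANOVA F statistic for each feature, with batch as the grouping factor."""
    data = np.asarray(data, dtype=float)  # samples x features
    batch = np.asarray(batch)
    labels = np.unique(batch)
    n, k = data.shape[0], len(labels)
    grand_mean = data.mean(axis=0)
    ss_between = np.zeros(data.shape[1])
    ss_within = np.zeros(data.shape[1])
    for b in labels:
        group = data[batch == b]
        group_mean = group.mean(axis=0)
        # Between-batch spread: group size times squared distance to the grand mean.
        ss_between += group.shape[0] * (group_mean - grand_mean) ** 2
        # Within-batch spread: squared deviations from the batch's own mean.
        ss_within += ((group - group_mean) ** 2).sum(axis=0)
    # F = (SS_between / (k - 1)) / (SS_within / (n - k))
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Large F values flag features whose batch means differ more than the within-batch spread would suggest; turning F into a p-value would additionally require the F distribution with (k - 1, n - k) degrees of freedom.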

Parameters:

Name Type Description Default
data ndarray

Data matrix (samples x features).

required
batch list or ndarray

Batch labels for each sample.

required
covariates ndarray

Covariate matrix (samples x covariates).

None
covariate_names list of str

Names of covariates.

None
save_data bool

Whether to save input data and results.

True
save_data_name str

Filename for saved data.

None
save_dir str or PathLike

Directory to save report and data.

None
feature_names list

Names of features.

None
report_name str

Name of the report file.

None
SaveArtifacts bool

Whether to save intermediate artifacts.

False
rep StatsReporter

Existing report object to use.

None
show bool

Whether to display plots interactively.

False
timestamped_reports bool

Whether to append a timestamp to the report filename.

True
covariate_types list

Types of covariates used by the report's numeric and categorical workflows.

None
ratio_type str

Variance-ratio comparison mode passed to Variance_Ratios.

'rest'

Returns:

Name Type Description
StatsReporter StatsReporter

The report object containing the generated narrative, figures, and saved outputs.

Notes

covariate_types should align with covariate_names so the report can decide when to factorize categorical covariates and when to keep numeric covariates unchanged. If covariate_types is not provided, the function infers categorical versus numeric handling from the supplied data.
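The inference rule used when covariate_types is omitted is not documented here. The sketch below shows one possible approach, under the assumption that a column is numeric when every non-missing value parses as a number; resolve_covariate_types is a hypothetical helper, and the 'categorical'/'numeric' labels follow the example values listed for CrossSectionalReportMin.

```python
import pandas as pd


def resolve_covariate_types(covariates, covariate_names, covariate_types=None):
    """Return one 'categorical' or 'numeric' label per covariate column.

    Assumption: when covariate_types is None, a column counts as numeric if
    pandas can parse every non-missing value as a number. This is an
    illustrative heuristic, not the library's documented rule.
    """
    df = pd.DataFrame(covariates, columns=covariate_names)
    if covariate_types is not None:
        # Alignment check: exactly one type per named covariate.
        if len(covariate_types) != len(covariate_names):
            raise ValueError("covariate_types must align with covariate_names")
        return list(covariate_types)
    inferred = []
    for name in covariate_names:
        col = df[name]
        # Values that fail to parse become NaN; compare with the original missingness.
        parsed = pd.to_numeric(col, errors="coerce")
        is_numeric = parsed.notna().sum() == col.notna().sum()
        inferred.append("numeric" if is_numeric else "categorical")
    return inferred
```

A heuristic like this misclassifies numeric-looking codes (for example, site IDs stored as integers), which is why passing covariate_types explicitly is the safer choice.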

LongitudinalReport(data, batch, subject_ids, timepoints, covariates=None, covariate_names=None, features=None, save_data=False, save_data_name=None, save_dir=None, report_name=None, SaveArtifacts=False, rep=None, show=False, timestamped_reports=True)

Create a diagnostic report for dataset differences across batches in longitudinal data.

Parameters:

Name Type Description Default
data ndarray

Data matrix (samples x features).

required
batch list or ndarray

Batch labels for each sample.

required
subject_ids list or ndarray

Subject IDs for each sample.

required
timepoints list or ndarray

Timepoint for each sample.

required
covariates ndarray

Covariate matrix (samples x covariates).

None
covariate_names list of str

Names of covariates.

None
save_data bool

Whether to save input data and results.

False
save_data_name str

Filename for saved data.

None
save_dir str or PathLike

Directory to save report and data.

None
report_name str

Name of the report file.

None
SaveArtifacts bool

Whether to save intermediate artifacts.

False
rep StatsReporter

Existing report object to use.

None
show bool

Whether to display plots interactively.

False
timestamped_reports bool

Whether to append a timestamp to the report filename.

True

Outputs

Generates an HTML report with diagnostic plots and statistics for longitudinal data. If save_data is True, the function also returns a dictionary, and writes a CSV file, containing the input data and results. If SaveArtifacts is True, intermediate plots are saved to save_dir.

Note: This function is designed for repeated-measures data in which we do not expect a longitudinal trend over time. If the need arises, we will add a separate function for data in which a longitudinal trend is expected and should be tested explicitly.
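The assumption of no true trend over time makes repeated scans of the same subject useful for isolating batch effects. The hypothetical helper below illustrates one such idea, not necessarily what LongitudinalReport computes: subtract each subject's own mean, then average the residuals within each batch. It ignores timepoints, consistent with the no-trend assumption.

```python
import numpy as np


def within_subject_batch_means(data, batch, subject_ids):
    """Average within-subject residual per batch and feature.

    Illustrative only: subtracting each subject's own mean removes stable
    subject differences, so what remains across repeat scans mostly reflects
    batch and noise. Valid only when no real time trend is expected.
    """
    data = np.asarray(data, dtype=float)  # samples x features
    batch = np.asarray(batch)
    subject_ids = np.asarray(subject_ids)
    residuals = np.empty_like(data)
    for s in np.unique(subject_ids):
        rows = subject_ids == s
        # Center each subject's repeated measurements on that subject's mean.
        residuals[rows] = data[rows] - data[rows].mean(axis=0)
    # Mean residual per batch: values far from 0 suggest a batch offset.
    return {b: residuals[batch == b].mean(axis=0) for b in np.unique(batch)}
```

Subjects scanned in only one batch contribute zero-mean residuals to that batch, so the estimate is driven by subjects who were measured in more than one batch.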