HarmonisationFunctions
A set of harmonisation functions that are frequently used in the literature. We will continue to expand these with methods we develop.
Script containing self-contained harmonisation functions that can be used in conjunction with the diagnostic tools:
combat_for_covbat(data, batch, model=None, numerical_covariates=None, eb=True)
Correct for batch effects in a dataset. This function is a modified version of the ComBat harmonisation function that is used as part of the CovBat harmonisation process.
design_mat(mod, numerical_covariates, batch_levels)
Construct design matrix for ComBat, ensuring batch levels are in the correct order and handling numerical covariates.
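As a rough illustration of the layout such a design matrix usually takes, the sketch below builds one indicator column per batch level, in the order given, followed by the numerical covariates. The helper name design_sketch and its arguments are hypothetical and do not mirror the signature of design_mat; the library may build its matrix differently.

```python
import numpy as np

def design_sketch(batch, batch_levels, numerical=None):
    """Hypothetical helper: batch indicator columns, then numerical covariates."""
    batch = np.asarray(batch)
    # One 0/1 column per batch level, in the order given by batch_levels
    onehot = np.column_stack([(batch == lvl).astype(float) for lvl in batch_levels])
    if numerical is None:
        return onehot
    # Numerical covariates are appended unchanged as extra columns
    return np.column_stack([onehot, np.asarray(numerical, dtype=float)])

design = design_sketch(["b", "a", "b"], batch_levels=["a", "b"], numerical=[30, 40, 50])
```

Each row of design describes one sample: the first two columns mark membership of batch "a" or "b", and the last column carries the covariate value.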
aprior(delta_hat)
Calculate the aprior parameter for the inverse gamma distribution based on the method of moments.
bprior(delta_hat)
Calculate the bprior parameter for the inverse gamma distribution based on the method of moments.
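A minimal sketch of the method-of-moments step, assuming the standard ComBat formulation (Johnson and Li, 2007) in which the batch variances follow an inverse gamma prior with shape a and scale b. The _sketch suffix marks these as illustrative functions, not the library's own code.

```python
import numpy as np

def aprior_sketch(delta_hat):
    """Shape parameter a of an inverse gamma matched to the sample moments."""
    m = np.mean(delta_hat)
    s2 = np.var(delta_hat, ddof=1)  # sample variance
    return (2 * s2 + m**2) / s2

def bprior_sketch(delta_hat):
    """Scale parameter b of an inverse gamma matched to the sample moments."""
    m = np.mean(delta_hat)
    s2 = np.var(delta_hat, ddof=1)
    return (m * s2 + m**3) / s2

delta_hat = np.array([0.8, 1.0, 1.2, 1.5])
a = aprior_sketch(delta_hat)
b = bprior_sketch(delta_hat)
# The fitted inverse gamma has mean b / (a - 1) and
# variance b**2 / ((a - 1)**2 * (a - 2)), which equal the sample moments.
```

These expressions come from equating the inverse gamma mean and variance with the sample mean and sample variance of delta_hat and solving for a and b.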
postmean(g_hat, g_bar, n, d_star, t2)
Calculate the posterior mean for the batch effect parameters.
postvar(sum2, n, a, b)
Calculate the posterior variance for the batch effect parameters.
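For orientation, the two posterior quantities can be sketched as below. This assumes the conjugate updates of the original ComBat model (a normal prior on the batch mean, an inverse gamma prior on the batch variance); the function names are illustrative, and the library's implementation may differ in detail.

```python
def postmean_sketch(g_hat, g_bar, n, d_star, t2):
    # Precision-weighted average of the raw estimate g_hat and the prior mean g_bar
    return (t2 * n * g_hat + d_star * g_bar) / (t2 * n + d_star)

def postvar_sketch(sum2, n, a, b):
    # Mean of the inverse gamma posterior after n residuals with sum of squares sum2
    return (0.5 * sum2 + b) / (n / 2.0 + a - 1.0)

# With t2 * n equal to d_star, the data and the prior carry equal weight,
# so the raw estimate is shrunk halfway towards the prior mean.
g_star = postmean_sketch(g_hat=1.0, g_bar=0.0, n=10, d_star=1.0, t2=0.1)
d_star = postvar_sketch(sum2=12.0, n=10, a=3.0, b=2.0)
```

Larger batches (bigger n) or a wider prior (bigger t2) pull g_star towards the raw estimate; smaller batches pull it towards g_bar.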
itSol(sdat_batch, gamma_hat, delta_hat, gamma_bar, t2, a, b, conv=0.001)
Iteratively solve for the posterior mean and variance of the batch effect parameters.
it_sol(sdat, g_hat, d_hat, g_bar, t2, a, b, conv=0.0001)
Iteratively solve for the posterior mean and variance of the batch effect parameters. This version is used by CovBat and is taken from https://github.com/andy1764/CovBat_Harmonisation (Chen, A. A., Beer, J. C., Tustison, N. J., Cook, P. A., Shinohara, R. T., Shou, H., & The Alzheimer's Disease Neuroimaging Initiative (2022). Mitigating site effects in covariance for machine learning in neuroimaging data. Human Brain Mapping, 43(4), 1179–1195. https://doi.org/10.1002/hbm.25688).
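The iteration behind both solvers can be sketched as a fixed-point loop that alternates between the posterior mean and the posterior variance until the largest relative change falls below conv. The version below is a simplified illustration assuming complete data (no missing values) for a single batch; the max_iter guard and the name it_sol_sketch are additions for this example.

```python
import numpy as np

def it_sol_sketch(sdat, g_hat, d_hat, g_bar, t2, a, b, conv=1e-4, max_iter=1000):
    """Illustrative fixed-point solver; sdat has shape (n_features, n_samples_in_batch)."""
    n = sdat.shape[1]
    g_old, d_old = g_hat.copy(), d_hat.copy()
    for _ in range(max_iter):
        # Posterior mean given the current variance estimate
        g_new = (t2 * n * g_hat + d_old * g_bar) / (t2 * n + d_old)
        # Residual sum of squares around the updated mean, per feature
        sum2 = ((sdat - g_new[:, None]) ** 2).sum(axis=1)
        # Posterior variance given the updated mean
        d_new = (0.5 * sum2 + b) / (n / 2.0 + a - 1.0)
        change = max(
            np.max(np.abs(g_new - g_old) / np.maximum(np.abs(g_old), 1e-12)),
            np.max(np.abs(d_new - d_old) / d_old),
        )
        g_old, d_old = g_new, d_new
        if change < conv:
            break
    return g_old, d_old

rng = np.random.default_rng(0)
sdat = rng.normal(loc=0.5, scale=1.5, size=(20, 30))  # synthetic standardised batch
g_hat = sdat.mean(axis=1)
d_hat = sdat.var(axis=1, ddof=1)
g_bar, t2 = g_hat.mean(), g_hat.var(ddof=1)
m, s2 = d_hat.mean(), d_hat.var(ddof=1)
a, b = (2 * s2 + m**2) / s2, (m * s2 + m**3) / s2  # method-of-moments priors
g_star, d_star = it_sol_sketch(sdat, g_hat, d_hat, g_bar, t2, a, b)
```

Each adjusted mean in g_star lies between the raw estimate and g_bar, which is the shrinkage effect of the empirical Bayes step.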
combat(data, batch, mod, parametric=True, DeltaCorrection=True, UseEB=True, ReferenceBatch=None, RegressCovariates=False, GammaCorrection=True, covbat_mode=False, return_priors=False)
Run ComBat harmonisation on the data and return the harmonized data.
This version accepts numpy arrays or pandas DataFrame/Series for data, batch, and mod. If a DataFrame is supplied, columns are treated as samples (so data.shape == (n_features, n_samples)). The function will auto-transpose data or mod if it detects that samples were provided as rows. The returned bayesdata is the same type as input data (DataFrame -> DataFrame, ndarray -> ndarray).
Note: helper functions aprior, bprior, itSol must be defined in scope.
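The expected layout can be checked before calling combat. The sketch below shows one plausible orientation rule, comparing the data dimensions against the number of batch labels. It is an assumption for illustration: the exact detection logic used inside combat is not documented here, and orient_features_by_samples is a hypothetical helper.

```python
import numpy as np

def orient_features_by_samples(data, batch):
    """Hypothetical helper: return data with shape (n_features, n_samples)."""
    n_samples = len(batch)
    if data.shape[1] == n_samples:
        return data          # already features x samples
    if data.shape[0] == n_samples:
        return data.T        # samples were supplied as rows
    raise ValueError("Neither dimension of data matches the number of batch labels")

batch = ["site1", "site1", "site2"]
features_by_samples = np.zeros((5, 3))
same = orient_features_by_samples(features_by_samples, batch)
flipped = orient_features_by_samples(features_by_samples.T, batch)
```

Because the helper only uses .shape and .T, it accepts both numpy arrays and pandas DataFrames. When the number of features equals the number of samples the orientation is ambiguous, so passing data explicitly as (n_features, n_samples) is the safer choice.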
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | array or DataFrame | The data matrix to be harmonized, with shape (n_features, n_samples). | required |
| batch | array or Series | A vector of batch labels for each sample, with length n_samples. | required |
| mod | array or DataFrame | An optional design matrix of covariates to adjust for, with shape (n_samples, n_covariates). | required |
| parametric | bool | Whether to use parametric adjustments. Default is True. | True |
| DeltaCorrection | bool | Whether to apply delta (scale) correction. Default is True. | True |
| UseEB | bool | Whether to use empirical Bayes adjustments. Default is True. | True |
| ReferenceBatch | str or int | If provided, the name or index of the reference batch to use for fitting priors. Default is None (no reference). | None |
| RegressCovariates | bool | Whether to regress out covariate effects before harmonisation. Default is False. | False |
| GammaCorrection | bool | Whether to apply gamma (mean) correction. Default is True. | True |
| covbat_mode | bool | Whether to run in CovBat mode which includes additional covariance correction steps. Default is False. | False |
| return_priors | bool | Whether to return the estimated parameters from the ComBat model along with the harmonized data. Default is False. | False |
Returns:
| Name | Type | Description |
|---|---|---|
| bayesdata | array or DataFrame | The harmonized data, in the same format as the input data. |
| priors | (dict, optional) | A dictionary containing the estimated parameters from the ComBat model, including: gamma_hat: raw batch effect mean estimates (n_batch, n_features); delta_hat: raw batch effect variance estimates (n_batch, n_features); gamma_star: empirical Bayes adjusted batch effect means (n_batch, n_features); delta_star: empirical Bayes adjusted batch effect variances (n_batch, n_features); gamma_bar: mean of gamma_hat across batches (n_batch,); t2: variance of gamma_hat across batches (n_batch,); a_prior: aprior parameters for each batch (n_batch,); b_prior: bprior parameters for each batch (n_batch,) |
Note: If using this version of ComBat, please cite:
- Jean-Philippe Fortin, Drew Parker, Birkan Tunc, Takanori Watanabe, Mark A Elliott, Kosha Ruparel, David R Roalf, Theodore D Satterthwaite, Ruben C Gur, Raquel E Gur, Robert T Schultz, Ragini Verma, Russell T Shinohara. Harmonisation Of Multi-Site Diffusion Tensor Imaging Data. NeuroImage, 161, 149-170, 2017.
- Jean-Philippe Fortin, Nicholas Cullen, Yvette I. Sheline, Warren D. Taylor, Irem Aselcioglu, Philip A. Cook, Phil Adams, Crystal Cooper, Maurizio Fava, Patrick J. McGrath, Melvin McInnis, Mary L. Phillips, Madhukar H. Trivedi, Myrna M. Weissman, Russell T. Shinohara. Harmonisation of cortical thickness measurements across scanners and sites. NeuroImage, 167, 104-120, 2018.
- W. Evan Johnson and Cheng Li. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 8(1):118-127, 2007.
covbat(data, batch, model=None, numerical_covariates=None, pct_var=0.95, n_pc=0)
Correction of covariance batch effects (CovBat). This function applies the ComBat harmonisation procedure to the data, then applies an additional covariance correction step via PCA and re-application of ComBat on the PCA scores. This method is based on the CovBat approach described in Chen et al. 2022.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | array or DataFrame | The data matrix to be harmonized, with shape (n_features, n_samples). | required |
| batch | array or Series | A vector of batch labels for each sample, with length n_samples. | required |
| model | DataFrame | An optional design matrix of covariates to adjust for, with shape (n_samples, n_covariates). Must include a column named "batch" with the batch labels. If None, a design matrix will be created with only the batch variable. Default is None. | None |
| numerical_covariates | list of str or list of int | A list of column names or indices in the model design matrix that correspond to numerical covariates (as opposed to categorical). These covariates will be included in the design matrix but not treated as batch variables. Default is None (no numerical covariates). | None |
Note: If using this version of CovBat, please cite:
- Chen, A. A., Beer, J. C., Tustison, N. J., Cook, P. A., Shinohara, R. T., Shou, H., & The Alzheimer's Disease Neuroimaging Initiative (2022). Mitigating site effects in covariance for machine learning in neuroimaging data. Human Brain Mapping, 43(4), 1179–1195. https://doi.org/10.1002/hbm.25688
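The covariance correction step relies on a PCA of the ComBat-adjusted data and on choosing how many component scores to harmonise. The sketch below illustrates that selection under the assumption that n_pc=0 means "choose the smallest number of components whose cumulative explained variance exceeds pct_var", as in the reference CovBat implementation; whether this function follows exactly the same rule is not stated here. The helper choose_n_pc is hypothetical, and the scores are left untouched where CovBat would re-apply ComBat.

```python
import numpy as np

def choose_n_pc(explained_ratio, pct_var=0.95, n_pc=0):
    """Hypothetical helper: number of principal components to harmonise."""
    if n_pc > 0:
        return n_pc  # an explicit count overrides pct_var
    cum = np.cumsum(explained_ratio)
    above = np.flatnonzero(cum > pct_var)
    return int(above[0]) + 1 if above.size else len(cum)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 40))      # (n_features, n_samples), stand-in for ComBat output
X[1] = X[0] + 0.1 * X[1]          # correlate two features

mean = X.mean(axis=1, keepdims=True)
Xc = X - mean                      # centre each feature
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_ratio = s**2 / np.sum(s**2)
k = choose_n_pc(explained_ratio, pct_var=0.95)

scores = U.T @ Xc                  # (n_components, n_samples)
# CovBat would harmonise scores[:k] with ComBat at this point;
# this sketch leaves them unchanged, so the data are rebuilt exactly.
X_rebuilt = U @ scores + mean
```

Only the first k score rows would be adjusted; the remaining components pass through unchanged before the data are projected back to feature space.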
lme_harmonisation(data, batch, mod, variable_names)
Fits a per feature linear mixed model to harmonize data across batches while adjusting for covariates. This function is an alternative to ComBat that uses mixed effects modeling to estimate and remove batch effects.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame or array | The data matrix to be harmonized, with shape (n_samples, n_features). | required |
| batch | Series or array | A vector of batch labels for each sample, with length n_samples. | required |
| mod | DataFrame or array | A design matrix of covariates to adjust for, with shape (n_samples, n_covariates). | required |
| variable_names | list of str | A list of column names corresponding to the covariates in | required |
Returns:
np.array: A harmonized data matrix of the same shape as the input data, with batch effects removed according to the fitted mixed models.
Note
This function fits a separate linear mixed model for each feature (column) in the data matrix, with the batch variable as a random effect and the covariates as fixed effects. The residuals from these models are returned as the harmonized data.
lme_iqm_harmonisation(data, IQMs, mod, variable_names)
Placeholder for a future implementation of LME-IQM harmonisation. When batch labels are not given, or when batch sizes are very small, IQMs may hold more information than batch labels.