Preprocessing module

Module containing some preprocessing helpers and functions to be applied to EEG…

Classes

Multiway CCA (mCCA), also known as hyperalignment in the literature, is provided by another module and can be used as a preprocessing step.

mCCA([n_components])

Class to support mCCA computation on a set of data matrices.

The following two classes are available directly in pyeeg.preprocess:

MultichanWienerFilter([lags, low_rank, thresh])

This class implements a multichannel Wiener Filter for artifact removal.

WaveletTransform(*args, **kwargs)

Summary

create_filterbank(freqs, srate[, filtertype])

Creates a filter bank, by default of Chebyshev type II filters.

apply_filterbank(data, fbank[, filt_func, ...])

Applies a filterbank to a given multi-channel signal.

get_power(signals[, decibels, win, axis, n_jobs])

Compute the (log) power modulation of a signal by taking a smoothed moving average of its squared values.

covariance(X[, estimator])

Estimation of one covariance matrix on the whole dataset.

covariances(X[, estimator])

Estimation of covariance matrices from a list of "trials".

covariances_extended(X, P[, estimator])

Special form of covariance matrix in which the data are augmented with another set.

Listing of all classes and functions

class pyeeg.preprocess.MultichanWienerFilter(lags=(0,), low_rank=False, thresh=None)

This class implements a multichannel Wiener filter for artifact removal. The method is detailed in the reference paper "A generic EEG artifact removal algorithm based on the multi-channel Wiener filter" by Ben Somers et al.

To train the model correctly, one must supply portions of contaminated data and portions of clean data. These can be selected visually, for instance with the annotation tool from MNE, or automatically by detecting above-threshold values and marking them as bad portions. Large windows around bad data segments are acceptable; the clean segments, however, must be artifact-free.

The model expects zero-mean data for both noisy and clean segments.
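
The training idea described above can be illustrated with a minimal NumPy sketch. This is not the pyeeg implementation: the synthetic data, the variable names and the plain full-rank, lag-free formulation W = R_yy^-1 (R_yy - R_nn) are assumptions made for illustration. The filter estimates the artifact from the contaminated data, and the estimate is then subtracted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_channels = 2000, 4

# Synthetic "clean" segments: background activity only (illustrative data).
y_clean = rng.standard_normal((n_samples, n_channels))

# Synthetic "contaminated" segments: background plus one strong artifact
# source projected onto all channels with hypothetical mixing weights.
mixing = np.array([[1.0, 0.8, 0.5, 0.2]])
artifact = 5.0 * rng.standard_normal((n_samples, 1)) @ mixing
y_artifact = rng.standard_normal((n_samples, n_channels)) + artifact

# Both classes are made zero-mean before estimating covariances.
y_clean = y_clean - y_clean.mean(axis=0)
y_artifact = y_artifact - y_artifact.mean(axis=0)

R_clean = y_clean.T @ y_clean / n_samples        # covariance of clean data
R_noisy = y_artifact.T @ y_artifact / n_samples  # covariance of contaminated data
R_art = R_noisy - R_clean                        # estimated artifact covariance

# Wiener solution: W maps contaminated data to an estimate of the artifact.
W = np.linalg.solve(R_noisy, R_art)
artifact_estimate = y_artifact @ W
cleaned = y_artifact - artifact_estimate
```

In this sketch the covariance of the artifact is obtained by subtraction, which is why the clean segments must really be artifact-free: any artifact left in them is removed from the estimate and therefore not filtered out.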

lags

Lags used for general model (NOT IMPLEMENTED YET)

Type:

list

low_rank

Whether to use low-rank approximation of covariance matrix for the artifactual data

Type:

bool

thresh

If int, this corresponds to the rank prior. If float, it is interpreted as the percentage of variance to be kept.

Type:

int or float

W_

Once fitted, contains the filter coefficients

Type:

ndarray

Example

TODO: Add code example

Example of result obtained (cleaning EOG artifact here):

[Figure: ../img/MWF_EOG_cleaning_example.png]

fit(y_clean, y_artifact, cov_data=False)

Fit model to data.

Parameters:
  • y_clean (ndarray) – Clean segments

  • y_artifact (ndarray) – Artifact-contaminated segments

  • cov_data (bool) – Whether the input data are already covariance matrix estimates for each class

fit_transform(y_clean, y_artifact, x, cov_data=False)

Train the model on the input and directly transform the data in x.

set_fit_request(*, cov_data: bool | None | str = '$UNCHANGED$', y_artifact: bool | None | str = '$UNCHANGED$', y_clean: bool | None | str = '$UNCHANGED$') MultichanWienerFilter

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • cov_data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for cov_data parameter in fit.

  • y_artifact (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_artifact parameter in fit.

  • y_clean (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_clean parameter in fit.

Returns:

self – The updated object.

Return type:

object

transorm(x)

Filter the data to remove the artifacts learned by the model.

Parameters:

x (data) – EEG data

Returns:

Filtered EEG data

class pyeeg.preprocess.Whitener(axis=0, zca=False, bias=True)

A data whitener (via either PCA or ZCA).
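
The difference between the two whitening variants can be sketched in plain NumPy. This is an illustration under assumptions, not the Whitener implementation: the data and variable names are hypothetical. Both transforms produce data with identity covariance; ZCA additionally rotates back to the original axes, so its whitening matrix is symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated synthetic data of shape (n_samples, n_features).
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.5]])
X = rng.standard_normal((1000, 3)) @ mixing

Xc = X - X.mean(axis=0)                 # centre each feature
cov = Xc.T @ Xc / len(Xc)               # sample covariance
eigvals, eigvecs = np.linalg.eigh(cov)  # cov = E diag(d) E^T

# PCA whitening: project on eigenvectors and scale to unit variance.
W_pca = eigvecs / np.sqrt(eigvals)
# ZCA whitening: same scaling, then rotate back to the original axes.
W_zca = W_pca @ eigvecs.T

X_pca = Xc @ W_pca
X_zca = Xc @ W_zca
```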

fit_transform(X, y=None, axis=None)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)

pyeeg.preprocess.apply_filterbank(data, fbank, filt_func=<function lfilter>, n_jobs=-1, axis=-1)

Applies a filterbank to a given multi-channel signal.

Parameters:
  • data (ndarray (samples, nchannels))

  • fbank (list) – list of (b, a) tuples, where b and a specify a digital filter

Returns:

y

Return type:

ndarray (nfilters, samples, nchannels)
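
The shapes involved can be illustrated with a minimal sketch that uses SciPy directly rather than apply_filterbank. The band edges, the Butterworth design and the explicit filtering along axis 0 (the sample axis of the synthetic data) are assumptions made for this example.

```python
import numpy as np
from scipy.signal import butter, lfilter

srate = 250  # sampling rate in Hz (illustrative value)
rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 8))  # (samples, nchannels)

# A small bank of (b, a) tuples, one band-pass filter per band.
bands = [(4, 8), (8, 13), (13, 30)]
fbank = [butter(2, band, btype="bandpass", fs=srate) for band in bands]

# Filter every channel with every filter along the sample axis and
# stack the results on a new leading "filter" axis.
y = np.stack([lfilter(b, a, data, axis=0) for b, a in fbank])
```

The result y has one slice per filter, matching the (nfilters, samples, nchannels) layout documented above.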

pyeeg.preprocess.covariance(X, estimator='cov')

Estimation of one covariance matrix on the whole dataset. If X is of shape (trials, samples, channels), all trials are concatenated to compute a single covariance matrix across all of them.
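
The concatenation of trials can be sketched with NumPy alone. This is an illustration with synthetic data and the plain empirical estimator, assumed here to correspond to the 'cov' option, not the library code itself.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 200, 4))  # (trials, samples, channels)

# Concatenate all trials along the sample axis: (trials * samples, channels).
X_concat = X.reshape(-1, X.shape[-1])

# One covariance matrix for the whole dataset, channels as variables.
C = np.cov(X_concat, rowvar=False)
```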

pyeeg.preprocess.covariances(X, estimator='cov')

Estimation of covariance matrices from a list of “trials”.

Parameters:
  • X (array-like (ntrials, nsamples, nchannels) or list) – If a list, each element can have a different number of samples, but the number of channels must be the same for all trials.

  • estimator (str) – One of the covariance estimators from sklearn. See also _check_est().

Returns:

C – The list of covariance matrices for each trial.

Return type:

array-like (ntrials, nchannels, nchannels)
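
A minimal sketch of the per-trial estimation, using the empirical estimator from NumPy and a list of synthetic trials with different lengths. The data and names are illustrative assumptions, not the library implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels = 4

# Trials may differ in number of samples but share the number of channels.
trials = [rng.standard_normal((n, n_channels)) for n in (150, 200, 180)]

# One covariance matrix per trial, stacked on a leading "trial" axis.
C = np.stack([np.cov(trial, rowvar=False) for trial in trials])
```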

pyeeg.preprocess.covariances_extended(X, P, estimator='cov')

Special form of covariance matrix in which the data are augmented with another set. For instance, the data could be EEG data and the other set could be an idealised response (e.g. a clean ERP).

Notes

This assumes that the data are of shape (trials, samples, channels) and that the other set is of shape (samples, channels). The second set is typically an idealised response obtained from the average across trials. The function could, however, also be called on a single trial, for continuous recordings for instance. In that case, the data are extended with a dummy dimension for the trials and, for P, the idealised response to a single event is convolved with a series of impulses at the times of those events.
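
Both cases described in these notes can be sketched with NumPy. The helper extended_covariances, the event positions and the choice to append P after the data channels are hypothetical choices for illustration; the ordering and estimator used by the library may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_samples, n_channels = 20, 100, 4
X = rng.standard_normal((n_trials, n_samples, n_channels))
P = X.mean(axis=0)  # idealised response: average across trials

def extended_covariances(X, P):
    """Covariance of each trial after appending P along the channel axis."""
    covs = []
    for trial in X:
        extended = np.hstack([trial, P])  # (samples, channels + channels_P)
        covs.append(np.cov(extended, rowvar=False))
    return np.stack(covs)

C = extended_covariances(X, P)

# Continuous recording: build P by convolving the single-event response
# with a train of impulses placed at the (hypothetical) event samples.
continuous = rng.standard_normal((1000, n_channels))
event_samples = [100, 400, 700]
impulses = np.zeros(len(continuous))
impulses[event_samples] = 1.0
P_cont = np.stack(
    [np.convolve(impulses, P[:, ch])[:len(continuous)] for ch in range(n_channels)],
    axis=1,
)
# Dummy trial dimension so that the same helper can be reused.
C_single = extended_covariances(continuous[np.newaxis], P_cont)
```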

pyeeg.preprocess.create_filterbank(freqs, srate, filtertype=<function cheby2>, **kwargs)

Creates a filter bank, by default of Chebyshev type II filters. Filter parameters are passed as keyword arguments. Frequency bands are defined by their boundaries rather than by their center frequencies.
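
What a bank defined by band boundaries could look like is sketched below with SciPy only. The pairing of consecutive boundaries into bands, the filter order and the stopband attenuation are assumptions made for this example and are not taken from create_filterbank.

```python
import numpy as np
from scipy.signal import cheby2, freqz

srate = 250             # sampling rate in Hz (illustrative value)
freqs = [4, 8, 13, 30]  # band boundaries in Hz

# Assumption: consecutive boundaries delimit one band each.
bands = list(zip(freqs[:-1], freqs[1:]))

# One Chebyshev type II band-pass filter per band, as (b, a) tuples.
# Order 3 and 30 dB stopband attenuation are hypothetical settings.
fbank = [cheby2(3, 30, band, btype="bandpass", fs=srate) for band in bands]

# Gain of each filter at the geometric centre of its own band.
centers = [np.sqrt(lo * hi) for lo, hi in bands]
gains = [abs(freqz(b, a, worN=[fc], fs=srate)[1][0])
         for (b, a), fc in zip(fbank, centers)]
```

For a Chebyshev type II design the band edges passed to SciPy are the frequencies at which the stopband attenuation is first reached, so the effective passband is somewhat narrower than the nominal band.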

pyeeg.preprocess.get_power(signals, decibels=False, win=125, axis=-1, n_jobs=-1)

Compute the (log) power modulation of a signal by taking a smoothed moving average of its squared values.

Parameters:
  • signals (ndarray (nsamples, nchans)) – Input signals

  • decibels (bool) – If True, will take the log power (default False).

  • win (int) – Length of smoothing window for moving average (default 125) in samples.

  • axis (int) – Axis on which to apply the transform

  • n_jobs (int) – Number of cores to be used (parallel jobs).

Returns:

out

Return type:

ndarray (nsamples, nchans)
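
The computation can be sketched with NumPy as a boxcar moving average of the squared signal. The function name moving_average_power, the boxcar window and the 10 * log10 convention for decibels are assumptions made for illustration; get_power may use a different window or scaling.

```python
import numpy as np

def moving_average_power(signals, win=125, decibels=False):
    """Smoothed power of each column of `signals` (nsamples, nchans)."""
    kernel = np.ones(win) / win  # boxcar window averaging `win` samples
    squared = np.asarray(signals, dtype=float) ** 2
    # Smooth every channel (column) independently along the sample axis.
    power = np.apply_along_axis(np.convolve, 0, squared, kernel, mode="same")
    return 10 * np.log10(power) if decibels else power

# Two 10 Hz sinusoids with amplitudes 2 and 1, sampled at 250 Hz.
srate = 250
t = np.arange(1000) / srate
signals = np.column_stack([2 * np.sin(2 * np.pi * 10 * t),
                           1 * np.sin(2 * np.pi * 10 * t)])
power = moving_average_power(signals, win=125)
```

Away from the edges, the smoothed power of a sinusoid approaches half its squared amplitude, which gives a quick sanity check on the window length.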