Batch analysis

miniML can easily be run on multiple traces or recordings in a batch. This is useful for large datasets, where analysing each recording individually would take a long time.

The simplest approach is to run miniML in a loop over all recordings and save the results of each analysis to a file.

Tip

For batch analysis, we recommend creating the model instance once, outside of the loop, and passing it to the miniML detection object. This avoids reloading the model for every recording and saves time and memory.

The following example analyses multiple recordings in a loop. Setting the verbose parameter to 0 suppresses printed output (recommended for large datasets).

import tensorflow as tf
from pathlib import Path
from miniML import MiniTrace, EventDetection  # assumes miniML.py is on the Python path

file_list = ['recording_1.dat', 'recording_2.dat', 'recording_3.dat']
raw_data_folder = 'data'
results_folder = 'results'
Path(results_folder).mkdir(parents=True, exist_ok=True)  # make sure the output folder exists

scaling = 1e12
unit = 'pA'
pgf_name = 'conti VC'
# load the model once, outside of the loop
miniml_model = tf.keras.models.load_model('GC_lstm_model.h5')
window_size = 600

for filename in file_list:
    filepath = Path(raw_data_folder) / filename
    # load data from file
    trace = MiniTrace.from_heka_file(filename=filepath,
                                     rectype=pgf_name,
                                     exclude_series=None,
                                     scaling=scaling,
                                     unit=unit)

    # create miniML detection object
    detection = EventDetection(data=trace,
                               model=miniml_model,
                               model_threshold=0.5,
                               window_size=window_size,
                               batch_size=512,
                               event_direction='negative',
                               verbose=0)

    # run analysis
    detection.detect_events(eval=True,
                            peak_w=5,
                            rel_prom_cutoff=0.25,
                            convolve_win=20,
                            gradient_convolve_win=40)

    # save results to file (one pickle per recording)
    detection.save_to_pickle(filename=f'{results_folder}/{filepath.stem}.pickle',
                             include_prediction=False,
                             include_data=False)
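For larger datasets, listing every file by hand quickly becomes impractical. The sketch below, assuming all recordings are `.dat` files in a single folder, collects the file paths with `pathlib` and keeps the loop running when a single recording fails. The helper names `find_recordings` and `run_batch` and the `analyse` callback are hypothetical; `analyse` stands in for the loading, detection and saving steps of the example above.

```python
from pathlib import Path
from typing import Callable, Dict, List


def find_recordings(folder: str, pattern: str = '*.dat') -> List[Path]:
    """Return all files in `folder` matching `pattern`, sorted by name."""
    return sorted(Path(folder).glob(pattern))


def run_batch(file_paths: List[Path],
              analyse: Callable[[Path, Path], None],
              results_folder: str) -> Dict[str, str]:
    """Call `analyse(input_path, output_path)` for every file.

    Errors are collected instead of aborting the whole batch.
    Returns a dict mapping file names to error messages for failed files.
    """
    out_dir = Path(results_folder)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the results folder if needed
    failed = {}
    for path in file_paths:
        output_path = out_dir / f'{path.stem}.pickle'
        try:
            analyse(path, output_path)
        except Exception as err:  # keep going; record what went wrong
            failed[path.name] = repr(err)
    return failed
```

With miniML, `analyse` would wrap `MiniTrace.from_heka_file`, `EventDetection`, `detect_events` and `save_to_pickle` as shown above. Checking the returned dict afterwards shows which recordings need attention.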

Alternatively, you can collect the results in a custom object and use them for further analysis or save them to file. The following example collects summary statistics for each recording in a pandas DataFrame.

Hint

The miniML EventStats class (found in miniML.py) stores the results of the analysis and provides methods to calculate summary statistics such as mean and median values.

import pandas as pd

# imports, settings and model loading from the example above go here

# create empty dataframe
my_df = pd.DataFrame()

for i, filename in enumerate(file_list):

    # load the trace and run the detection as in the example above

    my_df.loc[i, 'recording'] = filename
    # recording duration: number of samples x sampling interval
    my_df.loc[i, 'recording_time'] = trace.data.shape[0] * trace.sampling
    my_df.loc[i, 'amplitude_mean'] = detection.event_stats.mean(detection.event_stats.amplitudes)
    my_df.loc[i, 'amplitude_median'] = detection.event_stats.median(detection.event_stats.amplitudes)
    my_df.loc[i, 'charge_mean'] = detection.event_stats.mean(detection.event_stats.charges)
    my_df.loc[i, 'decay_mean'] = detection.event_stats.mean(detection.event_stats.halfdecays)
    my_df.loc[i, 'risetime_mean'] = detection.event_stats.mean(detection.event_stats.risetimes)
    my_df.loc[i, 'frequency'] = detection.event_stats.frequency()

my_df.to_csv('results.csv')
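Growing a DataFrame row by row with `.loc` works, but another common pattern is to collect one dictionary per recording in a plain list and build the DataFrame once after the loop, which avoids enlarging the DataFrame on every iteration. In the sketch below, `collect_results` and its `analyse` callback are hypothetical; with miniML, `analyse` would load the trace, run the detection and return values such as `detection.event_stats.mean(detection.event_stats.amplitudes)`.

```python
import pandas as pd
from typing import Callable, Dict, List


def collect_results(file_list: List[str],
                    analyse: Callable[[str], Dict[str, float]]) -> pd.DataFrame:
    """Run `analyse` on every file and gather the returned summaries."""
    rows = []
    for filename in file_list:
        summary = analyse(filename)  # dict of summary values for one recording
        rows.append({'recording': filename, **summary})
    # build the DataFrame once, after the loop: one row per recording
    return pd.DataFrame(rows)
```

The result can then be written out as before, e.g. with `collect_results(file_list, analyse).to_csv('results.csv', index=False)`.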

Important

When analysing many recordings, do not use the plotting functions in a loop, in particular if you are running the analysis in a Jupyter notebook. Matplotlib keeps figures in memory until they are closed, causing severe slowdowns. If plots are required, analyse smaller batches instead.
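If you do need a figure for each recording, a common matplotlib pattern is to save the figure to disk and close it explicitly, so that pyplot does not keep it open. The sketch below assumes you build your own plots with matplotlib from the trace data; it does not use miniML's plotting functions, and the helper `save_trace_plot` is hypothetical. The non-interactive `Agg` backend renders to files only.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to files only
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path


def save_trace_plot(trace: np.ndarray, out_path: Path) -> None:
    """Plot one trace, save it to `out_path` and release the figure."""
    fig, ax = plt.subplots(figsize=(8, 3))
    ax.plot(trace, linewidth=0.5)
    ax.set_xlabel('sample')
    fig.savefig(out_path, dpi=100)
    plt.close(fig)  # without this, pyplot keeps every figure in memory
```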