Event file restructuring quickstart

This tutorial works through the process of restructuring event files using the HED event file remodeling tools. The tools, which are written in Python, are designed to be run on an entire dataset. This dataset can either be in BIDS (Brain Imaging Data Structure) format or can consist of files in a directory tree. The latter format is useful for restructuring that occurs early in the experimental process, for example, during conversion from the formats of the experimental control software. In both cases, the event files are assumed to be in a tabular, tab-separated value format.

The tools can be run using a command-line script, called from a Jupyter notebook, or used through online tools. This quickstart covers the basic concepts of remodeling and develops some basic examples of how remodeling is used. See the File remodeling tools guide for detailed descriptions of the available operations.

What is event file restructuring?

Event files, which consist of identified time markers linked to the timeline of the experiment, provide a crucial link between what happens in the experiment and the experimental data.

Event files are often initially created using information in the log files generated by the experiment's presentation software or other control software. These event files are then used to identify portions of the data corresponding to particular points or blocks of data to be analyzed or compared.

Event file restructuring refers to creating, modifying, and reorganizing the event markers in tabular files in order to disambiguate or clarify the information for distribution and analysis. Restructuring can occur at several stages during the acquisition and processing of experimental data, as shown in the following schematic diagram.

In addition to restructuring during initial creation of the tabular event files, restructuring may be required when the event files do not conform to the requirements of a particular analysis. Thus, restructuring is an iterative process, which is supported by the HED remodeling tools for datasets with tabular event files.

The following table gives a summary of the tools available in the HED remodeling toolbox.

Summary of the HED remodeling operations for tabular files.

Category        Operation                  Example use case
clean-up        remove_columns             Remove temporary columns created during restructuring.
                remove_rows                Remove rows with a particular value in a specified column.
                rename_columns             Make column names consistent across a dataset.
                reorder_columns            Make column order consistent across a dataset.
factor          factor_column              Extract factor vectors from a column of condition variables.
                factor_hed_tags            Extract factor vectors from search queries of HED annotations.
                factor_hed_type            Extract design matrices and/or condition variables.
restructure     merge_consecutive          Replace multiple consecutive events of the same type with one event of longer duration.
                number_groups
                number_rows
                remap_columns              Create m columns from values in n columns (for recoding).
                split_event                Split trial-encoded rows into multiple events.
summarization   summarize_column_names     Summarize column names and order in the files.
                summarize_column_values    Count the occurrences of the unique column values.
                summarize_hed_type         Create a detailed summary of HED annotations in the dataset (used to automatically extract experimental designs).

The clean-up operations are used at various phases of restructuring to assure consistency across event files in the dataset.
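
For instance, a remove_columns step that drops a leftover temporary column might look like the following sketch. The column name temp_code is hypothetical, and the parameter names (column_names, ignore_missing) should be checked against the File remodeling tools guide.

[
    {
        "operation": "remove_columns",
        "description": "Remove the hypothetical temporary column temp_code created during restructuring.",
        "parameters": {
            "column_names": ["temp_code"],
            "ignore_missing": true
        }
    }
]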

The factor operations produce column vectors of the same length as the events file in order to encode condition variables, design matrices, or the results of other search criteria. See the HED conditions and design matrices guide for more information on factoring and analysis.
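
As an illustration, a factor_column operation that produces a one-hot vector marking the go trials might look like the following sketch. The factor name go_trials is hypothetical, and the parameter names (column_name, factor_values, factor_names) should be checked against the File remodeling tools guide.

[
    {
        "operation": "factor_column",
        "description": "Create a one-hot factor vector marking the go trials in trial_type.",
        "parameters": {
            "column_name": "trial_type",
            "factor_values": ["go"],
            "factor_names": ["go_trials"]
        }
    }
]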

The restructure operations modify the way that event files represent events.

The summarization operations produce dataset-wide summaries of various aspects of the data.

More detailed information about the remodeling operations can be found in the File remodeling tools guide.

The remodeling process

Remodeling consists of applying a list of operations to an events file to restructure or modify the file in some way. The following diagram shows a schematic of the remodeling process.

Event remodeling process

Initially, the user creates a backup of the event files. This backup process is performed only once and the results are stored in the derivatives/remodeling/backups subdirectory of the dataset.

Restructuring applies a sequence of remodeling operations given in a JSON transformation file to produce a final result. The restructuring always proceeds by looking up each event file in the backup and applying the transformations to the backup copy before writing the result into the dataset.

The transformation file provides a record of the operations performed on the file, starting from the original file. If users detect a mistake in the transformation, they can correct the transformation file and rerun the remodeling.

Usually, users will use the default backup, run the backup request once, and work from the original backup. However, users may also elect to create a named backup, use the backup as a checkpoint mechanism, and develop scripts that use the checkpointed versions as the starting point. This is useful if different versions of the event files are needed for different purposes.
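
For example, creating a named checkpoint backup might look like the following sketch. This assumes that run_remodel_backup.py accepts a backup-name option (-n/--backup_name); the checkpoint name checkpoint1 is hypothetical, and the exact arguments should be checked in the File remodeling tools guide.

python run_remodel_backup.py /data/ds002790 -n checkpoint1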

JSON transformation files

The operations to restructure an event file are stored in a remodel file in JSON format. The file consists of a list of JSON dictionaries.

Basic remodel operation syntax

Each dictionary specifies an operation, a description of its purpose, and the operation's parameters. The basic syntax of a remodeler operation is illustrated in the following operation, which renames the trial_type column to event_type.

Example of a remodeler operation.

{ 
    "operation": "rename_columns",
    "description": "Rename a trial type column to more specific event_type",
    "parameters": {
        "column_mapping": {
            "trial_type": "event_type"
        },
        "ignore_missing": true
    }
}

Each remodeler operation has its own specific set of required parameters, which can be found under File remodeling tools. For rename_columns, the required parameters are column_mapping and ignore_missing. Some operations also have optional parameters.

Applying multiple remodel operations

A remodeler transformation file provides one or more remodel operations in a list. The remodeler performs these operations in order. The order is important, since the operations are not commutative. In the example below, the summary is performed after the renaming, so the result reflects the new column names.

An example JSON remodeler file with multiple operations.

[
    { 
        "operation": "rename_columns",
        "description": "Rename a trial type column to more specific event_type.",
        "parameters": {
            "column_mapping": {
                "trial_type": "event_type"
            },
            "ignore_missing": true
        }
    },
    {
        "operation": "summarize_column_headers",
        "description": "Get column names across files to find any missing columns.",
        "parameters": {
            "summary_name": "Columns after remodeling",
            "summary_filename": "columns_after_remodel"
        }      
    }
]

By stacking operations you can make several changes to an event file in a single run. This is important because the remodeler always applies the transformation file to a copy of the original events backup, not to a previously remodeled events.tsv. If you plan additional changes, add the new operations to the transformation file and rerun it.

More complex remodeling

This section discusses a more complex example using the sub-0013_task-stopsignal_acq-seq_events.tsv events file of the AOMIC-PIOP2 dataset, available on OpenNeuro as ds002790. Here is an excerpt of the events file.

Excerpt from an event file from the stop-go task of AOMIC-PIOP2 (ds002790).

onset     duration   trial_type         stop_signal_delay   response_time   response_accuracy   response_hand   sex
0.0776    0.5083     go                 n/a                 0.565           correct             right
5.5774    0.5083     unsuccesful_stop   0.2                 0.49            correct             right           female
9.5856    0.5084     go                 n/a                 0.45            correct             right           female
13.5939   0.5083     succesful_stop     0.2                 n/a             n/a                 n/a             female
17.1021   0.5083     unsuccesful_stop   0.25                0.633           correct             left            male
21.6103   0.5083     go                 n/a                 0.443           correct             left            male

This event file corresponds to a stop-signal experiment. Participants were presented with faces and had to decide the sex of the face by pressing a button with the left or right hand. However, if a stop signal occurred before this selection, the participant was to refrain from responding.

Notice that the stop_signal_delay and response_time columns contain information about additional events that happened in the trial beyond the presentation of the go signal. These events are encoded implicitly as offsets from the presentation of the go signal. Each row in the event file encodes the information for an entire trial rather than for a single event. This strategy is known as trial-level encoding.

Our goal is to represent each of these events (go signal, stop signal, and response) in a separate row of the event file using the split_event restructuring operation. The following example shows the remodeling operations to perform the splitting.

Example of the split_event operation for the AOMIC stop signal task.

[
    {
        "operation": "split_event",
        "description": "Split response event from trial event based on response_time column.",
        "parameters": {
            "anchor_column": "event_type",
            "new_events": {
                "response": {
                    "onset_source": ["response_time"],
                    "duration": [0],
                    "copy_columns": ["trial_type", "response_accuracy", "response_hand"]
                },
                "stop_signal": {
                    "onset_source": ["stop_signal_delay"],
                    "duration": [0.5],
                    "copy_columns": ["trial_type"]
                }
            },
            "remove_parent_event": false
        }    
    }
]

The example uses the split_event restructuring operation to convert this file from trial encoding to event encoding. In trial encoding, each event marker (row in the event file) represents all the information in a single trial, and event markers such as the participant's response key-press are encoded implicitly as offsets from the stimulus presentation. In event encoding, each individual event within the trial gets its own event marker.

The Split event section of the File remodeling tools guide lists the required parameters for the split_event operation: anchor_column, new_events, and remove_parent_event.

The anchor_column is the column in which the names of the new events are written. In this case we are specifying new types of events, namely the stop signal and the response, so we set the anchor_column to event_type. Note that it is also possible to choose an existing column as the anchor column. The new events are placed in new rows, so nothing is overwritten.

Next we have to specify the new events. This is the most complex part. Each new event has a name, which is a key in the new_events dictionary. Each key maps to a dictionary that in turn specifies the values of the following parameters.

  • onset_source

  • duration

  • copy_columns

The onset_source is a list indicating how to calculate the onset of the new event relative to the onset of the anchor event. The list may contain any combination of column names and numerical values, which are evaluated and added to the anchor onset. Column names evaluate to the values in the corresponding columns.

In our example, the response time and stop signal delay are measured relative to the trial's onset, so we only need to add the value from the respective column. Note that these new events do not exist for every trial: rows where there was no stop signal have an n/a in the stop_signal_delay column. This is handled automatically; the remodeler does not create a new event when any item in the onset_source list is missing.
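
To make the onset calculation concrete, consider the excerpt shown earlier. For the first trial, the go signal occurs at onset 0.0776 and the response_time is 0.565, so the new response event gets onset 0.0776 + 0.565 = 0.6426. For the second trial, the stop signal event gets onset 5.5774 + 0.2 = 5.7774. The fourth trial (a succesful_stop) has n/a in the response_time column, so no response event is created for it.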

The duration specifies the duration of the new events. The AOMIC data did not measure the durations of the button presses, so we set the duration of the response event to 0. The AOMIC data report indicates that the stop signal lasted 500 ms, so we set the duration of the stop_signal event to 0.5.

The copy_columns parameter can be used to transfer context information from the original parent event to the new events. We would like to transfer the response_accuracy and the response_hand information to the response event. We also transfer trial_type, because we have found that keeping general context information is useful for downstream analysis.

The last parameter for the split_event operation is remove_parent_event. Sometimes split_event is used to replace the original parent event entirely; in that case the remodeler can remove the parent event after creating the new events. That is not the case here, since the original event still represents the stimulus presentation, so we set remove_parent_event to false.

The final remodeling file can be found at: finished json remodeler

Remodeling file locations

The remodeling tools expect the full path for the JSON remodeling operation file to be given when the remodeling is executed. However, it is a good practice to include all remodeling files used with the dataset. The JSON remodeling operation files are usually located in the derivatives/remodeling/models subdirectory below the dataset root, and have file names ending in _rmdl.json.

The backups are always in the derivatives/remodeling/backups subdirectory under the dataset root. Summaries produced by the restructuring tools are located in derivatives/remodeling/summaries.
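
Putting these conventions together, the remodeling-related files for the AOMIC example used in this tutorial would be laid out roughly as follows (a sketch showing only the remodeling subdirectories):

/data/ds002790
└── derivatives
    └── remodeling
        ├── backups
        ├── models
        │   └── AOMIC_summarize_rmdl.json
        └── summaries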

In the next section we will go over several ways to call the remodeler.

Using the remodeling tools

The remodeler can be called in a number of ways including using online tools and from the command line. The following sections explain various ways to use the available tools.

Online tools for debugging

Although the event restructuring tools are designed to be run on an entire dataset, you should consider working with a single event file during debugging. The HED online tools provide support for debugging your remodeling script and for seeing the effect of remodeling on a single event file before running on the entire dataset.

Currently, the remodeling tools are only available on the HED tools development server, but will soon move to the regular HED tools online tools server.

To use the online remodeling tools, navigate to the events page and select the Remodel file action. Browse to select the events file to be remodeled and the JSON remodel file containing the remodeling operations. The following screenshot shows these selections for the split event example of the previous section.

Remodeling tools online

Press the Process button to complete the action. If the remodeling script has errors, the result will be a downloaded text file with the errors identified. If the remodeling script is correct, the result will be an events file with the remodeling transformations applied. If the remodeling script contains summarization operations, the result will be a zip file with the modified events file and the summaries included.

If you are using one of the remodeling operations that relies on HED tags, you will also need to upload a suitable JSON sidecar file containing the HED annotations for the events file.

The command-line interface

After installing the remodeler, you can run it on a full BIDS dataset, or on any directory, using the command-line interface scripts run_remodel_backup, run_remodel, and run_remodel_restore.

The first step is to call run_remodel_backup with the necessary arguments to create a backup of the event files. A full overview of all arguments is available at File remodeling tools.
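
For example, a backup call for the AOMIC dataset might look like the following sketch. This assumes that run_remodel_backup.py takes the dataset root as its first argument and supports the -x option to exclude directories, as run_remodel does; check the File remodeling tools guide for the exact arguments.

python run_remodel_backup.py /data/ds002790 -x derivatives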

The run_remodel_backup script is usually run only once for a dataset. It makes the baseline backup of the event files to ensure that nothing is lost. Remodeling always starts from the backup files. The main script, run_remodel, executes a remodeling script and overwrites the event files, using the corresponding backup files as the starting point. It can be run multiple times without doing backups and restores, since it always starts from the backup. Finally, run_remodel_restore overwrites the event files with their backups to restore the dataset to its original form.
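
For instance, restoring the AOMIC dataset to its original state might look like the following sketch, which assumes run_remodel_restore.py requires only the dataset root when the default backup is used; see the File remodeling tools guide for the exact arguments.

python run_remodel_restore.py /data/ds002790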

The first argument of each of the scripts is the full path to the root directory of the dataset. The run_remodel script also takes the path to the JSON remodeler file as its second argument. Depending on the operations you run, there may be other necessary arguments.

From the command line it is possible to run summary operations. These operations do not return a modified event file but provide a summary of values found in all events.tsv files in a dataset.

If we want to run the split_event operation we demonstrated earlier on the full AOMIC dataset, we might first want to check whether the response_time column exists for all subjects. We can do this by running the summarize_column_names operation.

First we prepare the remodeler JSON file.

Remodeler JSON file to summarize the column names across the AOMIC dataset.

[
    {
        "operation": "summarize_column_names",
        "description": "Summarize existing column header across entire AOMIC dataset.",
        "parameters": {
            "summary_name": "AOMIC_column_headers",
            "summary_filename": "AOMIC_column_headers"
            }    
    }
]

This simple summary does not require many input parameters. Like any summary operation, it requires you to provide the summary name and the filename to write the summary to.

Open your computer's command-line interface. To run the summary we have to provide the following arguments:

  • data-dir

  • model-path

  • -s, --save-formats

  • -b, --bids-format

  • -x, --exclude-dirs

The exact paths will look different on your computer but the full command-line call should look something like this:

Command to run summary on AOMIC dataset.

python run_remodel.py /data/ds002790  /data/ds002790/derivatives/remodeling/models/AOMIC_summarize_rmdl.json \
-s .txt -x derivatives -b 

The summaries will be written to the /data/ds002790/derivatives/remodeling/summaries folder in text format because of the -s .txt option. By default, the summary operations return summaries in both text and JSON formats.

The summary file lists all the different column combinations and, for each combination, the files with those columns. Looking at the different column combinations, you can see there are three, one for each task performed in this dataset. All event files for the stop signal task contain the stop_signal_delay and response_time columns.

Now you can try out the split_event operation on the full dataset!

Jupyter notebooks for remodeling

Three Jupyter remodeling notebooks are available at Jupyter notebooks for remodeling.

These notebooks are wrappers that call the command-line remodeling scripts described in the previous section.