Introduction

Analysis of spatial search strategies

The Rtrack package is for the analysis of tracking data, as obtained, for example, by the Barnes maze test. The primary motivation was to provide a fast and reproducible way of assigning spatial search strategies to the individual search paths. The idea of strategies in the Barnes maze was already put forward in Carol Barnes’ original publication (Barnes 1979). Others have since added further strategy classes and ideas (O’Leary, Savoie, and Brown 2011; Illouz et al. 2016). We have built on these ideas and established 9 strategy classes, which are detailed in the associated strategy description page.

Using Rtrack for the Barnes maze

The Rtrack package aims to be an easy-to-use interface for data management and analysis of spatial tracking data. This workflow describes analysis of data from the Barnes maze test. Although the Barnes maze is a very simple paradigm, experiments often accumulate many tracks, each with associated metadata. Managing all these data can be overwhelming and lead to confusion and potential errors. Unfortunately, due to the potential complexity of the experiments and the range of computational abilities of the researchers (who are typically experimentalists with little background in programming or data analysis), many existing software solutions have not found their way into laboratory workflows. The Rtrack package is built on the popular and ubiquitous R platform, which runs on all commonly used operating systems, is free and open-source (so that the cost of licensing is not a limitation) and integrates well into existing analysis environments. The data preparation involves exporting path data from the acquisition software used (outputs from a number of software platforms are supported and more are being added), defining the arenas in which the experiment was performed and creating a spreadsheet with information about each track. Once the data and information spreadsheet have been prepared, the actual analysis runs in very little time (processing for 1000 tracks takes under a minute on a typical modern computer) and the results are available in R for further analysis, or can be exported for analysis using other software if desired. Publication-quality graphs can be generated and exported directly from the functions in Rtrack. In addition, the raw path data can be saved to a standardised format to allow data sharing and to enhance reproducibility.

About the example experiment

The data used here have been artificially generated to provide a small example just for demonstration purposes. We have simulated an experiment with two groups of animals (‘Control’ and ‘Treatment’) that have been trained for 5 days with 3 trials per day. There is a goal reversal on day 4. A real experiment will typically be longer and may also include a probe trial. To see details on how reversals and probe trials are handled in Rtrack, have a look at the tutorial for the Morris water maze, which uses a real, more complex dataset.

Quick start example

In this example, an archived experiment (raw data that has been saved in the portable trackxf format) is read in directly from a URL. The experiment is reconstructed and the path length is plotted for each group as a visual overview. To explore the further functionality of the package, please work through the tutorial examples below.

experiment = Rtrack::read_experiment("https://rupertoverall.net/Rtrack/examples/Barnes_example.trackxf")
#>     Restoring archived experiment.
#>     Processing tracks.
Rtrack::plot_variable("path.length", experiment = experiment, factor = "Group",
    las = 1)

Preparing the input files

Arena descriptions

An ‘arena’ is a description of the Barnes maze and the recording parameters for each session. This means that any change in layout of the field (for example, if the camera position is moved) will require a separate arena file. The description files are simple and consist of only a few entries:

  1. The experiment type. For a Barnes maze experiment, this will always be barnes.
  2. The time units. The units in which the timestamps are measured. Each x,y coordinate pair in the path data is associated with a timestamp. The frequency of measurement may be in the range of milliseconds, seconds, hours or even days, depending on the type of experiment. This can either be a text code (‘s’ = seconds, ‘h’ = hours etc.)¹ or a conversion factor to seconds (1 = seconds, 0.0002777778 = hours (a second is 1/3600 of an hour) etc.).
  3. The bounds of the arena (i.e. the Barnes maze dimensions). This line has four components, each separated by a space.
    1. The shape. For the Barnes maze, only circle is allowed.
    2. The x coordinate of the arena centre.
    3. The y coordinate of the arena centre.
    4. The radius of the arena.
  4. The size and positions of the holes. Ideally, these are all defined individually. However, it is also possible to specify the position of one hole as well as the number of holes and Rtrack can generate the hole zones automatically².
  5. The goal dimensions. Shape, centre x, centre y, radius. This will be the same as one of the holes.
  6. The dimensions of the old goal. Shape, centre x, centre y, radius. This will be the same as one of the holes. The old goal is only for reversal trials and does not need to be defined for acquisition trials.

The units for the x and y coordinates do not need to be specified, but these must be the same units used in the raw track files.

For example, the arena description for the example file (‘Acquisition.txt’) is:

type = barnes

time.units = s

arena.bounds = circle 0 0 60

hole = circle 36.2 36.2 3
hole = circle 45.6 23.2 3
hole = circle 50.6  8.0 3
hole = circle 50.6 -8.0 3
hole = circle 45.6 -23.2 3
hole = circle 36.2 -36.2 3
hole = circle 23.3 -45.6 3
hole = circle 8.0 -50.6 3
hole = circle -8.0 -50.6 3
hole = circle -23.2 -45.6 3
hole = circle -36.2 -36.2 3
hole = circle -45.6 -23.2 3
hole = circle -50.6 -8.0 3
hole = circle -50.6  8.0 3
hole = circle -45.6 23.2 3
hole = circle -36.2 36.2 3
hole = circle -23.2 45.6 3
hole = circle -8.0 50.6 3
hole = circle 8.0 50.6 3
hole = circle 23.3 45.6 3

goal = circle 36.2 36.2  3

Definition of the arena zones.
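The hole coordinates in the example file lie at regular 18° steps on a circle of radius roughly 51.2 around the arena centre. As a rough sketch (base R only), `hole` lines of this kind could be generated as shown here, assuming the holes really are evenly spaced and equidistant from the centre. The helper `hole_centres` is hypothetical and is not an Rtrack function; always compare the result with your actual maze.

```r
# Hypothetical helper, not part of Rtrack: centres of n evenly spaced holes.
# Assumes all holes are equidistant from the arena centre.
hole_centres = function(n, radius, first.angle = 45, centre = c(0, 0)) {
    # Angles in degrees, stepping clockwise from the first hole.
    angles = first.angle - (seq_len(n) - 1) * 360 / n
    radians = angles * pi / 180
    data.frame(x = round(centre[1] + radius * cos(radians), 1),
        y = round(centre[2] + radius * sin(radians), 1))
}

holes = hole_centres(n = 20, radius = 51.2)
# Format as arena description lines with a hole radius of 3.
hole.lines = sprintf("hole = circle %.1f %.1f 3", holes$x, holes$y)
writeLines(head(hole.lines, 3))
```

The generated lines can be pasted into the arena description file. Rounding means that individual values may differ by 0.1 from hand-measured coordinates.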

Assembling the experiment description

The key task before analysing an experiment is to gather together all the information you need for the analysis. This is always necessary for any analysis, and is always a nasty task. Nevertheless, Rtrack uses a straightforward spreadsheet format to make this task less tedious and less confusing.

Several columns are required, and the names of these must all begin with an underscore ’_’:

  • _TrackID is a unique identifier for each track. The easiest way to do this is just write “Track_1” in the first cell and drag to fill the whole column using Excel’s autofill feature.
  • _TargetID is a unique identifier for each subject. Here you should put the animal ID tags, blinded patient IDs or whatever identifies the subjects.
  • _Day indicates the day of the experiment. Use ordinal numbers for the days (e.g. 1 for the first experimental day).
  • _Trial indicates the trial number. Typically there will be multiple trials per day, but this is not necessary. The field is still required even if a one-trial-per-day paradigm is used. These also need to be numbers.
  • _Arena is the name of the arena description file that applies to this track. This is a file path and is relative to the project directory (which is defined by project.dir in the read_experiment function). See the note on relative paths below.
  • _TrackFile is the name of the raw data file containing this track. This is also a file path and is relative to the data directory (which is defined by data.dir in the read_experiment function). See the note on relative paths below.
  • _TrackFileFormat is the format in which the raw track data is stored. See the package documentation (run ?Rtrack::identify_track_format) for a list of the supported file formats.

You can also add any other columns of factors. In the example, there is a factor ‘Group’, which records whether each animal belongs to the ‘Control’ or ‘Treatment’ group.

An excerpt from an example Barnes maze experiment description spreadsheet. 
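If you prefer to assemble the description in R rather than by hand, a sheet with the required columns could be sketched as a data.frame along the following lines. This is only an illustration: the subject IDs, file names and design (2 subjects, 2 days, 3 trials) are made up, and the way the arena file is named in the `_Arena` column is an assumption. The table would still need to be saved as a spreadsheet before being passed to read_experiment.

```r
# Illustrative design only: 2 subjects x 2 days x 3 trials (all IDs are made up).
design = expand.grid(trial = 1:3, day = 1:2, subject = c("A1", "B1"),
    stringsAsFactors = FALSE)
n = nrow(design)

# check.names = FALSE keeps the leading underscores in the column names.
description = data.frame(
    "_TrackID" = paste0("Track_", seq_len(n)),
    "_TargetID" = design$subject,
    "_Day" = design$day,
    "_Trial" = design$trial,
    "_Arena" = "Acquisition.txt",  # Assumed naming of the arena file.
    "_TrackFile" = paste0("Track_", seq_len(n), ".csv"),
    "_TrackFileFormat" = "raw.csv",
    Group = ifelse(design$subject == "A1", "Control", "Treatment"),
    check.names = FALSE, stringsAsFactors = FALSE
)

# Basic sanity checks on the assembled sheet.
required = c("_TrackID", "_TargetID", "_Day", "_Trial", "_Arena", "_TrackFile",
    "_TrackFileFormat")
stopifnot(all(required %in% colnames(description)))
stopifnot(!anyDuplicated(description[["_TrackID"]]))
stopifnot(is.numeric(description[["_Day"]]), is.numeric(description[["_Trial"]]))
```

Checks like these catch duplicated track IDs or non-numeric days before any tracks are processed.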

Note: Relative paths

If your analysis will be done in the same directory as the raw data files are in, then you can ignore this comment. If, however, your raw data are large, you may have them stored on an external disc or network volume. By specifying the data.dir parameter, you can keep these raw data anywhere you like and even move them without having to update the experiment description spreadsheet. The track file paths in the experiment description are relative to the data.dir directory, while the arena files are located relative to project.dir.
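Before running a large import, it can be useful to confirm that every relative track path resolves to a real file under the data directory. The following base-R sketch shows the idea; the directory and file names are placeholders created in a temporary folder purely for the demonstration.

```r
# Placeholder data directory; replace with your own data.dir.
data.dir = file.path(tempdir(), "raw_tracks")
dir.create(data.dir, showWarnings = FALSE)
# Create one empty stand-in file so that the check has something to find.
invisible(file.create(file.path(data.dir, "Track_1.csv")))

# Relative paths as they would appear in the _TrackFile column.
track.files = c("Track_1.csv", "Track_2.csv")
full.paths = file.path(data.dir, track.files)

# Report any listed tracks that cannot be found.
not.found = track.files[!file.exists(full.paths)]
not.found
```

Because only data.dir changes when the raw data are moved, the same check works unchanged after relocating the files.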

Process a single track

Reading in data

Look at one track to get a feel for the workflow. Firstly an arena definition must be read in. The resulting object has the class Rtrack_arena.

arena = Rtrack::read_arena("Barnes_example/Acquisition.txt")

There are many different raw data formats. The format of the data files depends on the software they were recorded with, the locale and (sometimes) the computer system they were recorded with. Each format supported by Rtrack has a code, which must be given to the read_path function. Run the function identify_track_format with one of the raw track files to help you determine the appropriate format code for your data.

track.format = Rtrack::identify_track_format("Barnes_example/Data/Track_1.csv")
#> ✔ This track seems to be in the format 'raw.csv'.

The tracks for the example are in the format raw.csv; we need to pass this information on to the reader function. The arena is also required for reading in the path (to provide calibration information).

path = Rtrack::read_path("Barnes_example/Data/Track_1.csv", arena, id = "test",
    track.format = "raw.csv")

Extracting path metrics

The path (of class Rtrack_path) can now be used to collect a range of metrics. This results in a list of various secondary variables which can be used for plotting and analysis.

metrics = Rtrack::calculate_metrics(path, arena)
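To give a feel for what such metrics are, the following base-R sketch derives a path length, total time and mean velocity from a handful of coordinates. It is only an illustration of the idea: the coordinates and timestamps are invented, and the definitions used inside calculate_metrics may differ in detail.

```r
# Made-up coordinates (arbitrary length units) and timestamps (seconds).
x = c(0, 3, 3, 6)
y = c(0, 4, 8, 12)
timestamps = c(0, 1, 2, 3)

# Euclidean distance covered between consecutive samples.
step.lengths = sqrt(diff(x)^2 + diff(y)^2)

path.length = sum(step.lengths)
total.time = max(timestamps) - min(timestamps)
velocity = path.length / total.time

c(path.length = path.length, total.time = total.time, velocity = velocity)
```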

Plotting the path

The path (the coordinates of the animal in the arena during the experiment) can be plotted. This representation shows the path as a black line and some informative areas of the field (called ‘zones’ by Rtrack; the zones in a Barnes maze experiment are the holes, the goals, the annulus and the centre) in shades of blue.

Rtrack::plot_path(metrics)

Plotting a density heatmap

Paths can also be plotted as a density heatmap.

Rtrack::plot_density(metrics)

Feel free to play with the colours (just please don’t use a garish ‘rainbow’ scheme). The colour scales are best defined using the ‘colorRampPalette’ function.

Rtrack::plot_density(metrics, col = colorRampPalette(c("yellow", "orange", "red"))(256))

You can use any of the colour definitions provided in R, and reducing the number of colours in the palette gives a contour effect.

Rtrack::plot_density(metrics, col = colorRampPalette(c(rgb(1, 1, 0.2), "orange",
    "#703E3E"))(8))

Calling the strategy

The search strategy can be called using the rtrack_metrics object. For more information on spatial search strategies and the Barnes maze strategies defined in Rtrack, see the associated strategy description page. The default method uses a random forest model trained on several thousands of expert-called search paths.

strategy = Rtrack::call_strategy(metrics)

The resulting rtrack_strategy object contains various information; the actual strategy call can be found in the calls component. A confidence score is an indicator of how well the path fit the model (1 = perfect).

strategy$calls
#>   strategy   name confidence    1     2     3     4     5     6     7 8     9
#> 1        4 serial      0.433 0.01 0.001 0.428 0.433 0.083 0.018 0.001 0 0.026
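In the output above, the reported confidence coincides with the largest of the nine per-strategy scores, and the called strategy is the column holding that score. The sketch below reproduces this reading with the printed values. The relationship is inferred from the output shown here rather than from the package internals, so treat it as an assumption.

```r
# Per-strategy scores copied from the printed output (columns 1 to 9).
scores = c(`1` = 0.01, `2` = 0.001, `3` = 0.428, `4` = 0.433, `5` = 0.083,
    `6` = 0.018, `7` = 0.001, `8` = 0, `9` = 0.026)

# The best-scoring strategy and its score.
best = which.max(scores)
called.strategy = names(best)
confidence = unname(scores[best])

# The two highest scores: a close runner-up goes with a low confidence.
top.two = sort(scores, decreasing = TRUE)[1:2]
top.two
```

Here strategies 4 and 3 score almost equally, which is why the confidence of this call is comparatively low.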

Bulk processing a whole experiment

Usually, an experiment will consist of multiple subjects/animals and possibly more than one trial per subject. Running each track separately by hand as we have done above would be tedious and error-prone. Rtrack allows you to set up a batch processing workflow to make this task easier. A description of the experiment is filled out with all the required data and passed to the read_experiment function to be processed automatically.

Reading in experiment data and metadata

The experiment information is read in using metadata in a spreadsheet. See ‘Preparing the input files’ above for details on how to properly construct this file. The raw data are read in, metrics calculated and returned in a list object of class Rtrack_experiment. This is the most processor-intensive part of the workflow and an experiment will typically consist of many hundreds of tracks. Depending on the size of the experiment and the speed of your computer, this step may take several minutes (a friendly progress bar will let you know if there is time for a coffee at this step—the software is fast though, so it may be an espresso!).

experiment = Rtrack::read_experiment("Barnes_example/Experiment.xlsx", data.dir = "Barnes_example/Data")
#>     Processing tracks.

Parallel processing

By default, processing the experiment will run as one single process³, but it is trivial to parallelise this potentially time-consuming step (if perhaps you have run out of coffee). Rtrack version 2 will take care of parallelising the code and all you need to do is adjust the threads parameter. The simple option is to specify threads = 0, which tells Rtrack to use as much processing power as it can. Now try running the read_experiment code again and see if this makes a difference in processing time.

experiment = Rtrack::read_experiment("Barnes_example/Experiment.xlsx", data.dir = "Barnes_example/Data",
    threads = 0)
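If you would like to know how many cores are available before choosing a value for threads, base R can report this through the parallel package that ships with R. In the sketch below, passing a specific positive number to threads and leaving one core free are both assumptions, so check ?Rtrack::read_experiment for the values that are actually accepted.

```r
# Number of logical cores reported by the operating system.
available = parallel::detectCores()

# detectCores() can return NA on some platforms; fall back to one thread.
if (is.na(available)) available = 1

# Leave one core free for the rest of the system
# (a common convention, not an Rtrack rule).
threads = max(1, available - 1)
threads
```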

Bulk strategy calling

The strategies can then be called for each track. The strategy-calling methods also take lists of rtrack_metrics objects, or even the whole rtrack_experiment object. The core strategy-calling method employs vectorised code and is quite fast.

strategies = Rtrack::call_strategy(experiment)

The resulting rtrack_strategy object contains, among other information, all the strategy calls combined in a data.frame.

head(strategies$calls)
#>         strategy   name confidence     1     2     3     4     5     6     7
#> Track_1        4 serial      0.433 0.010 0.001 0.428 0.433 0.083 0.018 0.001
#> Track_2        3 random      0.801 0.004 0.003 0.801 0.157 0.017 0.004 0.001
#> Track_3        3 random      0.930 0.008 0.039 0.930 0.018 0.004 0.000 0.000
#> Track_4        4 serial      0.677 0.015 0.001 0.254 0.677 0.016 0.005 0.000
#> Track_5        3 random      0.809 0.016 0.017 0.809 0.122 0.018 0.006 0.001
#> Track_6        3 random      0.817 0.006 0.005 0.817 0.132 0.029 0.005 0.001
#>             8     9
#> Track_1 0.000 0.026
#> Track_2 0.000 0.013
#> Track_3 0.000 0.001
#> Track_4 0.000 0.032
#> Track_5 0.001 0.010
#> Track_6 0.000 0.005

Analysis of selected metrics

Once the experiment object has been constructed, you can use this to start analysing the results. Individual metrics might be of interest for separate analysis; for example, the number of holes visited before finding the goal. The built-in plotting function allows you to quickly inspect your data and includes the ability to split the results by a grouping factor.

Rtrack::plot_variable("holes.before.goal", experiment = experiment, factor = "Group")
title(main = "Number of holes visited before the goal")

The ‘summary.variables’ element shows all the metrics available:

experiment$summary.variables
#>  [1] "path.length"            "total.time"             "velocity"              
#>  [4] "immobility"             "distance.from.goal"     "distance.from.old.goal"
#>  [7] "roaming.entropy"        "holes.before.goal"      "holes.before.old.goal" 
#> [10] "latency.to.goal"        "latency.to.old.goal"    "time.in.centre.zone"   
#> [13] "time.in.annulus.zone"   "time.in.goal.zone"      "time.in.old.goal.zone" 
#> [16] "time.in.hole.vicinity"  "time.in.n.quadrant"     "time.in.e.quadrant"    
#> [19] "time.in.s.quadrant"     "time.in.w.quadrant"     "goal.crossings"        
#> [22] "old.goal.crossings"

Bulk density maps

It is also possible to create a density heatmap for many tracks together.

Rtrack::plot_density(experiment$metrics)
#> Warning in Rtrack::plot_density(experiment$metrics): Multiple arena definitions
#> have been used. A merged plot may not make sense.

If there are data from tracks using different arenas in the ‘metrics’ list, you will get a warning as the resulting plot almost certainly does not make sense.

To make a density plot from a subset of tracks, you can simply use the in-built methods in R to select the tracks you need and plot these separately.

acquisition.metrics = experiment$metrics[experiment$factors$`_Arena` == "Acquisition"]
reversal.metrics = experiment$metrics[experiment$factors$`_Arena` == "Reversal"]
par(mfrow = c(1, 2))  # Two plots side-by-side.
Rtrack::plot_density(acquisition.metrics, title = "Acquisition")
Rtrack::plot_density(reversal.metrics, title = "Reversal")

par(mfrow = c(1, 1))  # Reset to single-plot mode.

Strategy plots

The strategies can be visualised as contingency plots per trial. Here two plots are made, one for each group.

Rtrack::plot_strategies(strategies, experiment = experiment, factor = "Group")

The last function produced two plots. These can be best saved as a multi-page PDF.

pdf(file = "Results/Barnes_Strategy plots.pdf", height = 4)
Rtrack::plot_strategies(strategies, experiment = experiment, factor = "Group")
dev.off()
#> agg_png 
#>       2

Plotting all paths

This next block of code produces a large PDF (one page for each track) including each of the paths and titled with the track ID and called strategy. This can be used to check the strategy against your visual interpretation of the path—did our software get it right?

pdf(file = "Results/Barnes_Strategy call confirmation.pdf", height = 4)
for (i in seq_along(experiment$metrics)) {
    Rtrack::plot_path(experiment$metrics[[i]], title = paste(experiment$metrics[[i]]$id,
        strategies$calls[i, "name"]))
}
dev.off()
#> agg_png 
#>       2

There are many paths that are very difficult to call (even for a human) and often people will have different interpretations of which strategy is appropriate. The machine-learning method might not always get it ‘right’—but it is consistent. And that can only help reproducibility. We would be interested to hear from you if Rtrack does not perform well with your data. It may be possible to extend the package to cope with a wider range of data sources. See the help page for details on how to contact us.

Thresholding call confidence

The machine-learning algorithm (call_strategy) never outputs a 0 or ‘unknown’ call. It will always assign a call based on the best match to the model. Sometimes these ‘best matches’ are actually rather poor. It may be of interest to use a confidence threshold and discard calls that are below a certain value. We have observed during testing that confidence scores above 0.4 are typically accurate and reproducible, but some paths may not reach this level. It is possible to perform the thresholding easily using the threshold_strategies function.

The thresholded result is a new rtrack_strategy object and contains all the same components. The following example uses a confidence threshold of 0.6 and checks the size of the resulting data.frame of strategy calls.

dim(Rtrack::threshold_strategies(strategies, 0.6)$calls)
#> [1] 81 12

It can be seen that 81 tracks remain at this threshold.

The remaining strategies can be plotted as a strategy plot. The missing values are simply shown as white background.

Rtrack::plot_strategies(Rtrack::threshold_strategies(strategies, 0.6), experiment = experiment,
    factor = "Group")
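Conceptually, thresholding is just a filter on the confidence column of the calls table. The base-R sketch below applies a threshold of 0.6 to the six calls printed earlier in this tutorial. It is not the implementation of threshold_strategies; in particular, whether a score exactly equal to the threshold is kept is an assumption here.

```r
# The first six strategy calls, copied from the head(strategies$calls) output.
calls = data.frame(
    name = c("serial", "random", "random", "serial", "random", "random"),
    confidence = c(0.433, 0.801, 0.930, 0.677, 0.809, 0.817),
    row.names = paste0("Track_", 1:6)
)

threshold = 0.6
# Keep only calls whose confidence exceeds the threshold (strictly, by assumption).
kept = calls[calls$confidence > threshold, ]
rownames(kept)
```

Only Track_1, with a confidence of 0.433, falls below the threshold in this small subset.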

Exporting analysis results

As a data.frame

To get a data.frame containing all the experiment metadata, metrics and strategies for each track, it is possible to export the experiment results. The function export_results is really intended for saving to a file, but if no filename is given, then you get the data as a data.frame.

results = Rtrack::export_results(experiment)

Writing to file

The results can be written to file in any one of several formats. The format will be determined from the filename extension. The default, and most likely to be used in an experimental workflow, is the Excel ‘.xlsx’ format.

Rtrack::export_results(experiment, file = "Results/Barnes_results.xlsx")

Also supported are tab-delimited text (recommended for maximum portability; file extension can be any of ‘.tsv’, ‘.txt’ or ‘.tab’) and comma-delimited values (‘.csv’, or ‘.csv2’ where decimal commas are needed). You can actually use any file extension, but it will be written in that case as tab-delimited text and you’ll get a warning.
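A tab-delimited export can be read back into R with base functions. The sketch below writes a small stand-in table to a temporary file and reloads it. The values are made up, and the assumption is that the exported file has a single header row. Note the use of check.names = FALSE, without which R would rename columns such as `_TrackID` that begin with an underscore.

```r
# Stand-in for an exported results table (values are invented).
results = data.frame("_TrackID" = c("Track_1", "Track_2"),
    path.length = c(412.5, 230.1),
    check.names = FALSE, stringsAsFactors = FALSE)

# Write as tab-delimited text to a temporary file.
out.file = tempfile(fileext = ".tsv")
write.table(results, out.file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back, keeping the original column names intact.
reloaded = read.delim(out.file, check.names = FALSE, stringsAsFactors = FALSE)
colnames(reloaded)
```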

Exporting strategies

The strategy calls can easily be bundled with the other results. Rtrack will make sure that the strategies are in the same order as the results and will pad the strategies with NA where appropriate in case they don’t match the tracks in the results data.frame.

Rtrack::export_results(experiment, strategies, file = "Results/Barnes_Results.xlsx")

An excerpt from an Excel file containing exported results. 

Exporting a subset of the results

It is also possible to only export some of the results. To do this, just specify the indices or names of the tracks you would like to export.

# Export just the data for the control animals.
control = experiment$factors$Group == "Control"
Rtrack::export_results(experiment, tracks = control, file = "Results/Barnes_results_control.xlsx")

Or you may wish to only export tracks with an above-threshold strategy call.

thresholded = Rtrack::threshold_strategies(strategies,
    0.6)
Rtrack::export_results(experiment, strategies, tracks = rownames(thresholded$calls),
    file = "Results/Barnes_ResultsThresholded.xlsx")

It is worth noting that the order of the exported results is also determined by the order of the values given to the tracks parameter.

results = Rtrack::export_results(experiment)  # Get the results as a data frame.
ordered = order(results$path.length, decreasing = TRUE)  # Then sort by path length (highest to lowest).
Rtrack::export_results(experiment, tracks = ordered, file = "Results/Barnes_results_ordered.xlsx")

Saving the experiment

As an RData archive

The entire Rtrack_experiment object can easily be saved and reloaded into a later R session. The .RData format is a compressed version of the Rtrack_experiment object and requires very little space.

save(experiment, file = "Results/Barnes_experiment.RData")

Load the file again (not necessary in this session, but the following line demonstrates the command needed to read in the .RData file we just created).

load("Results/Barnes_experiment.RData")
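Note that load restores objects under their original names and will silently overwrite any object called experiment in the current session. If that is a concern, the archive can be loaded into a separate environment first. The sketch below demonstrates this with a small stand-in list in place of a real experiment object and a temporary file in place of the Results folder.

```r
# Stand-in for a real experiment object.
experiment = list(note = "stand-in for an experiment object")

archive = tempfile(fileext = ".RData")
save(experiment, file = archive)

# Load into a fresh environment so that nothing in the workspace is overwritten.
restored = new.env()
loaded.names = load(archive, envir = restored)

loaded.names  # Names of the objects that were stored in the archive.
identical(restored$experiment, experiment)
```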

As a standardised “trackxf” archive

We have also developed a format for saving the raw data in a way that it can be accessed by other software. This allows sharing with other people and archiving in a way that is more likely to be readable in the future. The command below will create a file with the extension .trackxf—you do not need to add the extension though (in fact it is better not to) as Rtrack will take care of naming the file correctly.

Rtrack::export_data(experiment, file = "Results/Barnes_Experiment")
#>     Creating trackxf archive.
#>     Compressing trackxf archive.

Data saved in this way can be read back into Rtrack using the read_experiment function (with the format trackxf, although Rtrack will work this out for you). Because only the raw data are saved in trackxf, recreating an experiment in this way will re-calculate all of the Rtrack-specific metrics.

recreated.experiment = Rtrack::read_experiment("Results/Barnes_Experiment.trackxf",
    threads = 0)
#>     Restoring archived experiment.
#>     Initialising cluster.
#>     Processing tracks using 8 threads.

This experiment object recreated from the saved trackxf file is (almost) identical to the original object. Only the export information will obviously be different.

# If we set 'export.note' back to empty, then the objects are the same.
recreated.experiment$info$export.note = experiment$info$export.note
all.equal(recreated.experiment, experiment)
#> [1] TRUE

References

Barnes, C A. 1979. “Memory Deficits Associated with Senescence: A Neurophysiological and Behavioral Study in the Rat.” Journal of Comparative and Physiological Psychology 93 (1): 74–104. https://doi.org/10.1037/h0077579.
Illouz, Tomer, Ravit Madar, Charlotte Clague, Kathleen J Griffioen, Yoram Louzoun, and Eitan Okun. 2016. “Unbiased Classification of Spatial Strategies in the Barnes Maze.” Bioinformatics 32 (21): 3314–20. https://doi.org/10.1093/bioinformatics/btw376.
O’Leary, Timothy P, Vicki Savoie, and Richard E Brown. 2011. “Learning, Memory and Search Strategies of Inbred Mouse Strains with Different Visual Abilities in the Barnes Maze.” Behavioural Brain Research 216 (2): 531–42. https://doi.org/10.1016/j.bbr.2010.08.030.

  1. The full range of supported codes is: ‘us’ or ‘micros’ for microseconds, ‘ms’ for milliseconds, ‘s’ for seconds, ‘min’ for minutes, ‘h’ for hours, ‘d’ for days and ‘y’ for years.↩︎

  2. The assumption is made that the holes are evenly spaced around the edge of the arena and are all equally distant from the centre. These assumptions may not be true for your Barnes maze, so please check against a photo or diagram of your set-up. It is always better to define the holes individually to be sure you get the best analysis of your data.↩︎

  3. Modern computers can multi-task and run several jobs side-by-side at the same time. These separate processes are called ‘threads’ in computer terminology. Running multiple parallel threads can make optimal use of the multiple ‘cores’ of your CPU (the microchip at the heart of your computer) and allow programs to process data more quickly.↩︎