Several input files are needed in order to provide the information for ColonyTrack to process your experiment. These files are described here in detail.

## Raw CSV files

These files are the raw output of the ColonyRack system. Depending on your locale settings, they will be in a comma-delimited or semicolon-delimited CSV format (with some additional header lines). Ideally, keep all of these files in a single directory (folder) on your computer, although this is not required. This directory does not need to be in the same place as the metadata files, which makes it easy to store the (often large) raw data on an external disk.

Note: the raw CSV files use the ‘UTF-16LE’ encoding. You should not need to know this unless, for whatever reason, you open/edit the files and re-save them (in which case your software may quietly re-encode them). If this happens, you will get an error from read_data and you will need to explicitly re-encode the files to ‘UTF-16LE’.
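If a file has been re-saved in another encoding, a few lines of Python are enough to convert it back. The following is a minimal sketch and is not part of ColonyTrack: the function name `reencode_to_utf16le` is hypothetical, and the default source encoding (UTF-8) is an assumption about what your editor wrote. Whether the original files begin with a byte-order mark is not stated here, so the sketch does not add one.

```python
def reencode_to_utf16le(src, dst, source_encoding="utf-8-sig"):
    """Rewrite a text file as UTF-16LE without changing its content.

    source_encoding is an assumption: set it to whatever encoding your
    editor actually used. "utf-8-sig" reads UTF-8 with or without a BOM.
    """
    # newline="" keeps the original line endings instead of translating them.
    with open(src, "r", encoding=source_encoding, newline="") as f:
        text = f.read()
    # The "utf-16-le" codec writes little-endian UTF-16 and no byte-order mark.
    with open(dst, "w", encoding="utf-16-le", newline="") as f:
        f.write(text)
```

Write the result to a new file rather than overwriting the original, so that you can compare the two before running read_data again.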

## Subject description file

Next, we will require a table describing all of the subjects. This takes the form of a tab-delimited TSV file. Minimally, this table needs two columns that must be headed ‘SubjectID’ and ‘Tag’.

The ‘SubjectID’ is a name chosen by you for each subject. It must be unique to each subject. We suggest not using the tag IDs (the unique RFID tag codes), because the chips containing the RFID tags can fall out and need to be replaced. We also discourage the use of identifying information in the IDs; these should be randomised to enable effective blinding of your experiment. For a way to assign randomised IDs easily, you may wish to look at our helper tool idLabelR.

The ‘Tag’ column assigns the RFID tag (or tags) to each SubjectID. If the animal has been re-chipped, then simply add all of the different RFID tags separated by commas (without spaces). ColonyTrack will take care of merging the data for multiple tags into a single track for each subject. See the table below for an example of the format:

| SubjectID | Tag |
|---|---|
| XYZ-123-a | 02DA4584B5E2 |
| XYZ-456-b | 000264969195,A2D6873C4DF8 |

Note: take care if you are preparing this file using spreadsheet software (such as Excel). Such software often strips leading zeroes and may interpret tags that are all numeric differently to those containing letters. This will result in your subjects not being recognised. If you need to use a spreadsheet, ensure that the content type of all cells is set to ‘text’.

Additional columns can be added that contain extra information about each subject (age, weight, experimental group etc.). Adding such columns is entirely optional (ColonyTrack does not use this information) but these additional metadata will be embedded in the Data and Metrics objects (see the data and metrics descriptions) and may make downstream analysis more convenient for you.
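Before handing the subject file to ColonyTrack, you may want to check it against the rules described in this section. The sketch below is an independent helper, not ColonyTrack code: the function name `check_subject_table` is hypothetical, and it only tests what is stated above (the two required column headers, unique SubjectIDs, and comma-separated tags without spaces).

```python
import csv
import io

def check_subject_table(tsv_text):
    """Return a list of problems found in a tab-delimited subject table."""
    problems = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fields = reader.fieldnames or []
    for required in ("SubjectID", "Tag"):
        if required not in fields:
            problems.append(f"missing column: {required}")
    if problems:
        return problems

    seen_ids = set()
    # Data rows start on line 2 because line 1 holds the column headers.
    for line_number, row in enumerate(reader, start=2):
        subject = row["SubjectID"]
        if subject in seen_ids:
            problems.append(f"line {line_number}: duplicate SubjectID {subject!r}")
        seen_ids.add(subject)

        tags = row["Tag"] or ""
        if " " in tags:
            problems.append(f"line {line_number}: space in Tag field {tags!r}")
        if any(tag == "" for tag in tags.split(",")):
            problems.append(f"line {line_number}: empty tag in {tags!r}")
    return problems
```

An empty list means that none of these checks failed. The function reads every value as text, so leading zeroes in tags are preserved.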

## Cage layout description file

Not only is the ColonyRack system available in custom sizes, but the connections between cages can be altered to enable flexible control of the environment. ColonyTrack therefore needs a description of the layout of the cages used in the current experiment. The format of this file is a simple network description: a TSV file with at least 5 columns, ‘Source’, ‘SourceType’, ‘Link’, ‘Target’ and ‘TargetType’.

The ‘Source’ and ‘Target’ columns contain the names of the cages that are connected (you can choose these names, but you must use them consistently). The network is undirected, so once you add the Source:Target pair ‘Cage1-1:Cage1-2’, you do not need to also add the reverse ‘Cage1-2:Cage1-1’ link.

The ‘SourceType’ and ‘TargetType’ columns define whether each ‘cage’ is a true cage or a tunnel linking the different ColonyRack levels. This is indicated by the type ‘Cage’ or ‘Tunnel’. Tunnels are treated differently by ColonyTrack, so it is important to set the type correctly.

The ‘Link’ column contains the ID of the antenna linking the two cages. These IDs must correspond to the IDs used in the raw data CSVs. The order in which the rows appear in this table is not important. The table below shows an example of the format for the network file:

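| Source | SourceType | Link | Target | TargetType |
|---|---|---|---|---|
| Cage1-1 | Cage | RFID1-2 | Cage1-2 | Cage |
| Cage1-1 | Cage | RFID1-3 | Cage1-3 | Cage |
| Cage1-3 | Cage | RFID1-5 | Tunnel1-1 | Tunnel |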

A helper app is planned to make the set-up of complex layouts more accessible.
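Until such a tool exists, a short script can catch the most common mistakes in a layout file. The following is a rough sketch under stated assumptions, not ColonyTrack's own validation: the function name `check_layout` is hypothetical, and the checks are limited to the rules given above (the five column headers, the two allowed types, consistent naming, and redundant reversed links).

```python
import csv
import io

REQUIRED_COLUMNS = ["Source", "SourceType", "Link", "Target", "TargetType"]
ALLOWED_TYPES = {"Cage", "Tunnel"}

def check_layout(tsv_text):
    """Return a list of problems found in a tab-delimited layout table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fields = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in fields]
    if missing:
        return [f"missing column: {c}" for c in missing]

    problems = []
    node_types = {}   # first type seen for each cage/tunnel name
    seen_links = set()
    for line_number, row in enumerate(reader, start=2):
        for name_col, type_col in (("Source", "SourceType"), ("Target", "TargetType")):
            name, kind = row[name_col], row[type_col]
            if kind not in ALLOWED_TYPES:
                problems.append(f"line {line_number}: unknown type {kind!r} for {name!r}")
            # A name must keep the same type everywhere it appears.
            if node_types.setdefault(name, kind) != kind:
                problems.append(f"line {line_number}: {name!r} has conflicting types")

        # The network is undirected, so A-B and B-A over the same antenna
        # describe the same link; frozenset ignores the direction.
        key = (frozenset((row["Source"], row["Target"])), row["Link"])
        if key in seen_links:
            problems.append(f"line {line_number}: link already defined")
        seen_links.add(key)
    return problems
```

The sketch does not check that the ‘Link’ IDs match the antenna IDs in your raw CSV files; that comparison needs the raw data itself.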

## Events description file

This file contains a description of any ‘events’ that occurred during the experiment. Currently the only supported and required feature of this table is the light cycle. The file must contain 4 columns: ‘Start’, ‘End’, ‘Event’, ‘Value’. As an example, consider the following entry in the ‘Events’ file:

| Start | End | Event | Value |
|---|---|---|---|
| 2020-02-01 05:00:00 UTC | 2020-02-28 21:00:00 UTC | LightsOn | 07:00:00 Europe/Berlin |

The columns ‘Start’ and ‘End’ contain POSIX-style date-time timestamps, and these must be complete. They need to be in the format yyyy-mm-dd HH:MM:SS TZ, where ‘y’, ‘m’, ‘d’ are years, months, days (always including leading zeroes) and ‘H’, ‘M’, ‘S’ are hours (in 24-hour format), minutes, seconds (also always including leading zeroes). ‘TZ’ indicates the timezone, which should be in the ‘Olson’ format (e.g. ‘Europe/Berlin’) or in the 3–4-letter abbreviated format (e.g. ‘UTC’). To work out the code for your timezone, this website might be helpful: https://www.zeitverschiebung.net. ColonyTrack will internally convert all timestamps into UTC (see below).
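To see what this format means in practice, the sketch below parses one such timestamp and converts it to UTC using Python's standard library. It is only an illustration of the format and not how ColonyTrack does the conversion internally. The function name `parse_event_timestamp` is hypothetical, and the set of accepted timezone names depends on the time zone database installed on your system, which may differ from the names ColonyTrack accepts.

```python
import re
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Two-digit fields enforce the leading zeroes required by the format.
TIMESTAMP_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+")

def parse_event_timestamp(text):
    """Parse 'yyyy-mm-dd HH:MM:SS TZ' and return an aware datetime in UTC."""
    if not TIMESTAMP_PATTERN.fullmatch(text):
        raise ValueError(f"not in yyyy-mm-dd HH:MM:SS TZ format: {text!r}")
    date_part, time_part, zone_name = text.split(" ")
    naive = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
    # Attach the named zone, then express the same instant in UTC.
    local = naive.replace(tzinfo=ZoneInfo(zone_name))
    return local.astimezone(timezone.utc)

berlin = parse_event_timestamp("2020-02-01 06:00:00 Europe/Berlin")
utc = parse_event_timestamp("2020-02-01 05:00:00 UTC")
print(berlin == utc)  # Berlin is one hour ahead of UTC in February
```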

We strongly recommend using UTC (coordinated universal time) for your experimental work (and coordinating your light cycles with this). Time changes due to summertime/daylight saving wreak havoc on animals’ circadian clocks and cause difficult problems with analysis. It is really much easier to avoid all this unnecessary complexity.

The ‘LightsOn’ event is required and must be defined for the timeframes of all recorded antenna contacts. If you have raw data that is outside the Start–End period of the ‘LightsOn’ entry/entries, you will get a fatal error when attempting to process the data object.

The ‘Value’ for ‘LightsOn’ is in the same POSIX format as Start/End but without the date component. Again, the hours, minutes and seconds all need to be defined and a timezone must be provided. This is the time of day (in 24-hour format) when the lights come on in the room where the ColonyRack was housed.

Note: ColonyTrack currently only supports a 12:12 light cycle. That is, 12 hours light and 12 hours dark each day. This restriction is there to ensure interoperability of results between different experiments. If you have a compelling reason to use different light cycles, or would like to perform circadian studies with varying cycles, please feel free to reach out to the developers.

It is possible to define multiple light cycles by adding more rows to the description file. If any of the definitions overlap, then the entry later in the table has precedence.
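The precedence rule and the 12:12 cycle can be illustrated with a small sketch. This is our reading of the rules described above, not ColonyTrack's implementation, and every name in it is hypothetical. It assumes that each ‘LightsOn’ row has already been parsed into a tuple of (start, end, lights-on time, timezone name), that the Start and End bounds are inclusive, and that the lights-on time is interpreted as local time in the timezone given in ‘Value’.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

def active_lights_on(rows, when):
    """Return the LightsOn row that applies at `when`, or None.

    rows: list of (start, end, lights_on, zone_name) in table order, where
    start/end are timezone-aware datetimes and lights_on is a datetime.time.
    """
    chosen = None
    for row in rows:
        start, end = row[0], row[1]
        if start <= when <= end:
            chosen = row  # a later matching row replaces an earlier one
    return chosen

def is_light_phase(rows, when):
    """True if `when` falls in the 12 hours following lights-on."""
    row = active_lights_on(rows, when)
    if row is None:
        raise ValueError("timestamp is not covered by any LightsOn entry")
    _, _, lights_on, zone_name = row
    local = when.astimezone(ZoneInfo(zone_name))
    now_s = local.hour * 3600 + local.minute * 60 + local.second
    on_s = lights_on.hour * 3600 + lights_on.minute * 60 + lights_on.second
    # Seconds since lights-on, wrapped around midnight; light lasts 12 hours.
    return (now_s - on_s) % 86400 < 12 * 3600

rows = [
    (datetime(2020, 2, 1, 5, tzinfo=timezone.utc),
     datetime(2020, 2, 28, 21, tzinfo=timezone.utc),
     time(7, 0), "Europe/Berlin"),
]
print(is_light_phase(rows, datetime(2020, 2, 5, 6, 0, tzinfo=timezone.utc)))
```

The same lookup shows why uncovered data are a problem: for a timestamp outside every Start–End period there is no row to take the lights-on time from. On days with a daylight saving change, a wall-clock calculation like this one is only approximate.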

## Cage ‘quality’ description file (optional)

The read_data function also accepts an optional file describing the ‘quality’ of each cage; i.e. whether it contains water, a feeder, toys or bedding. Different light intensities, local temperature etc. could also be recorded here. At present, this file is not used, but will be accessible for calculating experiment-specific metrics in a future software version. An example file is included with the dataset available on the downloads page for you to get an idea of the format. Timestamps are defined as described above for the Events description file. For now (version 1.0.4), however, this additional file can be ignored.

## Checklist

• Raw CSV data files. [Required]
• Subject information. [Required]
• Cage layout information. [Required]
• Events. [Required]
• Cage quality. [Optional, currently unused]