The colonytrack_data object contains the raw and preprocessed data along with metadata describing the experimental set-up and the cage layout. The raw data and metadata files are read in by the read_data function and all timestamps are converted into Unix time. Based on the start and end of each day (defined by the ‘LightsOn’ time in the Events file), the data are split into days, and the partial days at the start and end of the experiment are padded with missing-data placeholders. It rarely makes sense to analyse these padded days, so they can be trimmed during later steps, but no data are discarded by default.

## Components of the data object

### Data processing

The raw data consist of antenna contacts and contain the time of the contact, the RFID tag of the animal and the ID of the RFID antenna where the contact was registered. The duration of the contact (in milliseconds) is also recorded. The raw.data element simply contains this information with the timestamp converted into Unix time.
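The conversion to Unix time can be illustrated with a short sketch. This is written in Python purely for illustration and is not the package's R code; the function name `to_unix_time`, the timestamp format string, and the assumption that the recorded time is in UTC are all hypothetical.

```python
from datetime import datetime, timezone

def to_unix_time(text, fmt="%Y-%m-%d %H:%M:%S.%f"):
    """Parse a timestamp string into seconds since 00:00:00 UTC on 1 January 1970.

    The format string and the UTC assumption are illustrative only.
    """
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return parsed.timestamp()

# A contact recorded with millisecond resolution.
unix_time = to_unix_time("2024-03-01 19:30:00.250")

# Converting back recovers the date and hour components.
recovered = datetime.fromtimestamp(unix_time, tz=timezone.utc)
```

Storing times as plain seconds makes later arithmetic (durations, day boundaries) a matter of simple subtraction and modulo operations.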

Next, the consecutive antenna contacts for each subject are used to infer which cage it occupied at any time. If a subject is localised to a tunnel, it is considered to remain in the cage from which it entered the tunnel until it can be unequivocally located in a new cage. This cage presence information is contained in the data element, alongside the time at which the subject moved into the specified cage.

When there are multiple contacts at the same antenna, it is impossible to decide whether the subject moved back and forth through the tunnel or just triggered the antenna repeatedly from within one cage (which is possible due to the confined layout of the cage rack and the required antenna sensitivity). For this reason, repeated antenna contacts are merged so that only unequivocal cage transitions are recorded in the processed data. Similarly, only pairs of (de-duplicated) antenna contacts that correspond to a cage transition are retained. The information about which antenna pairs are informative comes from a cage layout network, which is derived from the Cage layout metadata file and described in more detail below. The timestamp information is given across several columns in different formats, as described below.
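The merging and cage-inference logic described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the package's implementation: the function `infer_cages`, the `antenna_cages` mapping (each antenna mapped to the two cages its tunnel connects), keeping the first contact of a repeated run, and using the time of the first contact of a pair as the entry time are all assumptions made for the example.

```python
def infer_cages(contacts, antenna_cages):
    """Infer cage visits from one subject's time-ordered (timestamp, antenna) contacts."""
    # Merge runs of repeated contacts at the same antenna (first contact kept).
    merged = []
    for timestamp, antenna in contacts:
        if not merged or merged[-1][1] != antenna:
            merged.append((timestamp, antenna))

    visits, uninformative = [], []
    for (t1, a1), (t2, a2) in zip(merged, merged[1:]):
        # A pair of antennae is informative only if they flank exactly one cage.
        shared = antenna_cages[a1] & antenna_cages[a2]
        if len(shared) == 1:
            cage = next(iter(shared))
            if not visits or visits[-1][1] != cage:
                visits.append((t1, cage))
        else:
            # The subject stays in its last known cage; the read is only logged.
            uninformative.append((t2, a2))
    return visits, uninformative

# Hypothetical linear rack: C1 -A1- C2 -A2- C3 -A3- C4
antenna_cages = {"A1": {"C1", "C2"}, "A2": {"C2", "C3"}, "A3": {"C3", "C4"}}
contacts = [(0, "A1"), (5, "A1"), (10, "A2"), (20, "A3"), (30, "A1")]
visits, uninformative = infer_cages(contacts, antenna_cages)
```

In this example the repeated contact at A1 is merged, the pairs A1/A2 and A2/A3 place the subject in C2 and then C3, and the final jump from A3 to A1 does not flank a single cage, so it does not update the subject's position.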

### Timestamps

The timestamp is given in seconds since the start of Unix time (00:00:00 UTC on 1 January 1970) and is also, for convenience, exploded into its date and hour components. Based on the light/dark cycle information provided in the Events metadata file, times are additionally converted to Zeitgeber time (ZT) where ZT0 is the time at which lights come on and ZT12 is when lights go off. A column of logical values specifying whether it is ‘Nighttime’ (lights off) or not is also included.
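As a minimal sketch of the Zeitgeber conversion, the following Python function maps a Unix timestamp to ZT hours and a night-time flag. It is illustrative only: the function name `zeitgeber_time` and the parameter `lights_on_offset` are hypothetical, a fixed 12-hour light phase is assumed, and time zones and daylight saving changes are ignored.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def zeitgeber_time(timestamp, lights_on_offset):
    """Return (ZT in hours, is_night) for a Unix timestamp.

    lights_on_offset is the number of seconds after 00:00:00 UTC at which
    the lights come on (a hypothetical parameter for this sketch).
    """
    seconds_since_lights_on = (timestamp - lights_on_offset) % SECONDS_PER_DAY
    zt = seconds_since_lights_on / 3600.0
    # Lights are assumed to go off at ZT12.
    return zt, zt >= 12.0

# Lights on at 06:00 UTC; one contact at 19:30 UTC and one at 05:00 UTC.
zt_evening, night_evening = zeitgeber_time(19 * 3600 + 30 * 60, 6 * 3600)
zt_early, night_early = zeitgeber_time(5 * 3600, 6 * 3600)
```

A contact shortly before lights-on belongs to the end of the previous Zeitgeber day, which is why the modulo is taken after subtracting the lights-on offset.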

### Time range

Because the Metrics calculation requires full 24-hour blocks of data, the range of the raw timestamps is padded so that it starts at the ZT0 preceding the first recorded antenna contact and ends at the ZT24 (equivalent to ZT0) following the last recorded antenna contact. The range of the first and last antenna contacts is given by timestamp.range in the info element and the padded range by padded.timestamp.range. The days (ZT0–ZT12, when the lights are on) and nights (ZT12–ZT24, when the lights are off) define the timestamp ranges of each 12-hour block.
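The padding and the split into 12-hour windows can be sketched as below. This Python sketch is an illustration under assumptions rather than the package's code: the function names, the `lights_on_offset` parameter (seconds after 00:00:00 UTC at which lights come on), and the fixed 12-hour light phase are all hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
HALF_DAY = SECONDS_PER_DAY // 2

def padded_range(first_contact, last_contact, lights_on_offset):
    """Extend the contact range to whole Zeitgeber days (ZT0 to ZT24)."""
    # Step back to the ZT0 at or before the first contact.
    start = first_contact - (first_contact - lights_on_offset) % SECONDS_PER_DAY
    # Step forward to the ZT24 after the last contact.
    end = (last_contact
           - (last_contact - lights_on_offset) % SECONDS_PER_DAY
           + SECONDS_PER_DAY)
    return start, end

def split_windows(start, end):
    """Return lists of (start, end) pairs for each day and night block."""
    days, nights = [], []
    t = start
    while t < end:
        days.append((t, t + HALF_DAY))                       # ZT0 to ZT12
        nights.append((t + HALF_DAY, t + SECONDS_PER_DAY))   # ZT12 to ZT24
        t += SECONDS_PER_DAY
    return days, nights

# Lights on at 06:00 UTC; contacts between Unix times 100000 and 200000.
start, end = padded_range(100000, 200000, 6 * 3600)
days, nights = split_windows(start, end)
```

Here the first and last Zeitgeber days are only partly covered by recorded contacts, which corresponds to the padded partial days mentioned earlier.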

### The cage network

The ColonyRack system consists of cages joined by tunnels and can thus be represented by an undirected network, which is exactly what the ColonyTrack software does internally. This network is defined by the Cage layout description file and treats cages as vertices and tunnels/RFID antennae as edges. Details of the cage network are stored in the layout element. The network/graph is created using the igraph package, and centrality values are stored under centrality.
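To make the network representation concrete, here is a small Python sketch that builds an undirected graph from a list of tunnels and computes a simple centrality score. It does not use igraph and is not the package's code; the rack layout is invented, and degree centrality is chosen only for illustration, since the text does not say which centrality measure the package computes.

```python
from collections import defaultdict

def build_network(tunnels):
    """Build an undirected adjacency map with cages as vertices and tunnels as edges."""
    adjacency = defaultdict(set)
    for cage_a, cage_b in tunnels:
        adjacency[cage_a].add(cage_b)
        adjacency[cage_b].add(cage_a)
    return dict(adjacency)

def degree_centrality(adjacency):
    """Fraction of the other cages to which each cage is directly connected."""
    n = len(adjacency)
    return {cage: len(neighbours) / (n - 1)
            for cage, neighbours in adjacency.items()}

# Hypothetical layout: four cages, four tunnels.
tunnels = [("C1", "C2"), ("C2", "C3"), ("C3", "C4"), ("C2", "C4")]
network = build_network(tunnels)
centrality = degree_centrality(network)
```

In this invented layout, C2 is connected to every other cage and therefore has the highest score, while C1 sits at the end of a single tunnel and has the lowest.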

### Quality control

A qc element in the data object contains quality control information that may be useful for identifying problems in the experiment. In some cases, an antenna contact cannot be assigned to one of the subjects in the subject metadata file. Such contacts are flagged as ‘dirty data’ and are not used to generate the processed data found in the data element. The number of such unassigned events is given in dirty.data.count and the details in dirty.data. These are usually just control lines generated by the ColonyRack software, or they arise from the use of test tags (used, for example, to verify recording when setting up a new experiment). More rarely, they occur when a tag is misread. If a very large number of ‘dirty’ reads are generated by the same SubjectID, this may indicate an error in your Subject metadata file: double-check that all of the subjects are correctly entered and that the tags have not been altered by spreadsheet software.
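The flagging of unassigned contacts can be sketched as follows. This Python example is a hedged illustration, not the package's code: the function `split_dirty`, the tag values, and the record layout are hypothetical, with column names borrowed from the raw.data element.

```python
from collections import Counter

def split_dirty(raw_contacts, tag_to_subject):
    """Separate contacts with a known tag from those that cannot be assigned."""
    clean, dirty = [], []
    for contact in raw_contacts:
        subject = tag_to_subject.get(contact["TagID"])
        if subject is None:
            dirty.append(contact)
        else:
            clean.append({**contact, "SubjectID": subject})
    return clean, dirty

# Hypothetical subject metadata and raw contacts.
tag_to_subject = {"TAG001": "M01", "TAG002": "M02"}
raw_contacts = [
    {"Timestamp": 10.0, "TagID": "TAG001", "Duration": 120},
    {"Timestamp": 11.5, "TagID": "TESTTAG", "Duration": 80},
    {"Timestamp": 12.0, "TagID": "TAG002", "Duration": 95},
    {"Timestamp": 13.2, "TagID": "TESTTAG", "Duration": 60},
]
clean, dirty = split_dirty(raw_contacts, tag_to_subject)
dirty_count = len(dirty)

# Tallying unassigned reads per tag helps spot a mistyped or reformatted tag.
dirty_by_tag = Counter(contact["TagID"] for contact in dirty)
```

A tag that accounts for a large share of the unassigned reads is a good candidate for checking against the Subject metadata file.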

As described above, only consecutive antenna contacts that span a cage are retained in the data object. If a subject is not recorded for any reason as it passes a reader, then the previous and next contacts may not flank a cage as defined in the Cage layout description file. In this case, the subject cannot be unequivocally located to a cage and is considered to remain in the cage in which it was last seen until its position can be unequivocally updated. Any such antenna contacts are referred to as ‘non-trajectory reads’ and the number of these encountered for each animal is given in the qc element under non.trajectory.read.count. Details of the timestamps and antenna identities are recorded under non.trajectory.reads.

In rare cases, an antenna may be triggered by the same subject at sub-millisecond intervals. This means that the timestamps are identical, and it is most likely due to the subject resting in proximity to the antenna. These duplicate reads are discarded and a count of their occurrence is recorded under submillisecond.count.
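Discarding reads with identical timestamps amounts to a simple de-duplication with a counter, sketched here in Python. The function name and the tuple layout (timestamp, subject, antenna) are assumptions for illustration and do not reflect the package's internals.

```python
def drop_identical_reads(contacts):
    """Keep the first of any reads sharing timestamp, subject and antenna."""
    kept, seen, duplicate_count = [], set(), 0
    for contact in contacts:
        if contact in seen:
            duplicate_count += 1
        else:
            seen.add(contact)
            kept.append(contact)
    return kept, duplicate_count

# Hypothetical reads as (timestamp in seconds, subject, antenna).
contacts = [
    (100.001, "M01", "A1"),
    (100.001, "M01", "A1"),  # identical to the previous read
    (100.001, "M02", "A1"),  # same time, different subject: kept
    (100.250, "M01", "A1"),
]
kept, duplicate_count = drop_identical_reads(contacts)
```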

## Structure of the data object

Below is an overview of the hierarchy of the colonytrack_data object together with the names and classes of each component. Where the class is not part of the R base package, the parent package is given in square brackets after the class name. Round brackets indicate multiple elements (i.e. one for each subject or time window) of the same structure.

- data : colonytrack_data [ColonyTrack]
    - info : list
        - timestamp.range : numeric
        - days : colonytrack_windows [ColonyTrack]
            - id : character
            - start : numeric
            - end : numeric
        - nights : colonytrack_windows [ColonyTrack]
            - id : character
            - start : numeric
            - end : numeric
        - subjects : character
        - subject.info : data.frame
            - SubjectID : character
            - Tag : character
        - processed : POSIXct
        - version : character
    - qc : list
        - dirty.data.count : integer
        - dirty.data : data.frame
            - Timestamp : numeric
            - SubjectID : character
            - Duration : integer
        - trajectory : list
            - (SubjectID) : list
                - Timestamp : numeric
                - Cage : character
        - submillisecond.count : numeric
    - data : list
        - (SubjectID) : data.frame
            - Timestamp : numeric
            - DateTime : POSIXct
            - Day : character
            - Hour : numeric
            - ZTDay : character
            - ZT : numeric
            - Nighttime : logical
            - Cage : character
    - raw.data : data.frame
        - Timestamp : numeric
        - TagID : character
        - SubjectID : character
        - Duration : integer
    - layout : list
        - cages : character
        - cage.quality : data.frame
            - Start : POSIXct
            - End : POSIXct
            - (CageID) : character
        - transitions : data.frame
        - centrality : numeric