ColonyTrack_data_description.Rmd
The colonytrack_data
object contains the raw and
preprocessed data along with various metadata elements describing the
experimental set-up and the layout of the cage. The raw data and metadata
files are read in by the read_data
function and all
timestamps are converted into the the Unix time format.
Based on the start/end of the days (which is defined by the ‘LightsOn’
time in the Events file), the data are split into days and the partial
days at the start and end of the experiment are padded with missing data
placeholders. It almost always makes no sense to analyse these padded
days, so they can be trimmed during later steps—but no data is discarded
by default.
The raw data consist of antenna contacts and contain the time of the
contact, the RFID tag of the animal and the ID of the RFID antenna where
the contact was registered. The duration of the contact (in
milliseconds) is also recorded. The raw.data
element simply
contains this information with the timestamp converted into Unix
time.
Next, the consecutive antenna contacts for each subject are used to
infer the cage in which it was at any time. If a subject is localised to
a tunnel, then it is considered to remain in the cage from which it
entered the tunnel—until it can be unequivocally located to a new cage.
This cage presence information is contained in the data
element, alongside the time at which the subject moved into the
specified cage. In the case of multiple contacts at the same antenna, it
is actually impossible to decide whether the subject moved
back-and-forth through the tunnel or just triggered the antenna
repeatedly from within the one cage (which is possible due to the
confined layout of the cage rack and the required antenna sensitivity).
For this reason, repeated antenna contacts are merged so that only
unequivocal cage transitions are recorded in the processed
data. Similarly, only pairs of (de-duplicated) antenna contacts that
correspond to a cage transition are retained in the processed data. The
information about which antenna pairs are informative comes from a cage
layout network derived from the Cage
layout metadata file and which is described in some more detail
below. The timestamp information is given across several columns in
different formats as described below.
The timestamp is given in seconds since the start of Unix time (00:00:00 UTC on 1 January 1970) and is also, for convenience, exploded into its date and hour components. Based on the light/dark cycle information provided in the Events metadata file, times are additionally converted to Zeitgeber time (ZT) where ZT0 is the time at which lights come on and ZT12 is when lights go off. A column of logical values specifying whether it is ‘Nighttime’ (lights off) or not is also included.
Because the Metrics calculation requires full 24-hour blocks of data,
the range of the raw timestamps are padded so that they start at ZT0
preceding the first recorded antenna contact and end at ZT24 (equivalent
to ZT0) following the last recorded antenna contact. The range of the
first and last antenna contacts is given by timestamp.range
in the info
element and the padded range by
padded.timestamp.range
. The days
(ZT0–ZT12,
when lights are on) and nights
(ZT12–ZT24, when the lights
are off) define the timestamp ranges of each 12-hour block.
The ColonyRack system consists of cage joined by tunnels and can thus
be represented by an undirected network. This, indeed, is exactly what
the ColonyTrack software does internally. This network is defined by the
Cage
layout description file and considers cages as vertices and
tunnels/RFID antennae as edges. Details of the cage network are stored
in the element layout
. Using the capabilities of the igraph
package, a network/graph is created Centrality.
A qc
element in the data object contains some quality
control information that might be useful for identifying problems in the
experiment. In some cases, an antenna contact may not be able to be
assigned to one of the subjects in the subject metadata file. Such
contacts are flagged as ‘dirty data’ and not used to generate the
processed data found in the data
element. The number of
such unassigned events is given in dirty.data.count
and
details in dirty.data
. These are usually just control lines
generated by the ColonyRack software or can arise from the use of test
tags (used, for example, to verify recording when setting up a new
experiment). They can, however, also rarely occur when a tag is misread.
If a very large number of ‘dirty’ reads are generated by the same
SubjectID, this may indicate an error in your Subject
metadata file—double-check that all of the subjects are correctly
entered and that the tags have not become mutated by spreadsheet
software.
As described above, only consecutive antenna contacts that span a
cage are retained in the data object. If a subject is not recorded for
whatever reason as it passes a reader, then the previous and next
contacts may not flank a cage as defined in the Cage
layout description file. In this case, the subject cannot be
unequivocally located to a cage and is considered to remain in the cage
it was last seen until its position can be unequivocally updated. Any
such antenna contacts are referred to as ‘non-trajectory reads’ and the
number of these that are encountered for each animal is given in the
qc
element under non.trajectory.read.count
.
Details on the timestamps and antenna identities are recorded under
non.trajectory.reads
.
In a few rare cases, an antenna may be triggered by the same subject
at sub-millisecond intervals. This means that the timestamps are
identical an is most likely due to the the subject resting in proximity
to the antenna. These duplicate reads are discarded and a count of their
occurrence is recorded under submillisecond.count
.
Below is an overview of the hierarchy of the
colonytrack_data
object together with the names and classes
of each component. Where the class is not part of the R base package,
the parent package is given in square brackets after the class name.
Round brackets indicate multiple elements (i.e. one for each subject or
time window) of the same structure.
data : colonytrack_data [ColonyTrack]
info : list
timestamp.range : numeric
padded.timestamp.range : numeric
days : colonytrack_windows [ColonyTrack]
id : character
start : numeric
end : numeric
nights : colonytrack_windows [ColonyTrack]
id : character
start : numeric
end : numeric
subjects : character
subject.info : data.frame
SubjectID : character
Tag : character
(optional additional user-defined columns)
processed : POSIXct
version : character
qc : list
dirty.data.count : integer
dirty.data : data.frame
Timestamp : numeric
SubjectID : character
ReaderID : character
Duration : integer
trajectory : list
(SubjectID) : list
non.trajectory.read.count : numeric
non.trajectory.reads : data.frame
Timestamp : numeric
Cage : character
ReaderPairID : character
submillisecond.count : numeric
data : list
(SubjectID) : data.frame
Timestamp : numeric
DateTime : POSIXct
Day : character
Hour : numeric
ZTDay : character
ZT : numeric
Nighttime : logical
Cage : character
raw.data : data.frame
Timestamp : numeric
TagID : character
SubjectID : character
ReaderID : character
Duration : integer
layout : list
cages : character
cage.quality : data.frame
Start : POSIXct
End : POSIXct
(CageID) : character
readers : character
transitions : data.frame
Reader1 : character
Reader1 : character
Cage : character
network : data.frame
Sort : integer
Source : character
SourceType : character
Link : character
Target : character
TargetType : character
graph : igraph [igraph]
shortest.paths : numeric matrix
centrality : numeric