Make sure to be familiar with the following tutorials before proceeding:
After the use of designExperiment()
the experiment design is now stored in the cypro-object called cypro_object_new
. You can’t do any analysis and visualization with it yet as it does not contain any loaded data. To load data use the loadData*()
- functions. There are two of them.
loadDataFile()
is needed if the data is already present in one single file. This is often the case if CellProfiler has been used as the image processing software. This data table must contain variables giving information about the well, the region of interest (ROI) and - if the experiment design contains more than one well plate - about the well plate.
loadDataFiles()
is needed if the data is present in multiple files. This is the case if CellTracker has been used as the image processing software. In this case the information about the well-ROI and the well plate is conveyed by the names of the file (see below for more information).
loadDataFiles()
again opens a user interface. It is split in two parts.
The cypro
package is very flexible regarding its input data. This flexibility comes with a small price: The name of important data variables such as x-coordinates, y-coordinates, Cell ID etc. must be single handedly denoted one by one. This is done during step 1 Prepare data loading.
Here you upload an example file that is representative of all the other files that you will read in later on. Once the file is loaded you see the first six rows printed right below.
The example file in this case is a track file derived from the image processing software Cell Tracker.
As explained in the terminology article cypro
requires its input data to contain data variables that contain information about the cell id and the frame. In case of data files generated with CellTracker the variable names are obvious but they might change from software to software.
Beyond the convenient possibility to read in huge amount of files according to specific experiment designs cypro
offers a variety of analysis functionalities. Step 3: Denote module specific variables is about telling cypro where to get the required information as well as which modules it is supposed to use. Some of them require specific data variables (needed variables) based on which other variables are computed (computable variables). Every select input option is provided with a blue question mark which displays information about the variable to be denoted.
For instance, the migration module needs information about x- and y-coordinates. If not specified additionally data variables such as Distance from origin or Instantaneous speed can be computed. While basically every image processing software outputs x- and y-coordinates of the tracked cells information about cell splits are not part of every software output. If your data files do not contain needed variables for a certain module make sure to ‘turn off’ the module with the slider below it. In this case the CellTracker files do not contain mitosis related data. Therefore we won’t be able to use the mitosis module.
Every additional data variable that is unknown to cypro
but contains information you want to include in downstream analysis steps such as clustering and correlation can be denoted here. CellTracker does not offer any additional variables apart from those already integrated in our migration module.
In case of output files from CellProfiler it could look somewhat like this.
Note: Data variables denoted in Step 3 Denote module variables can be used for correlation and clustering, too! Step 4 Denote additional variables simply lets you keep data variables that do not have an inherent meaning to cypro
while simultaneously lets you discard variables you won’t be interested in later on.
Every denoted variable name is saved as such. Based on the input from step 2, step 3, and step 4 the final data.frame as shown in Figure 7 is what cypro
expects all files you are about to read in to look like. (The order of the data variables does not matter. However, missing data variables will result in an error during the reading process. See below.)
Once cypro
knows what to expect you assign a folder to each well plate that you’ve previously designed in designExperiment()
. cypro
will then skim the folder and all its subfolders for data files that match the well-image names.
It takes the folder (here represented by ‘~/’) you’ve chosen and skims its content for files that end according to the following syntax:
Valid examples would be:
Invalid examples:
Note: Although the actual files are stored in subfolders choosing the folder TMZ_and_LY as the folder that contains all files for the respective well plate is the way to go as cypro
looks into the subfolders as well.
Once the folder is assigned cypro
evaluates by well how many of the expected files were found and indicates that by plotting the well plate colored according to the file availability (Figure 10).
For instance, the well plate designed here contained information for wells in rows A, D, E and H and columns 1-9. The input box ‘Images per Well’ was set to 3. cypro
therefore expects to find three files for every well named according to the syntax mentioned above. Depending on the number of matching files found one of the four colors appears:
Green: All files were found for that well.
Yellow: Some files were found for that well.
Red: No files were found for that well.
Blue: More than the expected number of files were found. This can happen if files are stored in subfolders and accidentally were named equally.
For instance files ~/subfolder1/A1_1.csv and ~/subfolder2/A1_1.csv would result in well A1 appearing blue.
Note: Figure 9 shows the content of folder ~/168/Ctrl/. It contains several files with the well-image pattern A4_2. However, only one of the file names is valid which is why the evaluation plot in Figure 10 does not indicate that there are any ambiguous files. On the other hand there are several files that end with the well image pattern A5_1. Although the files are prefixed differently cypro
does not recognize the difference as it considers all files that end with a well-image pattern as valid. The input for well image A5_1 is ambiguous. Ambiguous files need to be either removed or renamed.
In case of missing files the well plate plotted could look like this. (A well plate from a different experiment design.)
Missing and incomplete wells do not prevent you from loading the data. If you realize that some files are missing, check for possible naming issues, correct what needs to be corrected and then assign the folder again. The file availability is reevaluated and colors should change accordingly. Once all well plates have a folder assigned to them and no ambiguous files were found the box ‘Well Plate Status’ turns green indicating that the data is ready to be loaded.
After clicking on ‘Load Data’ a progress bar is going to appear in the lower right corner indicating the loading progress. While being loaded all files are checked to match the requirements set up during the variable denotation. If any errors occur during the loading process they are presented in the box ‘Load Files & Proceed’ in combination with the error message returned.
Again, you can choose to ignore failed files and simply proceed without them by clicking on ‘Save & Proceed’ and then on ‘Return Cypro Object’. If possible you can fix the reason that prevented cypro
from reading the file successfully and click on ‘Load Data’ again.
The way loadDataFile()
works is very similar to the way loadDataFiles()
works. There are two differences:
Instead of uploading a template file you immediately upload the file that contains all the data.
Before denoting the module specific variables you need to denote the ones that contain the information about the well plate, well and ROI.
These experiment design variables are then checked for validity. If they can be used to match the data to the experiment design you can proceed with denoting the module specific variables as described above and eventually return the cypro
-object that now contains the data.