Automated fluorescence microscopy has enabled researchers to study biological systems and cellular behavior on a large scale. Elaborate image analysis algorithms implemented in open-source software such as CellProfiler or CellTracker make it possible to quantify a variety of cellular aspects captured by the imaging process. Handling the multitude of data files generated this way, however, poses a substantial obstacle to researchers.
The R package cypro provides a toolkit of R functions and interactive applications that facilitate the analysis of single-cell high-throughput imaging results. Clickable interfaces make it easy to read in hundreds of data files across a multitude of experiment designs. Moreover, cypro implements a variety of statistical and machine learning tools in convenient functions to profile and classify cells from all kinds of imaging setups. The tutorials on this website will guide you through everything cypro provides.
cypro is compatible with the output of all state-of-the-art image processing software, such as CellProfiler, CellTracker, and ImageJ.
Throughout all functions and applications, cypro implements the tidy data concept and draws its main power from the tidyverse. For the integration of individual analysis concepts, this means in particular that all data frames in cypro follow the tidy data structure, in which each row represents an observation and every column represents a variable (a piece of information) of that observation. Consistent output and consistent terminology facilitate the application of your own ideas via powerful tidyverse and tidy modelling functions.
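To make the tidy layout concrete, here is a minimal sketch of a table with one row per cell and frame and one column per variable. The table, its column names (`cell_id`, `frame`, `condition`, `speed`) and its values are made up for illustration and are not produced by cypro. The sketch uses base R only so it runs without extra packages; the equivalent tidyverse verbs would work on the same table.

```r
# Toy tidy table: each row is one observation (a cell at one frame),
# each column is one variable. All names and values are illustrative.
tracks <- data.frame(
  cell_id   = c(1, 1, 1, 2, 2, 2),
  frame     = c(1, 2, 3, 1, 2, 3),
  condition = c("ctrl", "ctrl", "ctrl", "treated", "treated", "treated"),
  speed     = c(10, 12, 14, 20, 22, 24)
)

# Because every variable has its own column, a per-cell summary
# is a single grouped call.
mean_speed <- aggregate(speed ~ cell_id + condition, data = tracks, FUN = mean)
print(mean_speed)
```

The result has one row per cell, which is again a tidy table and can be passed on to further summaries or plots.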
Image processing software exports two kinds of output tables, depending mainly on the experiment design with which the underlying images were generated.
In time-lapse experiments, cells are imaged repeatedly over a defined time span. In the resulting data tables, each observation (row) represents a cell at a given point in time with all its measured features. The track files of the CellTracker software exemplify this.
## # A tibble: 200 x 9
## `Cell ID` `Frame number` `y-coordinate [p~ `x-coordinate [~ `Distance from o~
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 784 98 NA
## 2 1 2 792 80 19.7
## 3 1 3 791 78 21.2
## 4 1 4 795 75 25.5
## 5 1 5 801 77 27.0
## 6 1 6 797 80 22.2
## 7 1 7 799 81 22.7
## 8 1 8 796 83 19.2
## 9 1 9 800 86 20
## 10 1 10 806 92 22.8
## # ... with 190 more rows, and 4 more variables:
## # Distance from last point [micrometer] <dbl>,
## # Instantaneous speed [micrometer/hour] <dbl>,
## # Angle from origin [degree] <dbl>, Angle from last point [degree] <dbl>
The cell is identified by the Cell ID variable and the point in time by the Frame number variable. As long as your data tables contain columns that represent these two variables, you can analyze them with cypro. Every additional variable, such as coordinates, distances, or (in the case of CellProfiler output) shape- and intensity-related features, provides optional additional information.
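The following sketch illustrates why the two identifier variables matter: with a cell ID and a frame number, per-cell quantities such as the distance travelled between consecutive frames can be derived from the coordinates alone. The column names (`cell_id`, `frame`, `x`, `y`, `dist_last`), the helper `step_dist()` and the values are assumptions for illustration; this is not how cypro computes its own variables.

```r
# Toy track table; names and values are illustrative only.
tracks <- data.frame(
  cell_id = c(1, 1, 1, 2, 2),
  frame   = c(1, 2, 3, 1, 2),
  x       = c(0, 3, 3, 10, 10),
  y       = c(0, 4, 8, 10, 15)
)

# Sort by cell and frame so that consecutive rows of a cell
# correspond to consecutive frames.
tracks <- tracks[order(tracks$cell_id, tracks$frame), ]

# Hypothetical helper: Euclidean distance to the previous position,
# NA for the first frame of a cell.
step_dist <- function(x, y) c(NA, sqrt(diff(x)^2 + diff(y)^2))

# Apply the helper separately to every cell.
tracks$dist_last <- NA_real_
for (id in unique(tracks$cell_id)) {
  rows <- tracks$cell_id == id
  tracks$dist_last[rows] <- step_dist(tracks$x[rows], tracks$y[rows])
}

print(tracks)
```

Without the cell ID the steps of different cells would be mixed, and without the frame number their order would be undefined.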
The cypro package was initially developed for the analysis of time-lapse imaging data. However, in some experiment setups cells are imaged only once. We refer to this experiment design as one-time-imaging. Although time-dependent data such as cellular migration or mitosis cannot be obtained from such experiments, image processing with CellProfiler still makes it possible to quantify shape-, intensity- and granularity-related features of cells under different conditions.
The Human MCF7 cells compound-profiling experiment serves as an example. Here again, the observations in the data files refer to cells at a given point in time. However, as there is only one point in time, the variable referring to it is negligible. The table below is a subset of an output file derived from the CellProfiler pipeline that was published together with the data set you find using the link above.
## # A tibble: 164 x 25
## ObjectNumber Area BoundingBoxArea BoundingBoxMaximum_X BoundingBoxMaximum_Y
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1611 2806 414 46
## 2 2 2628 4585 593 35
## 3 3 2817 4368 337 52
## 4 4 2279 4560 183 48
## 5 5 3122 4272 89 48
## 6 6 4667 6554 892 58
## 7 7 4372 6669 1256 57
## 8 8 2348 4736 191 76
## 9 9 3208 5865 125 83
## 10 10 5970 9516 295 78
## # ... with 154 more rows, and 20 more variables: BoundingBoxMinimum_X <dbl>,
## # BoundingBoxMinimum_Y <dbl>, Center_X <dbl>, Center_Y <dbl>,
## # Compactness <dbl>, Eccentricity <dbl>, EquivalentDiameter <dbl>,
## # EulerNumber <dbl>, Extent <dbl>, FormFactor <dbl>, MajorAxisLength <dbl>,
## # MaxFeretDiameter <dbl>, MaximumRadius <dbl>, MeanRadius <dbl>,
## # MedianRadius <dbl>, MinFeretDiameter <dbl>, MinorAxisLength <dbl>,
## # Orientation <dbl>, Perimeter <dbl>, Solidity <dbl>
Here, the variable referring to the cell ID is called ObjectNumber. As mentioned above, many built-in analysis modules for aspects like migration and mitosis cannot be used with one-time-imaging experiments due to the lack of time-lapse data. However, you can still use the built-in clustering, correlation and dimensionality reduction pipelines of cypro as well as its statistical plotting functions.
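As a rough idea of what correlation, dimensionality reduction and clustering look like on a one-time-imaging table, the sketch below uses plain base R functions (`cor()`, `prcomp()`, `kmeans()`) on simulated shape features. It is not cypro's implementation and does not use cypro functions. The column names echo the CellProfiler table above, but the values are simulated and the two cell populations are an assumption for illustration.

```r
set.seed(1)

# Simulated shape features for two hypothetical cell populations.
features <- data.frame(
  ObjectNumber = 1:40,
  Area         = c(rnorm(20, 1500, 100), rnorm(20, 4000, 100)),
  Perimeter    = c(rnorm(20, 150, 10),   rnorm(20, 300, 10)),
  Eccentricity = c(rnorm(20, 0.4, 0.05), rnorm(20, 0.8, 0.05))
)

# Scale the numeric features so no single variable dominates.
num <- scale(features[, c("Area", "Perimeter", "Eccentricity")])

# Pairwise correlation between features.
cor_mat <- cor(num)

# Principal component analysis as a simple dimensionality reduction.
pca <- prcomp(num)

# k-means clustering into two groups; nstart reduces the
# dependence on the random initialization.
km <- kmeans(num, centers = 2, nstart = 10)
features$cluster <- km$cluster

print(table(features$cluster))
```

None of these steps needs a time variable, which is why they remain available when cells are imaged only once.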
You can use cypro for downstream analysis as long as your data files contain the mentioned identifier variables (Cell ID and Frame / point in time), irrespective of the software of origin or the way columns are named.
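The idea that column names do not matter as long as the identifiers exist can be sketched as a small renaming step. The helper `harmonize_ids()` below is hypothetical and not part of cypro, and the target names `cell_id` and `frame` are assumptions. Filling in a constant frame for tables without a time variable is likewise only one possible convention for one-time-imaging data.

```r
# Hypothetical helper (not part of cypro): map software-specific
# identifier columns to one consistent naming scheme.
harmonize_ids <- function(df, cell_col, frame_col = NULL) {
  stopifnot(cell_col %in% names(df))
  names(df)[names(df) == cell_col] <- "cell_id"
  if (is.null(frame_col)) {
    # Assumed convention: a table without a time variable
    # gets a single constant frame.
    df$frame <- 1L
  } else {
    stopifnot(frame_col %in% names(df))
    names(df)[names(df) == frame_col] <- "frame"
  }
  df
}

# CellTracker-style table (check.names = FALSE keeps the spaces).
tracker <- data.frame(
  `Cell ID`      = c(1, 1),
  `Frame number` = c(1, 2),
  check.names    = FALSE
)

# CellProfiler-style table without a time variable.
profiler <- data.frame(ObjectNumber = 1:3, Area = c(1611, 2628, 2817))

tracker_tidy  <- harmonize_ids(tracker, "Cell ID", "Frame number")
profiler_tidy <- harmonize_ids(profiler, "ObjectNumber")

print(names(tracker_tidy))
print(names(profiler_tidy))
```

After this step both tables expose the same two identifier columns, regardless of which software produced them.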