Automated fluorescence microscopy has enabled researchers to study biological systems and cellular behavior on a large scale. Elaborate image analysis algorithms implemented in open-source software such as CellProfiler or CellTracker make it possible to quantify a variety of cellular features captured during imaging. Handling the multitude of data files generated this way, however, poses a substantial obstacle to researchers.

The R package cypro provides a toolkit of R functions and interactive applications that facilitate the analysis of single-cell high-throughput imaging results. Clickable interfaces make it easy to read in hundreds of data files across a multitude of experiment designs. Moreover, cypro wraps a variety of statistical and machine learning tools in convenient functions to profile and classify cells from all kinds of imaging setups. The tutorials on this website will guide you through everything cypro provides.

cypro is compatible with the output of all state-of-the-art image processing software, such as CellProfiler, CellTracker, and ImageJ.

1. Concept

Throughout all functions and applications, cypro implements the tidy data concept and draws its main power from the tidyverse. For the integration of your own analysis ideas, this means in particular that all data.frames in cypro follow the tidy data structure, in which each row represents an observation and each column represents a variable (a piece of information) about that observation. Consistent output and consistent terminology make it easier to apply individual ideas via powerful tidyverse and tidy modeling functions.
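To illustrate what the tidy structure means in practice, here is a minimal sketch using dplyr only. The table and its column names (`cell_id`, `frame`, `x_coord`, `y_coord`) are made up for illustration and are not cypro objects or cypro functions. Each row is one cell at one frame, so a per-cell summary is a single grouped operation.

```r
library(dplyr)

# Hypothetical tidy table: one row per cell per frame (illustrative values)
tracks <- tibble::tibble(
  cell_id = c(1, 1, 1, 2, 2, 2),
  frame   = c(1, 2, 3, 1, 2, 3),
  x_coord = c(98, 80, 78, 40, 43, 47),
  y_coord = c(784, 792, 791, 300, 304, 310)
)

# Because every row is an observation and every column a variable,
# summarising by cell needs no reshaping beforehand
per_cell <- tracks %>%
  group_by(cell_id) %>%
  summarise(
    n_frames = n(),
    mean_x   = mean(x_coord),
    mean_y   = mean(y_coord),
    .groups  = "drop"
  )

print(per_cell)
```

The same pattern carries over to any other grouping variable, such as an experimental condition, as long as it is stored in its own column.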

2. Data Input

Image processing software exports two kinds of output tables, depending mainly on the experiment design with which the underlying images were generated.

2.1 Timelapse Experiments

In timelapse experiments, cells are imaged repeatedly over a defined time span. In the resulting data tables, each observation (row) represents a cell at a given point in time together with all its measured features. The track files of the CellTracker software are an example.

## # A tibble: 200 x 9
##    `Cell ID` `Frame number` `y-coordinate [p~ `x-coordinate [~ `Distance from o~
##        <dbl>          <dbl>             <dbl>            <dbl>             <dbl>
##  1         1              1               784               98              NA  
##  2         1              2               792               80              19.7
##  3         1              3               791               78              21.2
##  4         1              4               795               75              25.5
##  5         1              5               801               77              27.0
##  6         1              6               797               80              22.2
##  7         1              7               799               81              22.7
##  8         1              8               796               83              19.2
##  9         1              9               800               86              20  
## 10         1             10               806               92              22.8
## # ... with 190 more rows, and 4 more variables:
## #   Distance from last point [micrometer] <dbl>,
## #   Instantaneous speed [micrometer/hour] <dbl>,
## #   Angle from origin [degree] <dbl>, Angle from last point [degree] <dbl>

The cell is identified by the Cell ID variable and the point in time is identified by the Frame number variable. As long as your data tables contain columns that represent these two variables, you can analyze them with cypro. Every additional variable, such as coordinates, distances, or - in the case of CellProfiler outputs - shape- and intensity-related features, provides optional additional information.
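Before loading files, it can help to confirm that both identifier columns are present. The following is a rough sketch of such a check; `check_identifiers()` is a hypothetical helper written for this example and not a cypro function, and the `x_coord` column and its values are illustrative. Only the `Cell ID` and `Frame number` column names are taken from the table above.

```r
library(dplyr)

# Small illustrative excerpt using the two identifier columns shown above
track_df <- tibble::tibble(
  `Cell ID`      = c(1, 1, 1, 2, 2),
  `Frame number` = c(1, 2, 3, 1, 2),
  x_coord        = c(98, 80, 78, 40, 43)  # hypothetical feature column
)

# Hypothetical helper (not part of cypro): stops if an identifier column is missing
check_identifiers <- function(df, cell_col, frame_col) {
  missing_cols <- setdiff(c(cell_col, frame_col), colnames(df))
  if (length(missing_cols) > 0) {
    stop("Missing identifier column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

check_identifiers(track_df, "Cell ID", "Frame number")

# Rename the identifiers to short, syntactic names for downstream work
standardized <- track_df %>%
  rename(cell_id = `Cell ID`, frame = `Frame number`)

# Each cell should appear at most once per frame
n_duplicates <- standardized %>%
  count(cell_id, frame) %>%
  filter(n > 1) %>%
  nrow()
```

If `n_duplicates` is larger than zero, the cell and frame columns together do not uniquely identify an observation, which usually points to files from several wells or images having been combined without an additional identifier.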

2.2 One Time Imaging

The cypro package was initially developed for the analysis of timelapse imaging data. However, in some experiment setups cells are imaged only once. We refer to this experiment design as one-time-imaging. Although time-dependent data such as cellular migration or mitosis cannot be obtained from such experiments, image processing with CellProfiler still makes it possible to quantify characteristics such as shape-, intensity- and granularity-related features of cells under different conditions.

The Human MCF7 cells compound-profiling experiment serves as an example. Here, too, the observations in the data files refer to cells at a given point in time. However, as there is only one point in time, the variable referring to it is negligible. The table below is a subset of an output file derived from the CellProfiler pipeline that was published together with the data set you find using the link above.

## # A tibble: 164 x 25
##    ObjectNumber  Area BoundingBoxArea BoundingBoxMaximum_X BoundingBoxMaximum_Y
##           <dbl> <dbl>           <dbl>                <dbl>                <dbl>
##  1            1  1611            2806                  414                   46
##  2            2  2628            4585                  593                   35
##  3            3  2817            4368                  337                   52
##  4            4  2279            4560                  183                   48
##  5            5  3122            4272                   89                   48
##  6            6  4667            6554                  892                   58
##  7            7  4372            6669                 1256                   57
##  8            8  2348            4736                  191                   76
##  9            9  3208            5865                  125                   83
## 10           10  5970            9516                  295                   78
## # ... with 154 more rows, and 20 more variables: BoundingBoxMinimum_X <dbl>,
## #   BoundingBoxMinimum_Y <dbl>, Center_X <dbl>, Center_Y <dbl>,
## #   Compactness <dbl>, Eccentricity <dbl>, EquivalentDiameter <dbl>,
## #   EulerNumber <dbl>, Extent <dbl>, FormFactor <dbl>, MajorAxisLength <dbl>,
## #   MaxFeretDiameter <dbl>, MaximumRadius <dbl>, MeanRadius <dbl>,
## #   MedianRadius <dbl>, MinFeretDiameter <dbl>, MinorAxisLength <dbl>,
## #   Orientation <dbl>, Perimeter <dbl>, Solidity <dbl>

Here, the variable referring to the cell ID is called ObjectNumber. As mentioned above, many built-in analysis modules for aspects like migration and mitosis cannot be used with one-time-imaging experiments because they lack timelapse data. However, you can still use the built-in clustering, correlation and dimensionality reduction pipelines of cypro as well as its statistical plotting functions.
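To give a feel for the kind of analysis that remains possible without time information, here is a minimal sketch using base R only (`cor()`, `prcomp()`, `hclust()`). It does not use or reproduce cypro's own pipelines. The values are the ten `Area` and `BoundingBoxArea` rows printed in the table above, and the choice of two clusters is an arbitrary assumption for illustration.

```r
# The ten rows of Area and BoundingBoxArea shown in the table above
cells <- data.frame(
  ObjectNumber    = 1:10,
  Area            = c(1611, 2628, 2817, 2279, 3122, 4667, 4372, 2348, 3208, 5970),
  BoundingBoxArea = c(2806, 4585, 4368, 4560, 4272, 6554, 6669, 4736, 5865, 9516)
)

# Scale features so that no variable dominates because of its units
features <- scale(cells[, c("Area", "BoundingBoxArea")])

# Correlation between the two shape features
feature_cor <- cor(cells$Area, cells$BoundingBoxArea)

# Dimensionality reduction via principal component analysis
pca <- prcomp(features)

# Hierarchical clustering on Euclidean distances, cut into k = 2 groups (arbitrary)
clusters <- cutree(hclust(dist(features), method = "complete"), k = 2)
cells$cluster <- clusters

print(cells)
```

With only two features the PCA is not informative by itself; the sketch is meant to show that every step works on one row per cell and needs no frame variable.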

3. Conclusion

You can use cypro for downstream analysis as long as your data files contain the identifier variables mentioned above (Cell ID and Frame / point in time), irrespective of the software of origin or the way the columns are named.