Extract Data • cypro

cypro is designed so that you can extract data from the cypro object and run your own analyses beyond what cypro currently offers. To do that, use the family of get*() functions.

# load packages
library(cypro)
library(tidyverse)

# load object from the Broad Institute compound profiling experiment (week 4)
# contains one time imaging data
object_one_time <- readRDS(file = "data/bids-week4.RDS")

# load object that contains tracking data 
object_tracks <- readRDS(file = "data/example-tracks.RDS")

1. Cell data.frames

Your cell data is stored in the @cdata slot in the form of five data.frames.

1.1 Track(ing) data

The tracks data.frame contains information for every cell at a given point in time. This data.frame plays a significant role in time lapse experiments. Use getTracksDf() to obtain it.

tracks_df <- getTracksDf(object_tracks)

# print first 6 rows
head(tracks_df)

## # A tibble: 6 x 15
##   cell_id   frame_num frame_time frame_itvl x_coords y_coords  speed   dfo  dflp
##   <chr>         <dbl>      <dbl> <chr>         <dbl>    <dbl>  <dbl> <dbl> <dbl>
## 1 CID_1_WI~        16         48 48 hours        220      216  1.94   5.83  5.83
## 2 CID_1_WI~        17         51 51 hours        222      215  0.745  3.61  2.24
## 3 CID_1_WI~        18         54 54 hours        230      209  3.33   6.4  10   
## 4 CID_1_WI~        19         57 57 hours        221      242 11.4   29.3  34.2 
## 5 CID_1_WI~        20         60 60 hours        221      212 10      4.12 30   
## 6 CID_1_WI~        21         63 63 hours        220      216  1.37   5.83  4.12
## # ... with 6 more variables: cell_line <fct>, condition <fct>,
## #   well_plate_name <fct>, well_plate_index <fct>, well <fct>, well_image <fct>
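
As an example of an analysis outside cypro, the distance a cell covers between two consecutive frames can be computed from the coordinates alone. The following is a minimal sketch in base R. It uses the x_coords and y_coords values printed above for frames 16 to 19 and assumes that these rows belong to the same cell.

```r
# coordinates copied from the printed output above (frames 16 to 19);
# assumption: all four rows belong to the same cell
x_coords <- c(220, 222, 230, 221)
y_coords <- c(216, 215, 209, 242)

# Euclidean distance between consecutive positions
step_length <- sqrt(diff(x_coords)^2 + diff(y_coords)^2)

round(step_length, 2)
```

The rounded results (2.24, 10 and 34.21) agree with the dflp values printed for frames 17 to 19. This suggests that dflp is the distance from the last point, but treat that reading as an assumption rather than a definition.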

1.2 Stat(istic) data

The stats data.frame (statistics data.frame) contains summary information of every cell. In case of time lapse experiments it contains mean, median, maximum etc. summaries of all variables you specified to keep. Apart from the identifier variable cell_id all variables in this data.frame are numeric.

stat_df <- getStatsDf(object = object_tracks, with_grouping = FALSE)

head(stat_df, 100)

## # A tibble: 100 x 21
##    cell_id        max_speed mean_speed median_speed min_speed sd_speed var_speed
##    <chr>              <dbl>      <dbl>        <dbl>     <dbl>    <dbl>     <dbl>
##  1 CID_1_WI_A1_1~     13.7       5.54         3.35      0.745    4.81    23.2   
##  2 CID_1_WI_A1_2~     15.3       7.77         8.83      1.33     4.80    23.0   
##  3 CID_1_WI_A1_3~     16.5       5.67         1.89      0.943    6.52    42.5   
##  4 CID_1_WI_A2_2~      5.75      1.68         1.20      0.333    1.63     2.66  
##  5 CID_1_WI_A2_3~     16.7       4.56         2.87      0.667    5.13    26.3   
##  6 CID_1_WI_A3_1~     12.8       5.53         3.90      0.943    4.55    20.7   
##  7 CID_1_WI_A3_2~     12.3       5.77         5.47      0.471    4.00    16.0   
##  8 CID_1_WI_A4_1~     10.6       4.11         4.35      0.333    3.21    10.3   
##  9 CID_1_WI_A4_2~      1         0.647        0.667     0.471    0.169    0.0285
## 10 CID_1_WI_A5_1~     12.1       5.13         3.14      1        3.74    14.0   
## # ... with 90 more rows, and 14 more variables: max_dfo <dbl>, mean_dfo <dbl>,
## #   median_dfo <dbl>, min_dfo <dbl>, sd_dfo <dbl>, var_dfo <dbl>,
## #   max_dflp <dbl>, mean_dflp <dbl>, median_dflp <dbl>, min_dflp <dbl>,
## #   sd_dflp <dbl>, var_dflp <dbl>, total_dist <dbl>, mgr_eff <dbl>

This data.frame serves as the basis for most of the statistical tests, dimensionality reduction and machine learning techniques you can use with cypro. See the tutorial on variable sets for functions that help you deal with the vast number of variables this data.frame can contain.
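
To illustrate how per-cell summaries relate to tracking data, the sketch below collapses a small tracks-like data.frame into one row per cell. It is not cypro's implementation. The data.frame toy_tracks and its values are made up for illustration, and only base R is used.

```r
# toy stand-in for a tracks data.frame; values are made up
toy_tracks <- data.frame(
  cell_id = rep(c("CID_1", "CID_2"), each = 3),
  speed   = c(1, 2, 6, 3, 3, 9)
)

# summarise the speed of every cell; tapply() returns one value per cell_id
max_speed <- tapply(toy_tracks$speed, toy_tracks$cell_id, max)

toy_stats <- data.frame(
  cell_id      = names(max_speed),
  max_speed    = as.numeric(max_speed),
  mean_speed   = as.numeric(tapply(toy_tracks$speed, toy_tracks$cell_id, mean)),
  median_speed = as.numeric(tapply(toy_tracks$speed, toy_tracks$cell_id, median))
)

toy_stats
```

The naming pattern (summary function, underscore, variable) mirrors the column names printed above.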

1.3 Grouping data

cypro distinguishes three types of grouping variables:

well plate variables
meta variables
cluster variables

1.3.1 Well Plate data.frame

The well plate variables well_plate_name, well and well_image specify where cells are located within the experiment design and can be used for quality control and the removal of batch effects. The respective data.frame can be obtained via getWellPlateDf().

wp_df <- getWellPlateDf(object_tracks)

# print first 10 rows
head(wp_df, 10)

## # A tibble: 10 x 5
##    cell_id             well_plate_name well_plate_index well  well_image
##    <chr>               <fct>           <fct>            <fct> <fct>     
##  1 CID_1_WI_A4_1_WP_1  one             WP_1             A4    A4_1      
##  2 CID_2_WI_A4_1_WP_1  one             WP_1             A4    A4_1      
##  3 CID_3_WI_A4_1_WP_1  one             WP_1             A4    A4_1      
##  4 CID_4_WI_A4_1_WP_1  one             WP_1             A4    A4_1      
##  5 CID_5_WI_A4_1_WP_1  one             WP_1             A4    A4_1      
##  6 CID_6_WI_A4_1_WP_1  one             WP_1             A4    A4_1      
##  7 CID_7_WI_A4_1_WP_1  one             WP_1             A4    A4_1      
##  8 CID_8_WI_A4_1_WP_1  one             WP_1             A4    A4_1      
##  9 CID_9_WI_A4_1_WP_1  one             WP_1             A4    A4_1      
## 10 CID_10_WI_A4_1_WP_1 one             WP_1             A4    A4_1

1.3.2 Meta data.frame

Meta variables are grouping variables such as condition and cell line. They overlap to a large extent with those from the well plate data.frame. However, in time lapse experiments that include multiple phases they might differ from phase to phase, which is why they are stored separately. Use getMetaDf() to obtain the data.frame.

meta_df <- getMetaDf(object_tracks)

# print an example subset of every condition 
dplyr::group_by(meta_df, condition) %>% 
  dplyr::slice_head(n = 3)

## # A tibble: 12 x 3
## # Groups:   condition [4]
##    cell_id            cell_line condition         
##    <chr>              <fct>     <fct>             
##  1 CID_1_WI_A4_1_WP_1 168       Ctrl              
##  2 CID_2_WI_A4_1_WP_1 168       Ctrl              
##  3 CID_3_WI_A4_1_WP_1 168       Ctrl              
##  4 CID_1_WI_E4_1_WP_1 168       LY(1uM)           
##  5 CID_2_WI_E4_1_WP_1 168       LY(1uM)           
##  6 CID_3_WI_E4_1_WP_1 168       LY(1uM)           
##  7 CID_1_WI_D4_1_WP_1 168       TMZ(100uM)        
##  8 CID_2_WI_D4_1_WP_1 168       TMZ(100uM)        
##  9 CID_3_WI_D4_1_WP_1 168       TMZ(100uM)        
## 10 CID_1_WI_H4_1_WP_1 168       TMZ(100uM)+LY(1uM)
## 11 CID_2_WI_H4_1_WP_1 168       TMZ(100uM)+LY(1uM)
## 12 CID_3_WI_H4_1_WP_1 168       TMZ(100uM)+LY(1uM)

See the tutorial on how to add variables to the cypro object to learn how to expand this data.frame.

1.3.3 Cluster data.frame

Cluster variables are the result of applied machine learning techniques such as k-means or hierarchical clustering. If none of these has been used, the data.frame is empty apart from the identifier variable cell_id. See the tutorial on how to cluster cells in cypro. Obtain the data.frame via getClusterDf().

cluster_df <- getClusterDf(object_one_time)

# print first 100 rows
head(cluster_df, 100)

## # A tibble: 100 x 12
##    cell_id  `hcl_euclidean_c~ `hcl_euclidean_~ `hcl_euclidean_~ `hcl_euclidean_~
##    <chr>    <fct>             <fct>            <fct>            <fct>           
##  1 CID_1_W~ 1                 1                1                1               
##  2 CID_2_W~ 1                 1                1                1               
##  3 CID_7_W~ 1                 1                1                1               
##  4 CID_10_~ 1                 1                1                1               
##  5 CID_14_~ 1                 1                1                1               
##  6 CID_15_~ 1                 1                1                1               
##  7 CID_17_~ 1                 1                1                1               
##  8 CID_20_~ 1                 1                1                1               
##  9 CID_21_~ 1                 1                1                1               
## 10 CID_22_~ 1                 1                1                1               
## # ... with 90 more rows, and 7 more variables:
## #   hcl_euclidean_ward.D_k_4_(intensity) <fct>,
## #   hcl_euclidean_ward.D_k_5_(intensity) <fct>,
## #   kmeans_Lloyd_k_3_(intensity) <fct>, kmeans_Lloyd_k_4_(intensity) <fct>,
## #   kmeans_Lloyd_k_5_(intensity) <fct>, pam_euclidean_k_4_(intensity) <fct>,
## #   pam_euclidean_k_5_(intensity) <fct>

1.3.4 Grouping data.frame

All grouping variables together form the grouping data.frame. It can be extracted via getGroupingDf() and contains all variables of the three grouping types.

group_df <- getGroupingDf(object_tracks)

# print first 6 rows
head(group_df)

## # A tibble: 6 x 7
##   cell_id  cell_line condition well_plate_name well_plate_index well  well_image
##   <chr>    <fct>     <fct>     <fct>           <fct>            <fct> <fct>     
## 1 CID_1_W~ 168       Ctrl      one             WP_1             A4    A4_1      
## 2 CID_2_W~ 168       Ctrl      one             WP_1             A4    A4_1      
## 3 CID_3_W~ 168       Ctrl      one             WP_1             A4    A4_1      
## 4 CID_4_W~ 168       Ctrl      one             WP_1             A4    A4_1      
## 5 CID_5_W~ 168       Ctrl      one             WP_1             A4    A4_1      
## 6 CID_6_W~ 168       Ctrl      one             WP_1             A4    A4_1
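
Since every cell data.frame carries the identifier variable cell_id, extracted data.frames can be combined outside cypro. The sketch below joins a stats-like and a grouping-like data.frame and compares conditions. Both toy data.frames and their values are made up, and only base R is used.

```r
# toy stand-ins for a stats and a grouping data.frame; values are made up
toy_stats <- data.frame(
  cell_id    = c("CID_1", "CID_2", "CID_3", "CID_4"),
  mean_speed = c(2, 4, 5, 7)
)

toy_grouping <- data.frame(
  cell_id   = c("CID_1", "CID_2", "CID_3", "CID_4"),
  condition = c("Ctrl", "Ctrl", "TMZ", "TMZ")
)

# join both data.frames by the shared identifier variable
joined <- merge(toy_stats, toy_grouping, by = "cell_id")

# average of the per-cell mean speed within every condition
speed_by_condition <- aggregate(mean_speed ~ condition, data = joined, FUN = mean)

speed_by_condition
```

The calls to getStatsDf() in this tutorial set with_grouping = FALSE, which suggests that the function can also return the grouping variables directly. Check its documentation before relying on that.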

1.4 Track data and Stat data (One time imaging)

In one time imaging experiments the distinction between track data and stat data becomes obsolete. The data type prefixes track and stat are interchangeable in every function if you use them with an object that contains data from one time imaging experiments.

stat_df <- getStatsDf(object_one_time, with_grouping = FALSE)

track_df <- getTracksDf(object_one_time, with_grouping = FALSE)

identical(x = stat_df, y = track_df)

## [1] TRUE

This applies to every other function referring to either track or stat data (e.g. getStatVariableNames(), getTrackVariableNames()).

2. Variable data.frame

The variable data.frame focuses on the variables and provides summary statistics. It is stored in the @vdata slot. Use getVariableDf() to obtain it.

variable_df <- getVariableDf(object_tracks)

# print first 6 rows
head(variable_df)

## # A tibble: 6 x 15
##   variable  vars     n  mean    sd median trimmed   mad    min   max range  skew
##   <chr>    <int> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
## 1 max_spe~     1  7987 7.80  4.55   7.61    7.70  5.98  0.333  18.6  18.3  0.115
## 2 mean_sp~     2  7987 3.08  2.21   2.73    2.85  2.27  0.0741 13.8  13.7  0.913
## 3 median_~     3  7987 2.46  2.32   1.80    2.06  1.96  0      14.6  14.6  1.56 
## 4 min_spe~     4  7987 0.598 0.787  0.333   0.458 0.494 0      10.4  10.4  3.65 
## 5 sd_speed     5  7987 2.49  1.56   2.34    2.41  1.93  0       7.33  7.33 0.353
## 6 var_spe~     6  7987 8.61  8.95   5.48    7.23  7.36  0      53.7  53.7  1.29 
## # ... with 3 more variables: kurtosis <dbl>, se <dbl>, IQR <dbl>
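
To make the layout of this data.frame concrete, the sketch below builds a similar table (one row per variable, one column per summary) from a small stats-like data.frame. It covers only a few of the columns shown above and is not how cypro computes them. The data.frame toy_stats and the helper summarise_vars() are hypothetical.

```r
# toy stand-in for a stats data.frame; values are made up
toy_stats <- data.frame(
  cell_id    = c("CID_1", "CID_2", "CID_3", "CID_4"),
  max_speed  = c(2, 4, 6, 8),
  mean_speed = c(1, 2, 3, 6)
)

# names of all numeric variables (everything except the identifier)
num_vars <- names(toy_stats)[vapply(toy_stats, is.numeric, logical(1))]

# hypothetical helper: apply one summary function to every numeric variable
summarise_vars <- function(fun) {
  unname(vapply(toy_stats[num_vars], fun, numeric(1)))
}

toy_variable_df <- data.frame(
  variable = num_vars,
  n        = summarise_vars(function(x) sum(!is.na(x))),
  mean     = summarise_vars(mean),
  sd       = summarise_vars(sd),
  median   = summarise_vars(median)
)

toy_variable_df
```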

3. Set Up data.frame

Several parts of the experiment design, in particular the well plate design, are stored in data.frames as well. Use getSetUpDf() to obtain the set up data.frame of a well plate.

# obtain a vector of all well plate names 
wp_names <- getWellPlateNames(object_tracks)

wp_names

## [1] "one"

set_up_df <- getSetUpDf(object_tracks, well_plate_name = wp_names[1])

# print first 6 rows
head(set_up_df)

## # A tibble: 6 x 16
## # Groups:   well [6]
##   row_num col_num row_letter well  group   information_sta~ cell_line condition 
##     <int>   <int> <chr>      <chr> <chr>   <fct>            <chr>     <chr>     
## 1       1       1 A          A1    well_p~ Complete         GSC       1.Ctrl ->~
## 2       1       2 A          A2    well_p~ Complete         GSC       1.Ctrl ->~
## 3       1       3 A          A3    well_p~ Complete         GSC       1.Ctrl ->~
## 4       1       4 A          A4    well_p~ Complete         168       1.Ctrl ->~
## 5       1       5 A          A5    well_p~ Complete         168       1.Ctrl ->~
## 6       1       6 A          A6    well_p~ Complete         168       1.Ctrl ->~
## # ... with 8 more variables: cl_condition <chr>, type <chr>,
## #   condition_df <list>, ipw <int>, well_files <dbl>, well_image_files <dbl>,
## #   ambiguous <lgl>, availability_status <fct>

4. Variable names

The majority of cypro's functions require you to specify some kind of data variables. As exemplified by the stats data.frame above, the number of numeric variables can be vast. And since there are several ways to cluster your cells, the number of grouping variables might grow in a similar fashion. The tutorial on variable sets explains how you can deal with this multitude of variables with respect to analytics. This chapter explains how you can deal with the multitude of names.

4.1 Functions to extract names

The get*VariableNames() family of functions lets you obtain character vectors of variable names, which are the column names of the respective data.frames introduced in chapter 1. Cell data.frames.

stat_vars <- getStatVariableNames(object_tracks)

wp_vars <- getWellPlateVariableNames(object_tracks)

meta_vars <- getMetaVariableNames(object_tracks)

cluster_vars <- getClusterVariableNames(object_tracks)

stat_vars

##  [1] "max_speed"    "mean_speed"   "median_speed" "min_speed"    "sd_speed"    
##  [6] "var_speed"    "max_dfo"      "mean_dfo"     "median_dfo"   "min_dfo"     
## [11] "sd_dfo"       "var_dfo"      "max_dflp"     "mean_dflp"    "median_dflp" 
## [16] "min_dflp"     "sd_dflp"      "var_dflp"     "total_dist"   "mgr_eff"
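
A character vector of variable names can be used directly to subset an extracted data.frame. The sketch below keeps the identifier and all speed related variables. It uses a few of the names printed above, and the data.frame toy_stats with its values is made up for illustration.

```r
# a few of the variable names printed above
stat_vars <- c("max_speed", "mean_speed", "max_dfo", "mean_dfo", "total_dist")

# toy stand-in for a stats data.frame with two cells; values are made up
toy_stats <- data.frame(
  cell_id    = c("CID_1", "CID_2"),
  max_speed  = c(13.7, 15.3),
  mean_speed = c(5.5, 7.8),
  max_dfo    = c(30, 42),
  mean_dfo   = c(12, 20),
  total_dist = c(80, 110)
)

# pick the speed related names and keep them together with the identifier
speed_vars <- grep("speed", stat_vars, value = TRUE, fixed = TRUE)
speed_df <- toy_stats[, c("cell_id", speed_vars)]

names(speed_df)
```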

4.2 Functions to select the right names

You will rarely need all variable names of any kind at once. As exemplified in our tutorial on plotting descriptive statistics, at some point you will probably need only shape related variables or only speed related variables. Or you might be interested only in cluster variables of the kmeans method, for instance. This is where the tidyselect helpers that are implemented in cypro come in very handy. tidyselect contains a variety of helper functions, among them starts_with(), ends_with(), contains() and matches().

(For a more detailed description of how they work, see the tidyselect documentation or run ?starts_with in your R console.)

# obtain only variables whose names start with "AreaShape" (area and shape features)
area_names <- getStatVariableNames(object_one_time, starts_with("AreaShape"))

area_names

##  [1] "AreaShape_Area"                 "AreaShape_BoundingBoxArea"     
##  [3] "AreaShape_BoundingBoxMaximum_X" "AreaShape_BoundingBoxMaximum_Y"
##  [5] "AreaShape_BoundingBoxMinimum_X" "AreaShape_BoundingBoxMinimum_Y"
##  [7] "AreaShape_Center_X"             "AreaShape_Center_Y"            
##  [9] "AreaShape_Compactness"          "AreaShape_Eccentricity"        
## [11] "AreaShape_EquivalentDiameter"   "AreaShape_EulerNumber"         
## [13] "AreaShape_Extent"               "AreaShape_FormFactor"          
## [15] "AreaShape_MajorAxisLength"      "AreaShape_MaxFeretDiameter"    
## [17] "AreaShape_MaximumRadius"        "AreaShape_MeanRadius"          
## [19] "AreaShape_MedianRadius"         "AreaShape_MinFeretDiameter"    
## [21] "AreaShape_MinorAxisLength"      "AreaShape_Orientation"         
## [23] "AreaShape_Perimeter"            "AreaShape_Solidity"            
## [25] "AreaShape_Zernike_0_0"          "AreaShape_Zernike_1_1"         
## [27] "AreaShape_Zernike_2_0"          "AreaShape_Zernike_2_2"         
## [29] "AreaShape_Zernike_3_1"          "AreaShape_Zernike_3_3"         
## [31] "AreaShape_Zernike_4_0"          "AreaShape_Zernike_4_2"         
## [33] "AreaShape_Zernike_4_4"          "AreaShape_Zernike_5_1"         
## [35] "AreaShape_Zernike_5_3"          "AreaShape_Zernike_5_5"         
## [37] "AreaShape_Zernike_6_0"          "AreaShape_Zernike_6_2"         
## [39] "AreaShape_Zernike_6_4"          "AreaShape_Zernike_6_6"         
## [41] "AreaShape_Zernike_7_1"          "AreaShape_Zernike_7_3"         
## [43] "AreaShape_Zernike_7_5"          "AreaShape_Zernike_7_7"         
## [45] "AreaShape_Zernike_8_0"          "AreaShape_Zernike_8_2"         
## [47] "AreaShape_Zernike_8_4"          "AreaShape_Zernike_8_6"         
## [49] "AreaShape_Zernike_8_8"          "AreaShape_Zernike_9_1"         
## [51] "AreaShape_Zernike_9_3"          "AreaShape_Zernike_9_5"         
## [53] "AreaShape_Zernike_9_7"          "AreaShape_Zernike_9_9"

Tidyselect helpers are easy to combine. With a bit of practice, very little typing is necessary to obtain exactly the variable names you need for your analysis step of interest.

4.3 Examples

# only Zernike features
zernike_vars <- getStatVariableNames(object_one_time, contains("Zernike"))

zernike_vars

##  [1] "AreaShape_Zernike_0_0"                        
##  [2] "AreaShape_Zernike_1_1"                        
##  [3] "AreaShape_Zernike_2_0"                        
##  [4] "AreaShape_Zernike_2_2"                        
##  [5] "AreaShape_Zernike_3_1"                        
##  [6] "AreaShape_Zernike_3_3"                        
##  [7] "AreaShape_Zernike_4_0"                        
##  [8] "AreaShape_Zernike_4_2"                        
##  [9] "AreaShape_Zernike_4_4"                        
## [10] "AreaShape_Zernike_5_1"                        
## [11] "AreaShape_Zernike_5_3"                        
## [12] "AreaShape_Zernike_5_5"                        
## [13] "AreaShape_Zernike_6_0"                        
## [14] "AreaShape_Zernike_6_2"                        
## [15] "AreaShape_Zernike_6_4"                        
## [16] "AreaShape_Zernike_6_6"                        
## [17] "AreaShape_Zernike_7_1"                        
## [18] "AreaShape_Zernike_7_3"                        
## [19] "AreaShape_Zernike_7_5"                        
## [20] "AreaShape_Zernike_7_7"                        
## [21] "AreaShape_Zernike_8_0"                        
## [22] "AreaShape_Zernike_8_2"                        
## [23] "AreaShape_Zernike_8_4"                        
## [24] "AreaShape_Zernike_8_6"                        
## [25] "AreaShape_Zernike_8_8"                        
## [26] "AreaShape_Zernike_9_1"                        
## [27] "AreaShape_Zernike_9_3"                        
## [28] "AreaShape_Zernike_9_5"                        
## [29] "AreaShape_Zernike_9_7"                        
## [30] "AreaShape_Zernike_9_9"                        
## [31] "RadialDistribution_ZernikeMagnitude_Actin_0_0"
## [32] "RadialDistribution_ZernikeMagnitude_Actin_1_1"
## [33] "RadialDistribution_ZernikeMagnitude_Actin_2_0"
## [34] "RadialDistribution_ZernikeMagnitude_Actin_2_2"
## [35] "RadialDistribution_ZernikeMagnitude_Actin_3_1"
## [36] "RadialDistribution_ZernikeMagnitude_Actin_3_3"
## [37] "RadialDistribution_ZernikeMagnitude_Actin_4_0"
## [38] "RadialDistribution_ZernikeMagnitude_Actin_4_2"
## [39] "RadialDistribution_ZernikeMagnitude_Actin_4_4"
## [40] "RadialDistribution_ZernikeMagnitude_Actin_5_1"
## [41] "RadialDistribution_ZernikeMagnitude_Actin_5_3"
## [42] "RadialDistribution_ZernikeMagnitude_Actin_5_5"
## [43] "RadialDistribution_ZernikeMagnitude_Actin_6_0"
## [44] "RadialDistribution_ZernikeMagnitude_Actin_6_2"
## [45] "RadialDistribution_ZernikeMagnitude_Actin_6_4"
## [46] "RadialDistribution_ZernikeMagnitude_Actin_6_6"
## [47] "RadialDistribution_ZernikeMagnitude_Actin_7_1"
## [48] "RadialDistribution_ZernikeMagnitude_Actin_7_3"
## [49] "RadialDistribution_ZernikeMagnitude_Actin_7_5"
## [50] "RadialDistribution_ZernikeMagnitude_Actin_7_7"
## [51] "RadialDistribution_ZernikeMagnitude_Actin_8_0"
## [52] "RadialDistribution_ZernikeMagnitude_Actin_8_2"
## [53] "RadialDistribution_ZernikeMagnitude_Actin_8_4"
## [54] "RadialDistribution_ZernikeMagnitude_Actin_8_6"
## [55] "RadialDistribution_ZernikeMagnitude_Actin_8_8"
## [56] "RadialDistribution_ZernikeMagnitude_Actin_9_1"
## [57] "RadialDistribution_ZernikeMagnitude_Actin_9_3"
## [58] "RadialDistribution_ZernikeMagnitude_Actin_9_5"
## [59] "RadialDistribution_ZernikeMagnitude_Actin_9_7"
## [60] "RadialDistribution_ZernikeMagnitude_Actin_9_9"
## [61] "RadialDistribution_ZernikePhase_Actin_0_0"    
## [62] "RadialDistribution_ZernikePhase_Actin_1_1"    
## [63] "RadialDistribution_ZernikePhase_Actin_2_0"    
## [64] "RadialDistribution_ZernikePhase_Actin_2_2"    
## [65] "RadialDistribution_ZernikePhase_Actin_3_1"    
## [66] "RadialDistribution_ZernikePhase_Actin_3_3"    
## [67] "RadialDistribution_ZernikePhase_Actin_4_0"    
## [68] "RadialDistribution_ZernikePhase_Actin_4_2"    
## [69] "RadialDistribution_ZernikePhase_Actin_4_4"    
## [70] "RadialDistribution_ZernikePhase_Actin_5_1"    
## [71] "RadialDistribution_ZernikePhase_Actin_5_3"    
## [72] "RadialDistribution_ZernikePhase_Actin_5_5"    
## [73] "RadialDistribution_ZernikePhase_Actin_6_0"    
## [74] "RadialDistribution_ZernikePhase_Actin_6_2"    
## [75] "RadialDistribution_ZernikePhase_Actin_6_4"    
## [76] "RadialDistribution_ZernikePhase_Actin_6_6"    
## [77] "RadialDistribution_ZernikePhase_Actin_7_1"    
## [78] "RadialDistribution_ZernikePhase_Actin_7_3"    
## [79] "RadialDistribution_ZernikePhase_Actin_7_5"    
## [80] "RadialDistribution_ZernikePhase_Actin_7_7"    
## [81] "RadialDistribution_ZernikePhase_Actin_8_0"    
## [82] "RadialDistribution_ZernikePhase_Actin_8_2"    
## [83] "RadialDistribution_ZernikePhase_Actin_8_4"    
## [84] "RadialDistribution_ZernikePhase_Actin_8_6"    
## [85] "RadialDistribution_ZernikePhase_Actin_8_8"    
## [86] "RadialDistribution_ZernikePhase_Actin_9_1"    
## [87] "RadialDistribution_ZernikePhase_Actin_9_3"    
## [88] "RadialDistribution_ZernikePhase_Actin_9_5"    
## [89] "RadialDistribution_ZernikePhase_Actin_9_7"    
## [90] "RadialDistribution_ZernikePhase_Actin_9_9"

# only Zernike features derived from shape analysis
area_zernike_vars <- getStatVariableNames(object_one_time, starts_with("AreaShape") & contains("Zernike")) 

area_zernike_vars

##  [1] "AreaShape_Zernike_0_0" "AreaShape_Zernike_1_1" "AreaShape_Zernike_2_0"
##  [4] "AreaShape_Zernike_2_2" "AreaShape_Zernike_3_1" "AreaShape_Zernike_3_3"
##  [7] "AreaShape_Zernike_4_0" "AreaShape_Zernike_4_2" "AreaShape_Zernike_4_4"
## [10] "AreaShape_Zernike_5_1" "AreaShape_Zernike_5_3" "AreaShape_Zernike_5_5"
## [13] "AreaShape_Zernike_6_0" "AreaShape_Zernike_6_2" "AreaShape_Zernike_6_4"
## [16] "AreaShape_Zernike_6_6" "AreaShape_Zernike_7_1" "AreaShape_Zernike_7_3"
## [19] "AreaShape_Zernike_7_5" "AreaShape_Zernike_7_7" "AreaShape_Zernike_8_0"
## [22] "AreaShape_Zernike_8_2" "AreaShape_Zernike_8_4" "AreaShape_Zernike_8_6"
## [25] "AreaShape_Zernike_8_8" "AreaShape_Zernike_9_1" "AreaShape_Zernike_9_3"
## [28] "AreaShape_Zernike_9_5" "AreaShape_Zernike_9_7" "AreaShape_Zernike_9_9"

# only cluster variables with k = 4
k_4_cluster <- getClusterVariableNames(object_one_time, contains("k_4"))

k_4_cluster

## [1] "hcl_euclidean_complete_k_4_(intensity)"
## [2] "hcl_euclidean_ward.D_k_4_(intensity)"  
## [3] "kmeans_Lloyd_k_4_(intensity)"          
## [4] "pam_euclidean_k_4_(intensity)"

# only cluster variables with k = 4 (without hierarchical algorithm)
k_4_cluster2 <- getClusterVariableNames(object_one_time, contains("k_4") & -starts_with("hcl"))

k_4_cluster2

## [1] "kmeans_Lloyd_k_4_(intensity)"  "pam_euclidean_k_4_(intensity)"
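
To see what such a combined selection matches, the last example can be mirrored with plain string functions. This is a base R analogue, not a tidyselect call. The vector cluster_vars holds a subset of the cluster variable names printed in this tutorial.

```r
# subset of the cluster variable names printed in this tutorial
cluster_vars <- c(
  "hcl_euclidean_complete_k_4_(intensity)",
  "hcl_euclidean_ward.D_k_4_(intensity)",
  "hcl_euclidean_ward.D_k_5_(intensity)",
  "kmeans_Lloyd_k_4_(intensity)",
  "kmeans_Lloyd_k_5_(intensity)",
  "pam_euclidean_k_4_(intensity)"
)

# counterpart of contains("k_4"): literal substring match
has_k4 <- grepl("k_4", cluster_vars, fixed = TRUE)

# counterpart of starts_with("hcl"): prefix match
is_hcl <- startsWith(cluster_vars, "hcl")

# names that contain "k_4" but do not start with "hcl"
k_4_no_hcl <- cluster_vars[has_k4 & !is_hcl]

k_4_no_hcl
```

One difference to keep in mind: the tidyselect helpers ignore case by default (ignore.case = TRUE), whereas grepl() and startsWith() as used here are case sensitive.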

5. Group names

Grouping variables encode how to group cells via their group names. In several cases you might want to refer to specific groups. To obtain the respective names, use getGroupNames() or its wrappers getConditions() and getCellLines().

# two ways to obtain condition names
conditions_1 <- getGroupNames(object_one_time, grouping_variable = "condition")

conditions_2 <- getConditions(object_one_time)

conditions_1

##  [1] "5-fluorouracil"           "AG-1478"                 
##  [3] "anisomycin"               "AZ258"                   
##  [5] "caspase inhibitor (ZVAD)" "cyclohexamide"           
##  [7] "DMSO"                     "indirubin monoxime"      
##  [9] "mitomycin C"              "neomycin"                
## [11] "olomoucine"               "taxol"                   
## [13] "tunicamycin"

conditions_2

##  [1] "5-fluorouracil"           "AG-1478"                 
##  [3] "anisomycin"               "AZ258"                   
##  [5] "caspase inhibitor (ZVAD)" "cyclohexamide"           
##  [7] "DMSO"                     "indirubin monoxime"      
##  [9] "mitomycin C"              "neomycin"                
## [11] "olomoucine"               "taxol"                   
## [13] "tunicamycin"

Using getGroupNames() to obtain cluster names becomes useful once you have renamed clusters, as they are initially encoded as numbers. The tidyselect helpers introduced in section 4. Variable names can be used within getGroupNames() as well.

# all cluster names
hcl5_all <-
  getGroupNames(
    object = object_one_time,
    grouping_variable = "hcl_euclidean_ward.D_k_5_(intensity)"
    )

hcl5_all

## [1] "low"        "low-medium" "medium"     "medium-low" "high"

# medium cluster names
hcl5_medium <-
  getGroupNames(
    object = object_one_time,
    grouping_variable = "hcl_euclidean_ward.D_k_5_(intensity)",
    contains("medium")
    )

hcl5_medium

## [1] "low-medium" "medium"     "medium-low"
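
Group names are typically used to restrict an analysis to certain groups. The sketch below filters a meta-like data.frame to a subset of conditions. It assumes that group names correspond to the factor levels of the grouping variable. The data.frame toy_meta is made up and reuses three condition names from the output above.

```r
# toy stand-in for a meta data.frame; cell ids are made up
toy_meta <- data.frame(
  cell_id   = paste0("CID_", 1:6),
  condition = factor(c("DMSO", "DMSO", "taxol", "taxol", "neomycin", "neomycin"))
)

# assumption: group names are the levels of the grouping variable
all_conditions <- levels(toy_meta$condition)

# keep only conditions whose name contains "mycin"
mycin_conditions <- grep("mycin", all_conditions, value = TRUE, fixed = TRUE)

# subset the data.frame to cells of the selected conditions
mycin_cells <- toy_meta[toy_meta$condition %in% mycin_conditions, ]

mycin_cells
```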