The framework in which cypro operates invites you to extract data from the cypro object and run your own analyses beyond what cypro currently offers. To do so, you can use a variety of get*() functions.
# load packages
library(cypro)
library(tidyverse)
# load object from broad institute compound profiling experiment week 4
# contains one time imaging data
object_one_time <- readRDS(file = "data/bids-week4.RDS")
# load object that contains tracking data
object_tracks <- readRDS(file = "data/example-tracks.RDS")
The data of your cells is stored in the @cdata slot in the form of five data.frames.
The tracks data.frame contains information for every cell at a given point in time. This data.frame plays a significant role in time lapse experiments. Use getTracksDf() to obtain it.
tracks_df <- getTracksDf(object_tracks)
# print first 6 rows (the default of head())
head(tracks_df)
## # A tibble: 6 x 15
## cell_id frame_num frame_time frame_itvl x_coords y_coords speed dfo dflp
## <chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CID_1_WI~ 16 48 48 hours 220 216 1.94 5.83 5.83
## 2 CID_1_WI~ 17 51 51 hours 222 215 0.745 3.61 2.24
## 3 CID_1_WI~ 18 54 54 hours 230 209 3.33 6.4 10
## 4 CID_1_WI~ 19 57 57 hours 221 242 11.4 29.3 34.2
## 5 CID_1_WI~ 20 60 60 hours 221 212 10 4.12 30
## 6 CID_1_WI~ 21 63 63 hours 220 216 1.37 5.83 4.12
## # ... with 6 more variables: cell_line <fct>, condition <fct>,
## # well_plate_name <fct>, well_plate_index <fct>, well <fct>, well_image <fct>
The stats data.frame (statistics data.frame) contains summary information of every cell. In case of time lapse experiments it contains mean, median, maximum etc. summaries of all variables you specified to keep. Apart from the identifier variable cell_id all variables in this data.frame are numeric.
stat_df <- getStatsDf(object = object_tracks, with_grouping = FALSE)
head(stat_df, 100)
## # A tibble: 100 x 21
## cell_id max_speed mean_speed median_speed min_speed sd_speed var_speed
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CID_1_WI_A1_1~ 13.7 5.54 3.35 0.745 4.81 23.2
## 2 CID_1_WI_A1_2~ 15.3 7.77 8.83 1.33 4.80 23.0
## 3 CID_1_WI_A1_3~ 16.5 5.67 1.89 0.943 6.52 42.5
## 4 CID_1_WI_A2_2~ 5.75 1.68 1.20 0.333 1.63 2.66
## 5 CID_1_WI_A2_3~ 16.7 4.56 2.87 0.667 5.13 26.3
## 6 CID_1_WI_A3_1~ 12.8 5.53 3.90 0.943 4.55 20.7
## 7 CID_1_WI_A3_2~ 12.3 5.77 5.47 0.471 4.00 16.0
## 8 CID_1_WI_A4_1~ 10.6 4.11 4.35 0.333 3.21 10.3
## 9 CID_1_WI_A4_2~ 1 0.647 0.667 0.471 0.169 0.0285
## 10 CID_1_WI_A5_1~ 12.1 5.13 3.14 1 3.74 14.0
## # ... with 90 more rows, and 14 more variables: max_dfo <dbl>, mean_dfo <dbl>,
## # median_dfo <dbl>, min_dfo <dbl>, sd_dfo <dbl>, var_dfo <dbl>,
## # max_dflp <dbl>, mean_dflp <dbl>, median_dflp <dbl>, min_dflp <dbl>,
## # sd_dflp <dbl>, var_dflp <dbl>, total_dist <dbl>, mgr_eff <dbl>
The stats data.frame serves as the basis for most of the statistical tests, dimensional reduction and machine learning techniques you can use with cypro. See the tutorial on variable sets for functions that help you deal with the vast number of variables this data.frame can contain.
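The relationship between the tracks and the stats data.frame can be sketched in base R. The following is a minimal illustration with made-up values, assuming only that every cell has one row per frame; the toy column values and the use of split() and vapply() are choices for this sketch and not necessarily how cypro computes its summaries internally.

```r
# Toy tracks data.frame: one row per cell and frame (values are made up).
tracks_df <- data.frame(
  cell_id = rep(c("cell_a", "cell_b"), each = 3),
  frame_num = rep(1:3, times = 2),
  speed = c(1, 2, 3, 4, 6, 8)
)

# Group the per-frame speed values by cell.
per_cell <- split(tracks_df$speed, tracks_df$cell_id)

# Collapse every group into one row of summary variables per cell.
stats_df <- data.frame(
  cell_id = names(per_cell),
  max_speed = unname(vapply(per_cell, max, numeric(1))),
  mean_speed = unname(vapply(per_cell, mean, numeric(1))),
  median_speed = unname(vapply(per_cell, median, numeric(1)))
)

stats_df
```

Every cell ends up with exactly one row, and every summarised variable with one numeric column, which mirrors the layout of the stats data.frame shown above.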
cypro distinguishes three types of grouping variables:
Well plate variables (well_plate_name, well and well_image) specify the location of cells with respect to the experiment design and may be used for quality control and the removal of batch effects. The respective data.frame can be obtained via getWellPlateDf().
wp_df <- getWellPlateDf(object_tracks)
# print first 10 rows
head(wp_df, 10)
## # A tibble: 10 x 5
## cell_id well_plate_name well_plate_index well well_image
## <chr> <fct> <fct> <fct> <fct>
## 1 CID_1_WI_A4_1_WP_1 one WP_1 A4 A4_1
## 2 CID_2_WI_A4_1_WP_1 one WP_1 A4 A4_1
## 3 CID_3_WI_A4_1_WP_1 one WP_1 A4 A4_1
## 4 CID_4_WI_A4_1_WP_1 one WP_1 A4 A4_1
## 5 CID_5_WI_A4_1_WP_1 one WP_1 A4 A4_1
## 6 CID_6_WI_A4_1_WP_1 one WP_1 A4 A4_1
## 7 CID_7_WI_A4_1_WP_1 one WP_1 A4 A4_1
## 8 CID_8_WI_A4_1_WP_1 one WP_1 A4 A4_1
## 9 CID_9_WI_A4_1_WP_1 one WP_1 A4 A4_1
## 10 CID_10_WI_A4_1_WP_1 one WP_1 A4 A4_1
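As a simple quality control step, the number of cells per well can be counted from a data.frame of this shape. This is a hedged sketch on a toy data.frame with made-up cell IDs; only the column names cell_id, well_plate_name and well are taken from the output above.

```r
# Toy well plate data.frame (cell IDs and wells are made up).
wp_df <- data.frame(
  cell_id = c("c1", "c2", "c3", "c4", "c5"),
  well_plate_name = factor("one"),
  well = factor(c("A4", "A4", "A4", "B4", "B4"))
)

# Count the cells in every well; responseName sets the name of the count column.
cells_per_well <- as.data.frame(
  table(well = wp_df$well),
  responseName = "n_cells"
)

cells_per_well
```

Wells with unusually few cells stand out in such a table and may deserve a closer look before further analysis.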
Meta variables are grouping variables such as condition and cell line. They largely overlap with the information in the well plate data.frame. However, in time lapse experiments that include multiple phases they might differ from phase to phase, which is why they are stored separately. Use getMetaDf() to obtain the data.frame.
meta_df <- getMetaDf(object_tracks)
# print an example subset of every condition
dplyr::group_by(meta_df, condition) %>%
dplyr::slice_head(n = 3)
## # A tibble: 12 x 3
## # Groups: condition [4]
## cell_id cell_line condition
## <chr> <fct> <fct>
## 1 CID_1_WI_A4_1_WP_1 168 Ctrl
## 2 CID_2_WI_A4_1_WP_1 168 Ctrl
## 3 CID_3_WI_A4_1_WP_1 168 Ctrl
## 4 CID_1_WI_E4_1_WP_1 168 LY(1uM)
## 5 CID_2_WI_E4_1_WP_1 168 LY(1uM)
## 6 CID_3_WI_E4_1_WP_1 168 LY(1uM)
## 7 CID_1_WI_D4_1_WP_1 168 TMZ(100uM)
## 8 CID_2_WI_D4_1_WP_1 168 TMZ(100uM)
## 9 CID_3_WI_D4_1_WP_1 168 TMZ(100uM)
## 10 CID_1_WI_H4_1_WP_1 168 TMZ(100uM)+LY(1uM)
## 11 CID_2_WI_H4_1_WP_1 168 TMZ(100uM)+LY(1uM)
## 12 CID_3_WI_H4_1_WP_1 168 TMZ(100uM)+LY(1uM)
See the tutorial on how to add variables to the cypro object to learn how to expand this data.frame.
Cluster variables are the result of applied machine learning techniques such as k-means or hierarchical clustering. If none of these have been used, the data.frame is empty apart from the identifier variable cell_id. See the tutorial on how to cluster cells in cypro. Obtain the data.frame via getClusterDf().
cluster_df <- getClusterDf(object_one_time)
# print first 100 rows
head(cluster_df, 100)
## # A tibble: 100 x 12
## cell_id `hcl_euclidean_c~ `hcl_euclidean_~ `hcl_euclidean_~ `hcl_euclidean_~
## <chr> <fct> <fct> <fct> <fct>
## 1 CID_1_W~ 1 1 1 1
## 2 CID_2_W~ 1 1 1 1
## 3 CID_7_W~ 1 1 1 1
## 4 CID_10_~ 1 1 1 1
## 5 CID_14_~ 1 1 1 1
## 6 CID_15_~ 1 1 1 1
## 7 CID_17_~ 1 1 1 1
## 8 CID_20_~ 1 1 1 1
## 9 CID_21_~ 1 1 1 1
## 10 CID_22_~ 1 1 1 1
## # ... with 90 more rows, and 7 more variables:
## # hcl_euclidean_ward.D_k_4_(intensity) <fct>,
## # hcl_euclidean_ward.D_k_5_(intensity) <fct>,
## # kmeans_Lloyd_k_3_(intensity) <fct>, kmeans_Lloyd_k_4_(intensity) <fct>,
## # kmeans_Lloyd_k_5_(intensity) <fct>, pam_euclidean_k_4_(intensity) <fct>,
## # pam_euclidean_k_5_(intensity) <fct>
All grouping variables together form the grouping data.frame. It can be extracted via getGroupingDf() and contains all variables of the three grouping types.
group_df <- getGroupingDf(object_tracks)
# print first 6 rows (the default of head())
head(group_df)
## # A tibble: 6 x 7
## cell_id cell_line condition well_plate_name well_plate_index well well_image
## <chr> <fct> <fct> <fct> <fct> <fct> <fct>
## 1 CID_1_W~ 168 Ctrl one WP_1 A4 A4_1
## 2 CID_2_W~ 168 Ctrl one WP_1 A4 A4_1
## 3 CID_3_W~ 168 Ctrl one WP_1 A4 A4_1
## 4 CID_4_W~ 168 Ctrl one WP_1 A4 A4_1
## 5 CID_5_W~ 168 Ctrl one WP_1 A4 A4_1
## 6 CID_6_W~ 168 Ctrl one WP_1 A4 A4_1
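Because all cell data.frames share the identifier variable cell_id, they can be combined for your own analyses. The following base R sketch joins a toy stats data.frame with a toy grouping data.frame and averages a variable per condition. All values and the shortened condition names are made up; merge() and aggregate() are just one possible way to do this.

```r
# Toy stats data.frame: one row per cell (values are made up).
stats_df <- data.frame(
  cell_id = c("c1", "c2", "c3", "c4"),
  mean_speed = c(2, 4, 6, 10)
)

# Toy grouping data.frame with the same identifier variable.
group_df <- data.frame(
  cell_id = c("c1", "c2", "c3", "c4"),
  condition = factor(c("Ctrl", "Ctrl", "TMZ", "TMZ"))
)

# Join both data.frames by the shared identifier.
joined_df <- merge(stats_df, group_df, by = "cell_id")

# Average the numeric variable within every condition.
speed_by_condition <- aggregate(mean_speed ~ condition, data = joined_df, FUN = mean)

speed_by_condition
```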
The distinction between track data and stat data becomes obsolete in one time imaging experiments. The data type prefixes track and stat are interchangeable in every function if you use them with an object that contains data from a one time imaging experiment.
stat_df <- getStatsDf(object_one_time, with_grouping = FALSE)
track_df <- getTracksDf(object_one_time, with_grouping = FALSE)
identical(x = stat_df, y = track_df)
## [1] TRUE
This applies to every other function referring to either track or stat data (e.g. getStatVariableNames(), getTrackVariableNames()).
The variable data.frame focuses on the variables and provides summary statistics. It is stored in the @vdata slot. Use getVariableDf() to obtain it.
variable_df <- getVariableDf(object_tracks)
# print first 6 rows (the default of head())
head(variable_df)
## # A tibble: 6 x 15
## variable vars n mean sd median trimmed mad min max range skew
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 max_spe~ 1 7987 7.80 4.55 7.61 7.70 5.98 0.333 18.6 18.3 0.115
## 2 mean_sp~ 2 7987 3.08 2.21 2.73 2.85 2.27 0.0741 13.8 13.7 0.913
## 3 median_~ 3 7987 2.46 2.32 1.80 2.06 1.96 0 14.6 14.6 1.56
## 4 min_spe~ 4 7987 0.598 0.787 0.333 0.458 0.494 0 10.4 10.4 3.65
## 5 sd_speed 5 7987 2.49 1.56 2.34 2.41 1.93 0 7.33 7.33 0.353
## 6 var_spe~ 6 7987 8.61 8.95 5.48 7.23 7.36 0 53.7 53.7 1.29
## # ... with 3 more variables: kurtosis <dbl>, se <dbl>, IQR <dbl>
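To make the layout of the variable data.frame more tangible, here is a minimal base R sketch that computes a handful of the listed statistics (n, mean, sd, median, min, max, range, IQR) for every numeric variable of a toy data.frame. The helper summarise_variable() is hypothetical and the values are made up; how cypro computes the remaining columns (trimmed, mad, skew, kurtosis, se) is not covered here.

```r
# Toy stats data.frame with two numeric variables (values are made up).
stats_df <- data.frame(
  max_speed = c(1, 3, 5, 7),
  mean_speed = c(2, 2, 4, 8)
)

# Hypothetical helper: a named vector of summary statistics for one variable.
summarise_variable <- function(x) {
  c(
    n = length(x),
    mean = mean(x),
    sd = sd(x),
    median = median(x),
    min = min(x),
    max = max(x),
    range = max(x) - min(x),
    IQR = IQR(x)
  )
}

# vapply() returns a statistics x variables matrix; transpose it so that
# every variable becomes one row.
summary_mat <- t(vapply(stats_df, summarise_variable, numeric(8)))

variable_df <- data.frame(
  variable = rownames(summary_mat),
  summary_mat,
  row.names = NULL
)

variable_df
```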
Several parts of the experiment design, in particular the well plate design, are stored in data.frames as well. Use getSetUpDf() to obtain the set up data.frame of a well plate.
# obtain a vector of all well plate names
wp_names <- getWellPlateNames(object_tracks)
wp_names
## [1] "one"
set_up_df <- getSetUpDf(object_tracks, well_plate_name = wp_names[1])
# print first 6 rows (the default of head())
head(set_up_df)
## # A tibble: 6 x 16
## # Groups: well [6]
## row_num col_num row_letter well group information_sta~ cell_line condition
## <int> <int> <chr> <chr> <chr> <fct> <chr> <chr>
## 1 1 1 A A1 well_p~ Complete GSC 1.Ctrl ->~
## 2 1 2 A A2 well_p~ Complete GSC 1.Ctrl ->~
## 3 1 3 A A3 well_p~ Complete GSC 1.Ctrl ->~
## 4 1 4 A A4 well_p~ Complete 168 1.Ctrl ->~
## 5 1 5 A A5 well_p~ Complete 168 1.Ctrl ->~
## 6 1 6 A A6 well_p~ Complete 168 1.Ctrl ->~
## # ... with 8 more variables: cl_condition <chr>, type <chr>,
## # condition_df <list>, ipw <int>, well_files <dbl>, well_image_files <dbl>,
## # ambiguous <lgl>, availability_status <fct>
The majority of cypro's functions require you to specify some kind of data variables. As exemplified by the stats data.frame above, the number of numeric variables can be vast. And since there are several ways to cluster your cells, the number of grouping variables might grow in a similar fashion. The tutorial on variable sets explains how you can deal with this multitude of variables with respect to analytics. This chapter explains how you can deal with the multitude of names.
The get*VariableNames() family of functions allows you to obtain character vectors of variable names, which are the column names of the respective data.frames introduced in chapter 1, "Cell data.frames".
stat_vars <- getStatVariableNames(object_tracks)
wp_vars <- getWellPlateVariableNames(object_tracks)
meta_vars <- getMetaVariableNames(object_tracks)
cluster_vars <- getClusterVariableNames(object_tracks)
stat_vars
## [1] "max_speed" "mean_speed" "median_speed" "min_speed" "sd_speed"
## [6] "var_speed" "max_dfo" "mean_dfo" "median_dfo" "min_dfo"
## [11] "sd_dfo" "var_dfo" "max_dflp" "mean_dflp" "median_dflp"
## [16] "min_dflp" "sd_dflp" "var_dflp" "total_dist" "mgr_eff"
You will rarely need all variable names of a kind at once. As exemplified in our tutorial on plotting descriptive statistics, at some point you will probably need only shape related variables or only speed related variables. Or you might only be interested in cluster variables of the kmeans method, for instance. This is where the tidyselect helpers that are implemented in cypro come in very handy. tidyselect contains a variety of helper functions; the examples in this chapter use starts_with() and contains().
(For a more detailed description of how they work, run ?starts_with in your R console.)
# obtain only variables that describe the cells area
area_names <- getStatVariableNames(object_one_time, starts_with("AreaShape"))
area_names
## [1] "AreaShape_Area" "AreaShape_BoundingBoxArea"
## [3] "AreaShape_BoundingBoxMaximum_X" "AreaShape_BoundingBoxMaximum_Y"
## [5] "AreaShape_BoundingBoxMinimum_X" "AreaShape_BoundingBoxMinimum_Y"
## [7] "AreaShape_Center_X" "AreaShape_Center_Y"
## [9] "AreaShape_Compactness" "AreaShape_Eccentricity"
## [11] "AreaShape_EquivalentDiameter" "AreaShape_EulerNumber"
## [13] "AreaShape_Extent" "AreaShape_FormFactor"
## [15] "AreaShape_MajorAxisLength" "AreaShape_MaxFeretDiameter"
## [17] "AreaShape_MaximumRadius" "AreaShape_MeanRadius"
## [19] "AreaShape_MedianRadius" "AreaShape_MinFeretDiameter"
## [21] "AreaShape_MinorAxisLength" "AreaShape_Orientation"
## [23] "AreaShape_Perimeter" "AreaShape_Solidity"
## [25] "AreaShape_Zernike_0_0" "AreaShape_Zernike_1_1"
## [27] "AreaShape_Zernike_2_0" "AreaShape_Zernike_2_2"
## [29] "AreaShape_Zernike_3_1" "AreaShape_Zernike_3_3"
## [31] "AreaShape_Zernike_4_0" "AreaShape_Zernike_4_2"
## [33] "AreaShape_Zernike_4_4" "AreaShape_Zernike_5_1"
## [35] "AreaShape_Zernike_5_3" "AreaShape_Zernike_5_5"
## [37] "AreaShape_Zernike_6_0" "AreaShape_Zernike_6_2"
## [39] "AreaShape_Zernike_6_4" "AreaShape_Zernike_6_6"
## [41] "AreaShape_Zernike_7_1" "AreaShape_Zernike_7_3"
## [43] "AreaShape_Zernike_7_5" "AreaShape_Zernike_7_7"
## [45] "AreaShape_Zernike_8_0" "AreaShape_Zernike_8_2"
## [47] "AreaShape_Zernike_8_4" "AreaShape_Zernike_8_6"
## [49] "AreaShape_Zernike_8_8" "AreaShape_Zernike_9_1"
## [51] "AreaShape_Zernike_9_3" "AreaShape_Zernike_9_5"
## [53] "AreaShape_Zernike_9_7" "AreaShape_Zernike_9_9"
Tidyselect helpers are easy to combine. With a bit of practice, only little typing is necessary to obtain exactly the variable names you need for your analysis step of interest.
# only zernike features
zernike_vars <- getStatVariableNames(object_one_time, contains("Zernike"))
zernike_vars
## [1] "AreaShape_Zernike_0_0"
## [2] "AreaShape_Zernike_1_1"
## [3] "AreaShape_Zernike_2_0"
## [4] "AreaShape_Zernike_2_2"
## [5] "AreaShape_Zernike_3_1"
## [6] "AreaShape_Zernike_3_3"
## [7] "AreaShape_Zernike_4_0"
## [8] "AreaShape_Zernike_4_2"
## [9] "AreaShape_Zernike_4_4"
## [10] "AreaShape_Zernike_5_1"
## [11] "AreaShape_Zernike_5_3"
## [12] "AreaShape_Zernike_5_5"
## [13] "AreaShape_Zernike_6_0"
## [14] "AreaShape_Zernike_6_2"
## [15] "AreaShape_Zernike_6_4"
## [16] "AreaShape_Zernike_6_6"
## [17] "AreaShape_Zernike_7_1"
## [18] "AreaShape_Zernike_7_3"
## [19] "AreaShape_Zernike_7_5"
## [20] "AreaShape_Zernike_7_7"
## [21] "AreaShape_Zernike_8_0"
## [22] "AreaShape_Zernike_8_2"
## [23] "AreaShape_Zernike_8_4"
## [24] "AreaShape_Zernike_8_6"
## [25] "AreaShape_Zernike_8_8"
## [26] "AreaShape_Zernike_9_1"
## [27] "AreaShape_Zernike_9_3"
## [28] "AreaShape_Zernike_9_5"
## [29] "AreaShape_Zernike_9_7"
## [30] "AreaShape_Zernike_9_9"
## [31] "RadialDistribution_ZernikeMagnitude_Actin_0_0"
## [32] "RadialDistribution_ZernikeMagnitude_Actin_1_1"
## [33] "RadialDistribution_ZernikeMagnitude_Actin_2_0"
## [34] "RadialDistribution_ZernikeMagnitude_Actin_2_2"
## [35] "RadialDistribution_ZernikeMagnitude_Actin_3_1"
## [36] "RadialDistribution_ZernikeMagnitude_Actin_3_3"
## [37] "RadialDistribution_ZernikeMagnitude_Actin_4_0"
## [38] "RadialDistribution_ZernikeMagnitude_Actin_4_2"
## [39] "RadialDistribution_ZernikeMagnitude_Actin_4_4"
## [40] "RadialDistribution_ZernikeMagnitude_Actin_5_1"
## [41] "RadialDistribution_ZernikeMagnitude_Actin_5_3"
## [42] "RadialDistribution_ZernikeMagnitude_Actin_5_5"
## [43] "RadialDistribution_ZernikeMagnitude_Actin_6_0"
## [44] "RadialDistribution_ZernikeMagnitude_Actin_6_2"
## [45] "RadialDistribution_ZernikeMagnitude_Actin_6_4"
## [46] "RadialDistribution_ZernikeMagnitude_Actin_6_6"
## [47] "RadialDistribution_ZernikeMagnitude_Actin_7_1"
## [48] "RadialDistribution_ZernikeMagnitude_Actin_7_3"
## [49] "RadialDistribution_ZernikeMagnitude_Actin_7_5"
## [50] "RadialDistribution_ZernikeMagnitude_Actin_7_7"
## [51] "RadialDistribution_ZernikeMagnitude_Actin_8_0"
## [52] "RadialDistribution_ZernikeMagnitude_Actin_8_2"
## [53] "RadialDistribution_ZernikeMagnitude_Actin_8_4"
## [54] "RadialDistribution_ZernikeMagnitude_Actin_8_6"
## [55] "RadialDistribution_ZernikeMagnitude_Actin_8_8"
## [56] "RadialDistribution_ZernikeMagnitude_Actin_9_1"
## [57] "RadialDistribution_ZernikeMagnitude_Actin_9_3"
## [58] "RadialDistribution_ZernikeMagnitude_Actin_9_5"
## [59] "RadialDistribution_ZernikeMagnitude_Actin_9_7"
## [60] "RadialDistribution_ZernikeMagnitude_Actin_9_9"
## [61] "RadialDistribution_ZernikePhase_Actin_0_0"
## [62] "RadialDistribution_ZernikePhase_Actin_1_1"
## [63] "RadialDistribution_ZernikePhase_Actin_2_0"
## [64] "RadialDistribution_ZernikePhase_Actin_2_2"
## [65] "RadialDistribution_ZernikePhase_Actin_3_1"
## [66] "RadialDistribution_ZernikePhase_Actin_3_3"
## [67] "RadialDistribution_ZernikePhase_Actin_4_0"
## [68] "RadialDistribution_ZernikePhase_Actin_4_2"
## [69] "RadialDistribution_ZernikePhase_Actin_4_4"
## [70] "RadialDistribution_ZernikePhase_Actin_5_1"
## [71] "RadialDistribution_ZernikePhase_Actin_5_3"
## [72] "RadialDistribution_ZernikePhase_Actin_5_5"
## [73] "RadialDistribution_ZernikePhase_Actin_6_0"
## [74] "RadialDistribution_ZernikePhase_Actin_6_2"
## [75] "RadialDistribution_ZernikePhase_Actin_6_4"
## [76] "RadialDistribution_ZernikePhase_Actin_6_6"
## [77] "RadialDistribution_ZernikePhase_Actin_7_1"
## [78] "RadialDistribution_ZernikePhase_Actin_7_3"
## [79] "RadialDistribution_ZernikePhase_Actin_7_5"
## [80] "RadialDistribution_ZernikePhase_Actin_7_7"
## [81] "RadialDistribution_ZernikePhase_Actin_8_0"
## [82] "RadialDistribution_ZernikePhase_Actin_8_2"
## [83] "RadialDistribution_ZernikePhase_Actin_8_4"
## [84] "RadialDistribution_ZernikePhase_Actin_8_6"
## [85] "RadialDistribution_ZernikePhase_Actin_8_8"
## [86] "RadialDistribution_ZernikePhase_Actin_9_1"
## [87] "RadialDistribution_ZernikePhase_Actin_9_3"
## [88] "RadialDistribution_ZernikePhase_Actin_9_5"
## [89] "RadialDistribution_ZernikePhase_Actin_9_7"
## [90] "RadialDistribution_ZernikePhase_Actin_9_9"
# only zernike features that are derived from shape analysis
area_zernike_vars <- getStatVariableNames(object_one_time, starts_with("AreaShape") & contains("Zernike"))
area_zernike_vars
## [1] "AreaShape_Zernike_0_0" "AreaShape_Zernike_1_1" "AreaShape_Zernike_2_0"
## [4] "AreaShape_Zernike_2_2" "AreaShape_Zernike_3_1" "AreaShape_Zernike_3_3"
## [7] "AreaShape_Zernike_4_0" "AreaShape_Zernike_4_2" "AreaShape_Zernike_4_4"
## [10] "AreaShape_Zernike_5_1" "AreaShape_Zernike_5_3" "AreaShape_Zernike_5_5"
## [13] "AreaShape_Zernike_6_0" "AreaShape_Zernike_6_2" "AreaShape_Zernike_6_4"
## [16] "AreaShape_Zernike_6_6" "AreaShape_Zernike_7_1" "AreaShape_Zernike_7_3"
## [19] "AreaShape_Zernike_7_5" "AreaShape_Zernike_7_7" "AreaShape_Zernike_8_0"
## [22] "AreaShape_Zernike_8_2" "AreaShape_Zernike_8_4" "AreaShape_Zernike_8_6"
## [25] "AreaShape_Zernike_8_8" "AreaShape_Zernike_9_1" "AreaShape_Zernike_9_3"
## [28] "AreaShape_Zernike_9_5" "AreaShape_Zernike_9_7" "AreaShape_Zernike_9_9"
# only cluster variables with k = 4
k_4_cluster <- getClusterVariableNames(object_one_time, contains("k_4"))
k_4_cluster
## [1] "hcl_euclidean_complete_k_4_(intensity)"
## [2] "hcl_euclidean_ward.D_k_4_(intensity)"
## [3] "kmeans_Lloyd_k_4_(intensity)"
## [4] "pam_euclidean_k_4_(intensity)"
# only cluster variables with k = 4 (without hierarchical algorithm)
k_4_cluster2 <- getClusterVariableNames(object_one_time, contains("k_4") & -starts_with("hcl"))
k_4_cluster2
## [1] "kmeans_Lloyd_k_4_(intensity)" "pam_euclidean_k_4_(intensity)"
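What the helpers used above select can be mimicked on a plain character vector in base R. This is only a sketch of the matching logic, not of how tidyselect or cypro implement it. The first three names are taken from the output above, while "Intensity_MeanIntensity_Actin" is a made-up name for illustration. Note that starts_with() and contains() ignore case by default, whereas the base R calls below are case sensitive.

```r
# Example variable names; the last one is made up for illustration.
var_names <- c(
  "AreaShape_Area",
  "AreaShape_Zernike_0_0",
  "RadialDistribution_ZernikePhase_Actin_0_0",
  "Intensity_MeanIntensity_Actin"
)

# Rough counterpart of starts_with("AreaShape").
is_area <- startsWith(var_names, "AreaShape")

# Rough counterpart of contains("Zernike"): literal substring matching.
is_zernike <- grepl("Zernike", var_names, fixed = TRUE)

# Combine the logical vectors with & and ! just like the helpers.
area_vars <- var_names[is_area]
area_zernike_vars <- var_names[is_area & is_zernike]
zernike_not_area <- var_names[is_zernike & !is_area]

area_zernike_vars
zernike_not_area
```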
Grouping variables contain information on how to group cells, encoded in the group names. In several cases you might want to refer to specific groups. To obtain the respective names, make use of getGroupNames() or its wrappers getConditions() and getCellLines().
# two ways to obtain condition names
conditions_1 <- getGroupNames(object_one_time, grouping_variable = "condition")
conditions_2 <- getConditions(object_one_time)
conditions_1
## [1] "5-fluorouracil" "AG-1478"
## [3] "anisomycin" "AZ258"
## [5] "caspase inhibitor (ZVAD)" "cyclohexamide"
## [7] "DMSO" "indirubin monoxime"
## [9] "mitomycin C" "neomycin"
## [11] "olomoucine" "taxol"
## [13] "tunicamycin"
conditions_2
## [1] "5-fluorouracil" "AG-1478"
## [3] "anisomycin" "AZ258"
## [5] "caspase inhibitor (ZVAD)" "cyclohexamide"
## [7] "DMSO" "indirubin monoxime"
## [9] "mitomycin C" "neomycin"
## [11] "olomoucine" "taxol"
## [13] "tunicamycin"
Using getGroupNames() to obtain cluster names becomes useful once you have renamed clusters, as they are initially encoded as numbers. The tidyselect helpers introduced in section 4, "Variable names", can be used within getGroupNames() as well.
# all cluster names
hcl5_all <-
getGroupNames(
object = object_one_time,
grouping_variable = "hcl_euclidean_ward.D_k_5_(intensity)"
)
hcl5_all
## [1] "low" "low-medium" "medium" "medium-low" "high"
# medium cluster names
hcl5_medium <-
getGroupNames(
object = object_one_time,
grouping_variable = "hcl_euclidean_ward.D_k_5_(intensity)",
contains("medium")
)
hcl5_medium
## [1] "low-medium" "medium" "medium-low"
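Once you have a vector of group names, a typical next step is to keep only the cells that belong to those groups. The following base R sketch does this on a toy data.frame. The column name cluster and the cell IDs are made up, and the group names mirror the renamed clusters shown above.

```r
# Toy grouping data.frame with renamed clusters (cell IDs are made up).
group_levels <- c("low", "low-medium", "medium", "medium-low", "high")

cluster_df <- data.frame(
  cell_id = paste0("c", 1:5),
  cluster = factor(group_levels, levels = group_levels)
)

# All group names, and the subset of names that contain "medium".
group_names <- levels(cluster_df$cluster)
medium_groups <- group_names[grepl("medium", group_names, fixed = TRUE)]

# Keep only the cells of the selected groups and drop unused factor levels.
medium_df <- droplevels(cluster_df[cluster_df$cluster %in% medium_groups, ])

medium_df
```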