Postprocess de-results — filterDE • SPATA

Processes the results of findDE(). See details.

filterDE(
  de_df,
  n_highest_FC = 100,
  n_lowest_pvalue = 100,
  across_subset = NULL,
  return = "data.frame"
)

Arguments

de_df	A data.frame containing information about differentially expressed genes. Must contain the variables:
gene: Character. The differentially expressed genes.
cluster: Character. The clusters (or experimental groups) across which the analysis was performed.
avg_logFC: Numeric. The average log fold change with which the respective gene was differentially expressed.
p_val: Numeric. The p-values.
p_val_adj: Numeric. The adjusted p-values.
Hint: Use the resulting data.frame of `SPATA::findDE()` or its descendants as input.
n_highest_FC	Numeric value. The number of genes with the highest avg_logFC-values that are kept per cluster. See details.
n_lowest_pvalue	Numeric value. The number of genes with the lowest p_val_adj-values that are kept per cluster. See details.
across_subset	Character vector or NULL. Specify the particular groups or clusters of interest the feature-variable specified in argument `across` contains. If set to NULL all of them are chosen. Hint: Use `getFeatureValues()` to obtain all available groups of a certain feature-variable.
return	Character value. Denotes the output type. One of 'data.frame', 'vector' or 'list'.

Value

Depends on the input of argument return:

return = 'data.frame': The filtered data.frame of de_df with all its variables.
return = 'vector': A named vector of all genes that remain. Named by the experimental group in which they were differentially expressed.
return = 'list': A list named according to the experimental groups. Every slot of that list is a character vector containing the differentially expressed genes of the respective experimental group.
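
The three output types carry the same genes in different shapes. The following base R sketch illustrates how the 'vector' and 'list' shapes relate to a filtered data.frame. It does not call filterDE(). The toy data.frame filtered_df and its values are made up for illustration.

```r
# Toy stand-in for a filtered data.frame (values are made up).
filtered_df <- data.frame(
  gene    = c("A", "B", "F"),
  cluster = c("1", "1", "2"),
  stringsAsFactors = FALSE
)

# Shape described for return = 'vector':
# every gene is named by the group in which it was differentially expressed.
gene_vec <- stats::setNames(filtered_df$gene, filtered_df$cluster)

# Shape described for return = 'list':
# one character vector of genes per experimental group.
gene_list <- split(filtered_df$gene, filtered_df$cluster)

print(gene_vec)
print(gene_list)
```

The vector form is convenient when the genes are passed on as one flat argument, whereas the list form allows access by group, for example gene_list[["1"]].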

Details

filterDE() processes the input by grouping the data.frame according to the unique values of the cluster-variable, such that the following steps are performed for every experimental group. (By "genes" we refer to the rows (observations) of de_df.)

Discards genes with avg_logFC-values that are either infinite or negative.
Slices the data.frame such that, for every unique cluster of the cluster-variable:
1. the n genes with the highest avg_logFC-values are kept, where n = n_highest_FC
2. the n genes with the lowest p_val_adj-values are kept, where n = n_lowest_pvalue
Arranges the genes in descending order of their avg_logFC-values.
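
The steps above can be sketched in base R on a toy data.frame. This is a minimal illustration and not the SPATA implementation. The helper filter_sketch() is hypothetical, and it assumes that the two slicing steps are applied one after the other within every cluster, which the description above does not state explicitly.

```r
# Toy input with the variables filterDE() requires (all values are made up).
de_df <- data.frame(
  gene      = c("A", "B", "C", "D", "E", "F"),
  cluster   = c("1", "1", "1", "1", "2", "2"),
  avg_logFC = c(2.0, 1.5, -0.5, Inf, 0.8, 1.2),
  p_val     = c(0.001, 0.020, 0.010, 0.001, 0.030, 0.002),
  p_val_adj = c(0.010, 0.200, 0.100, 0.010, 0.300, 0.020),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of SPATA.
filter_sketch <- function(df, n_highest_FC = 100, n_lowest_pvalue = 100) {
  # Discard genes with infinite or negative avg_logFC-values.
  df <- df[is.finite(df$avg_logFC) & df$avg_logFC >= 0, , drop = FALSE]

  per_cluster <- lapply(split(df, df$cluster), function(grp) {
    # 1. Keep the n genes with the highest avg_logFC-values.
    grp <- grp[order(grp$avg_logFC, decreasing = TRUE), , drop = FALSE]
    grp <- head(grp, n_highest_FC)
    # 2. Of those, keep the n genes with the lowest p_val_adj-values
    #    (assumption: the slices are applied sequentially).
    grp <- grp[order(grp$p_val_adj), , drop = FALSE]
    grp <- head(grp, n_lowest_pvalue)
    # Arrange by descending avg_logFC.
    grp[order(grp$avg_logFC, decreasing = TRUE), , drop = FALSE]
  })

  out <- do.call(rbind, per_cluster)
  rownames(out) <- NULL
  out
}

res <- filter_sketch(de_df, n_highest_FC = 2, n_lowest_pvalue = 1)
print(res$gene)  # "A" "F"
```

In this toy run, genes C (negative) and D (infinite) are discarded first. Within cluster "1" gene A has the lowest adjusted p-value of the remaining genes, and within cluster "2" gene F does.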