Filters the results of findDE(). See Details.

filterDE(
  de_df,
  n_highest_FC = 100,
  n_lowest_pvalue = 100,
  across_subset = NULL,
  return = "data.frame"
)

Arguments

de_df

A data.frame containing information about differentially expressed genes. Must contain the variables:

gene

Character. The differentially expressed genes.

cluster

Character. The clusters (or experimental groups) across which the analysis was performed.

avg_logFC

Numeric. The average log-fold change with which the respective gene was differentially expressed.

p_val

Numeric. The p-values.

p_val_adj

Numeric. The adjusted p-values.

Hint: Use the data.frame returned by SPATA::findDE() or one of its descendants as input.
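For illustration, a toy de_df with the required variables might look as follows. All gene names, cluster labels and statistics are made up and are not output of findDE(); the final check only tests for the variable names and types listed above.

```r
# Hypothetical toy input: made-up genes, clusters and statistics,
# shaped like the data.frame described under de_df.
de_df <- data.frame(
  gene      = c("GFAP", "VIM", "MBP", "SOX2"),
  cluster   = c("1", "1", "2", "2"),
  avg_logFC = c(1.8, 0.9, 2.3, 0.7),
  p_val     = c(1e-5, 1e-8, 1e-12, 0.01),
  p_val_adj = c(1e-3, 1e-6, 1e-10, 0.5),
  stringsAsFactors = FALSE
)

# Check that the required variables exist and have the documented types.
stopifnot(
  all(c("gene", "cluster", "avg_logFC", "p_val", "p_val_adj") %in% colnames(de_df)),
  is.character(de_df$gene), is.character(de_df$cluster),
  is.numeric(de_df$avg_logFC), is.numeric(de_df$p_val), is.numeric(de_df$p_val_adj)
)
```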

n_highest_FC

Numeric value. The number of genes with the highest avg_logFC-values that are kept for every cluster. See Details.

n_lowest_pvalue

Numeric value. The number of genes with the lowest p_val_adj-values that are kept for every cluster. See Details.

across_subset

Character vector or NULL. The particular groups or clusters of interest, i.e. values of the cluster-variable of de_df. If set to NULL, all of them are chosen.

Hint: Use getFeatureValues() to obtain all available groups of a certain feature-variable.
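A minimal base-R sketch of the documented across_subset semantics, assuming the argument simply restricts the rows of de_df to the listed values of the cluster-variable. The helper subset_clusters() and the toy data are hypothetical and not part of SPATA.

```r
# Hypothetical helper: NULL keeps every cluster, otherwise only the
# rows whose cluster value is listed in across_subset are kept.
subset_clusters <- function(de_df, across_subset = NULL) {
  if (is.null(across_subset)) {
    return(de_df)
  }
  de_df[de_df$cluster %in% across_subset, , drop = FALSE]
}

toy <- data.frame(
  gene    = c("GFAP", "VIM", "MBP"),
  cluster = c("1", "1", "2"),
  stringsAsFactors = FALSE
)

subset_clusters(toy)$gene                       # "GFAP" "VIM" "MBP"
subset_clusters(toy, across_subset = "2")$gene  # "MBP"
```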

return

Character value. Denotes the output type. One of 'data.frame', 'vector' or 'list'.

Value

Depends on the input of argument return:

  • return = 'data.frame': The filtered data.frame of de_df with all its variables.

  • return = 'vector': A named character vector of all genes that remain, named by the experimental group in which they were differentially expressed.

  • return = 'list': A list named according to the experimental groups. Every slot of that list is a character vector containing the differentially expressed genes of the respective experimental group.
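Assuming the three output types are alternative views of the same filtered result, the 'vector' and 'list' forms can be derived from the data.frame form with base R, as in this sketch. The genes and clusters are made up, and this is not SPATA's own code.

```r
# Hypothetical filtered result (made-up genes and clusters).
filtered_df <- data.frame(
  gene    = c("GFAP", "VIM", "MBP", "PLP1"),
  cluster = c("1", "1", "2", "2"),
  stringsAsFactors = FALSE
)

# 'vector'-like form: genes named by the group they were found in.
gene_vec <- setNames(filtered_df$gene, filtered_df$cluster)

# 'list'-like form: one character vector of genes per group.
gene_list <- split(filtered_df$gene, filtered_df$cluster)

print(gene_vec)
print(gene_list)
```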

Details

filterDE() groups de_df by the unique values of the cluster-variable and performs the following steps for every experimental group. (Here, "genes" refers to the rows (observations) of de_df.)

  1. Discards genes with avg_logFC-values that are either infinite or negative.

  2. Slices the data.frame such that, for every unique cluster of the cluster-variable:

    1. the n genes with the highest avg_logFC-values are kept, where n = n_highest_FC.

    2. the n genes with the lowest p_val_adj-values are kept, where n = n_lowest_pvalue.

  3. Arranges the genes in descending order of their avg_logFC-values.
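The steps above can be sketched in base R as follows. This is not SPATA's implementation: it assumes the two slicing criteria of step 2 are applied one after the other (first n_highest_FC, then n_lowest_pvalue on the genes that remain) and that ties are broken by row order. The function name filter_de_sketch() and the toy data are hypothetical.

```r
# Hypothetical base-R sketch of the Details steps (not SPATA's code).
filter_de_sketch <- function(de_df, n_highest_FC = 100, n_lowest_pvalue = 100) {
  # Step 1: drop non-finite (Inf, -Inf, NA) and negative avg_logFC values.
  keep <- is.finite(de_df$avg_logFC) & de_df$avg_logFC >= 0
  de_df <- de_df[keep, , drop = FALSE]
  if (nrow(de_df) == 0) {
    return(de_df)
  }

  per_cluster <- lapply(split(de_df, de_df$cluster), function(grp) {
    # Step 2.1: keep the n_highest_FC genes with the highest avg_logFC.
    grp <- head(grp[order(grp$avg_logFC, decreasing = TRUE), , drop = FALSE], n_highest_FC)
    # Step 2.2: of those, keep the n_lowest_pvalue genes with the lowest p_val_adj.
    grp <- head(grp[order(grp$p_val_adj), , drop = FALSE], n_lowest_pvalue)
    # Step 3: arrange the remaining genes by descending avg_logFC.
    grp[order(grp$avg_logFC, decreasing = TRUE), , drop = FALSE]
  })

  out <- do.call(rbind, per_cluster)
  rownames(out) <- NULL
  out
}

toy_de_df <- data.frame(
  gene      = c("GFAP", "VIM", "CD74", "MBP", "PLP1", "SOX2"),
  cluster   = c("1", "1", "1", "2", "2", "2"),
  avg_logFC = c(1.8, 0.9, -0.4, 2.3, Inf, 0.7),
  p_val_adj = c(1e-3, 1e-6, 1e-9, 1e-10, 1e-12, 0.5),
  stringsAsFactors = FALSE
)

filter_de_sketch(toy_de_df)$gene
# "GFAP" "VIM" "MBP" "SOX2"   (CD74 is negative, PLP1 is infinite)
filter_de_sketch(toy_de_df, n_highest_FC = 2, n_lowest_pvalue = 1)$gene
# "VIM" "MBP"
```

With n_highest_FC = 2 and n_lowest_pvalue = 1, cluster 1 keeps VIM rather than GFAP: GFAP has the higher avg_logFC, but VIM has the lower p_val_adj of the two genes that survive the first cut.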