Postprocess de-analysis results — filterDeaDf • SPATA2

Processes the results of getDeaResultsDf(). See details.

filterDeaDf(
  dea_df,
  max_adj_pval = 0.05,
  min_lfc = 0,
  n_highest_lfc = 25,
  n_lowest_pval = 25,
  across_subset = NULL,
  relevel = FALSE,
  return = "data.frame"
)

Arguments

dea_df

A data.frame containing information about differentially expressed genes. Must contain the variables:

gene: Character. The differentially expressed genes.
cluster: Character. The clusters (or experimental groups) across which the analysis was performed.
avg_logFC: Numeric. The average log fold change with which the respective gene was differentially expressed.
p_val: Numeric. The p-values.
p_val_adj: Numeric. The adjusted p-values.

Hint: Use the resulting data.frame of SPATA::findDE() or its descendants as input.

max_adj_pval

Numeric value. Sets the threshold for adjusted p-values. All genes with adjusted p-values above that threshold are ignored.

min_lfc

Numeric value. Sets the threshold for average log fold change. All genes with an average log fold change below that threshold are ignored.

n_highest_lfc

Numeric value. Affects the total number of genes that are kept. See details.

n_lowest_pval

Numeric value. Affects the total number of genes that are kept. See details.

across_subset

Character vector or NULL. Specifies the groups of interest within the grouping variable that was specified with argument across.

If set to NULL all of them are chosen. You can prefix groups you are NOT interested in with a '-'. (Saves writing if there are more groups you are interested in than groups you are not interested in.)

Use getGroupNames() to obtain all valid input options.
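
The '-' prefix convention described above can be illustrated with a minimal base R sketch. The helper resolve_subset() and the group names are hypothetical and not part of SPATA2; the sketch assumes that an input consisting only of negated groups means "all groups except these".

```r
# Hypothetical helper (not a SPATA2 function): resolves an across_subset-style
# input against the full set of group names.
resolve_subset <- function(all_groups, across_subset = NULL) {
  # NULL means: keep every group
  if (is.null(across_subset)) {
    return(all_groups)
  }
  # Entries prefixed with '-' name groups that should be dropped
  negated <- startsWith(across_subset, "-")
  discard <- substring(across_subset[negated], 2)
  keep <- across_subset[!negated]
  # Assumption: if only negated groups are given, start from all groups
  if (length(keep) == 0) {
    keep <- all_groups
  }
  keep[!keep %in% discard]
}

all_groups <- c("A", "B", "C", "D")
only_negated <- resolve_subset(all_groups, across_subset = "-B")
explicit <- resolve_subset(all_groups, across_subset = c("D", "A"))
```

In this sketch the explicit input keeps its input order, which is the order a relevel-style option would then use.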

relevel

Logical value. If set to TRUE, the input order of across_subset determines the order in which the groups of interest are displayed. Groups that are not included are dropped, which affects the colors with which the groups are displayed.

return

Character value. Denotes the output type. One of 'data.frame', 'vector' or 'list'.

Value

Depends on input of argument return:

return = 'data.frame': The filtered data.frame of dea_df with all its variables.
return = 'vector': A named vector of all genes that remain. Named by the experimental group in which they were differentially expressed.
return = 'list': A list named according to the experimental groups. Every slot of that list is a character vector containing the differentially expressed genes of the respective experimental group.
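
To make the three output shapes concrete, here is a minimal base R sketch of how the 'vector' and 'list' forms relate to a filtered data.frame. The toy data.frame filtered_df is made up for illustration, and the conversion shown is an assumption about the shapes described above, not the package's internal code.

```r
# Toy stand-in for an already filtered DEA data.frame (made-up values)
filtered_df <- data.frame(
  gene = c("G1", "G2", "G4"),
  cluster = c("A", "A", "B"),
  stringsAsFactors = FALSE
)

# 'vector'-like shape: genes named by the group they belong to
gene_vec <- stats::setNames(filtered_df$gene, filtered_df$cluster)

# 'list'-like shape: one character vector of genes per group
gene_list <- split(filtered_df$gene, filtered_df$cluster)
```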

Details

The de-data.frame is processed such that the following steps are performed for every experimental group.

Discards genes with avg_logFC-values that are either infinite or negative
Discards genes with adjusted p-values above the threshold set with max_adj_pval
Discards genes with an average log fold change below the threshold set with min_lfc
Slices the data.frame so that for every experimental group:
1. the n genes with the highest avg_logFC-values are kept where n = n_highest_lfc
2. the n genes with the lowest p_val_adj-values are kept where n = n_lowest_pval
Arranges the genes by avg_logFC-values in descending order
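
The steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: sketch_filter_dea() is a hypothetical helper, the two slicing steps are assumed to be applied one after the other, ties are broken by row order, and the toy data are made up.

```r
# Hypothetical helper (not a SPATA2 function) that mimics the documented steps.
sketch_filter_dea <- function(dea_df,
                              max_adj_pval = 0.05,
                              min_lfc = 0,
                              n_highest_lfc = 25,
                              n_lowest_pval = 25) {
  # Discard infinite or negative log fold changes
  keep <- is.finite(dea_df$avg_logFC) & dea_df$avg_logFC >= 0
  # Discard adjusted p-values above the threshold
  keep <- keep & dea_df$p_val_adj <= max_adj_pval
  # Discard log fold changes below the threshold
  keep <- keep & dea_df$avg_logFC >= min_lfc
  dea_df <- dea_df[keep, , drop = FALSE]
  if (nrow(dea_df) == 0) {
    return(dea_df)
  }

  per_group <- lapply(split(dea_df, dea_df$cluster), function(grp) {
    # 1. keep the n genes with the highest avg_logFC
    grp <- head(grp[order(-grp$avg_logFC), , drop = FALSE], n_highest_lfc)
    # 2. of those, keep the n genes with the lowest p_val_adj (assumed sequential)
    grp <- head(grp[order(grp$p_val_adj), , drop = FALSE], n_lowest_pval)
    # Arrange by avg_logFC in descending order
    grp[order(-grp$avg_logFC), , drop = FALSE]
  })

  out <- do.call(rbind, per_group)
  rownames(out) <- NULL
  out
}

# Made-up toy input with the required variables
toy_df <- data.frame(
  gene = c("G1", "G2", "G3", "G4", "G5", "G6"),
  cluster = c("A", "A", "A", "B", "B", "B"),
  avg_logFC = c(2.0, 0.5, -1.0, 1.5, Inf, 0.8),
  p_val = c(0.001, 0.01, 0.001, 0.002, 0.001, 0.2),
  p_val_adj = c(0.01, 0.04, 0.01, 0.02, 0.01, 0.9),
  stringsAsFactors = FALSE
)

res_default <- sketch_filter_dea(toy_df)
res_top1 <- sketch_filter_dea(toy_df, n_highest_lfc = 1, n_lowest_pval = 1)
```

In the toy input, G3 is dropped for its negative log fold change, G5 for its infinite one, and G6 for its adjusted p-value of 0.9, so only G1, G2 and G4 reach the slicing steps.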