Filter features from differential analysis based on statistical significance

Outputs a dataset of significant genes from the DEG table. Genes are filtered on statistical significance (p-value or adjusted p-value) and on change (fold change, log2 fold change, or t-statistic). The function also lets you select which DEG estimates to report and subset the contrasts and groups included in the output gene list.

Usage

filter_diff(
  moo,
  feature_id_colname = NULL,
  significance_column = "adjpval",
  significance_cutoff = 0.05,
  change_column = "logFC",
  change_cutoff = 1,
  filtering_mode = "any",
  include_estimates = c("FC", "logFC", "tstat", "pval", "adjpval"),
  round_estimates = TRUE,
  rounding_decimal_for_percent_cells = 0,
  contrast_filter = "none",
  contrasts = c(),
  groups = c(),
  groups_filter = "none",
  label_font_size = 6,
  label_distance = 1,
  y_axis_expansion = 0.08,
  fill_colors = c("steelblue1", "whitesmoke"),
  pie_chart_in_3d = TRUE,
  bar_width = 0.4,
  draw_bar_border = TRUE,
  plot_type = "bar",
  plot_titles_fontsize = 12,
  print_plots = options::opt("print_plots"),
  save_plots = options::opt("save_plots"),
  plots_subdir = file.path("diff", "filt")
)

Arguments

moo: multiOmicDataSet object (see create_multiOmicDataSet_from_dataframes())
feature_id_colname: The column from the counts data containing the Feature IDs (Usually Gene or Protein ID). This is usually the first column of your input Counts Matrix. Only columns of Text type from your input Counts Matrix will be available to select for this parameter. (Default: NULL - first column in the counts matrix will be used.)
significance_column: Column name for significance, e.g. "pval" or "adjpval" (default)
significance_cutoff: Features will only be kept if their significance_column is less than this cutoff threshold
change_column: Column name for change, e.g. "logFC" (default)
change_cutoff: Features will only be kept if the absolute value of their change_column is greater than or equal to this cutoff threshold
filtering_mode: Accepted values: "any" or "all" to include features that meet the criteria in any contrast or in all contrasts
include_estimates: Column names of estimates to include. Default: c("FC", "logFC", "tstat", "pval", "adjpval")
round_estimates: Whether to round estimates. Default: TRUE
rounding_decimal_for_percent_cells: Decimal place to use when rounding Percent cells
contrast_filter: Whether to filter contrasts in or out of analysis. If "keep", only the contrast names listed in contrasts will be included. If "remove", the contrast names listed in contrasts will be removed. If "none", all contrasts in the dataset are used. Options: "keep", "remove", or "none"
contrasts: Contrast names to filter by contrast_filter. If contrast_filter is "none", this parameter has no effect.
groups: Group names to filter by groups_filter. If groups_filter is "none", this parameter has no effect.
groups_filter: Whether to filter groups in or out of analysis. If "keep", only the group names listed in groups will be included. If "remove", the group names listed in groups will be removed. If "none", all groups in the dataset are used. Options: "keep", "remove", or "none"
label_font_size: Font size for labels in the plot (default: 6)
label_distance: Distance of labels from the bars (default: 1)
y_axis_expansion: Expansion of the y-axis (default: 0.08)
fill_colors: Fill colors for the bars (default: c("steelblue1", "whitesmoke"))
pie_chart_in_3d: Whether to draw pie charts in 3D (default: TRUE)
bar_width: Width of the bars (default: 0.4)
draw_bar_border: Whether to draw borders around bars (default: TRUE)
plot_type: "bar" or "pie"
plot_titles_fontsize: Font size for plot titles (default: 12)
print_plots: Whether to print plots during analysis (Defaults to FALSE, overwritable using option 'moo_print_plots' or environment variable 'MOO_PRINT_PLOTS')
save_plots: Whether to save plots to files during analysis (Defaults to FALSE, overwritable using option 'moo_save_plots' or environment variable 'MOO_SAVE_PLOTS')
plots_subdir: Subdirectory where plots will be saved if save_plots is TRUE
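
The interaction between significance_cutoff, change_cutoff, and filtering_mode can be pictured with a small base R sketch. This is only an illustration under stated assumptions, not the package's implementation: the helper sketch_filter() and the toy table are hypothetical, and the `<contrast>_<estimate>` column naming is taken from the example output on this page.

```r
# Toy DEG table (made-up values) using <contrast>_<estimate> column names.
deg <- data.frame(
  Gene = c("g1", "g2", "g3", "g4"),
  `B-A_logFC` = c(2.0, -1.5, 0.3, 1.2),
  `B-A_adjpval` = c(0.01, 0.03, 0.001, 0.20),
  `C-A_logFC` = c(1.8, 0.2, -2.5, 1.1),
  `C-A_adjpval` = c(0.02, 0.04, 0.01, 0.30),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package) mirroring the documented rules:
# keep a feature when significance < cutoff and |change| >= cutoff,
# in any contrast ("any") or in every contrast ("all").
sketch_filter <- function(deg, contrasts,
                          significance_column = "adjpval",
                          significance_cutoff = 0.05,
                          change_column = "logFC",
                          change_cutoff = 1,
                          filtering_mode = "any") {
  # One logical column per contrast: TRUE where both thresholds are met.
  passes <- sapply(contrasts, function(con) {
    sig <- deg[[paste(con, significance_column, sep = "_")]]
    chg <- deg[[paste(con, change_column, sep = "_")]]
    sig < significance_cutoff & abs(chg) >= change_cutoff
  })
  # Force a features-by-contrasts matrix, even for a single row or contrast.
  passes <- matrix(passes, nrow = nrow(deg))
  keep <- if (filtering_mode == "all") {
    apply(passes, 1, all)
  } else {
    apply(passes, 1, any)
  }
  deg[keep, , drop = FALSE]
}

any_hits <- sketch_filter(deg, c("B-A", "C-A"), filtering_mode = "any")
all_hits <- sketch_filter(deg, c("B-A", "C-A"), filtering_mode = "all")
```

In this toy table, "any" keeps every gene that passes in at least one contrast, while "all" keeps only genes that pass in both, so it is the stricter of the two modes.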

Examples

moo <- multiOmicDataSet(
  sample_metadata = as.data.frame(nidap_sample_metadata),
  anno_dat = data.frame(),
  counts_lst = list(
    "raw" = as.data.frame(nidap_raw_counts),
    "clean" = as.data.frame(nidap_clean_raw_counts),
    "filt" = as.data.frame(nidap_filtered_counts)
  )
) %>%
  diff_counts(
    count_type = "filt",
    sub_count_type = NULL,
    sample_id_colname = "Sample",
    feature_id_colname = "Gene",
    covariates_colnames = c("Group", "Batch"),
    contrast_colname = c("Group"),
    contrasts = c("B-A", "C-A", "B-C"),
    voom_normalization_method = "quantile"
  ) %>%
  filter_diff()
#> * differential counts
#> Setting first column of `counts` as gene annotation.
#> Total number of genes included: 7943
#> Joining with `by = join_by(Gene)`
#> Joining with `by = join_by(Gene)`
#> * filtering differential features
#> Total number of genes selected with adjpval < 0.05 and | logFC | ≥ 1 is sum(selgenes)
head(moo@analyses$diff_filt)
#>            Gene B-A_FC B-A_logFC B-A_tstat B-A_pval B-A_adjpval C-A_FC
#> 1 1110034G24Rik  21.70     4.440      3.20  0.00782       0.210  36.60
#> 2 3110082I17Rik  -1.73    -0.789     -1.35  0.20300       0.710 -21.90
#> 3 4632428N05Rik   2.43     1.280      2.76  0.01770       0.303   4.66
#> 4 4833439L19Rik  -1.38    -0.460     -1.18  0.26000       0.758  -3.59
#> 5 4930523C07Rik  -2.30    -1.200     -1.62  0.13300       0.617   4.50
#> 6 5430427O19Rik  -2.22    -1.150     -2.46  0.03070       0.377  -4.49
#>   C-A_logFC C-A_tstat C-A_pval C-A_adjpval B-C_FC B-C_logFC B-C_tstat B-C_pval
#> 1      5.20      4.15 0.001410     0.02700  -1.69    -0.758    -0.838 0.419000
#> 2     -4.46     -3.80 0.002650     0.03830  12.70     3.670     2.930 0.012900
#> 3      2.22      5.25 0.000222     0.00929  -1.92    -0.941    -3.150 0.008590
#> 4     -1.84     -3.76 0.002810     0.03950   2.61     1.380     2.630 0.022200
#> 5      2.17      4.50 0.000767     0.01910 -10.30    -3.370    -5.040 0.000311
#> 6     -2.17     -3.68 0.003270     0.04320   2.02     1.010     1.530 0.153000
#>   B-C_adjpval
#> 1      0.7070
#> 2      0.1440
#> 3      0.1240
#> 4      0.1860
#> 5      0.0224
#> 6      0.4420
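
The output table above names each estimate column as `<contrast>_<estimate>`. As a rough illustration of what include_estimates, contrast_filter, and contrasts imply for the columns you get back, the base R sketch below works on column names alone. The helper sketch_select_columns() is hypothetical and is not how the package is known to implement the selection.

```r
# Column names as they appear in the example output above.
contrasts_all <- c("B-A", "C-A", "B-C")
estimates_all <- c("FC", "logFC", "tstat", "pval", "adjpval")
cols <- c("Gene", unlist(lapply(contrasts_all, function(con) {
  paste(con, estimates_all, sep = "_")
})))

# Hypothetical helper (not part of the package): choose columns by
# estimate name and by contrast, following the documented options.
sketch_select_columns <- function(cols, id_col = "Gene",
                                  contrast_filter = "none",
                                  contrasts = c(),
                                  include_estimates = c("FC", "logFC", "tstat",
                                                        "pval", "adjpval")) {
  est_cols <- setdiff(cols, id_col)
  # Split "<contrast>_<estimate>" at the last underscore.
  con <- sub("_[^_]*$", "", est_cols)
  est <- sub("^.*_", "", est_cols)
  keep <- est %in% include_estimates
  if (contrast_filter == "keep") keep <- keep & con %in% contrasts
  if (contrast_filter == "remove") keep <- keep & !(con %in% contrasts)
  c(id_col, est_cols[keep])
}

kept <- sketch_select_columns(
  cols,
  contrast_filter = "keep",
  contrasts = "B-A",
  include_estimates = c("logFC", "adjpval")
)
removed <- sketch_select_columns(
  cols,
  contrast_filter = "remove",
  contrasts = c("B-A", "B-C"),
  include_estimates = "logFC"
)
```

With contrast_filter = "none", the contrasts argument has no effect and only include_estimates narrows the columns, which matches the argument descriptions.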
