Stacking vectors

Don’t forget to use stack().

Alice
2022-03-13

Intro

I have recently found a couple of great use cases for the stack() function from {utils}.

Because I want to remind my future self about this, I thought it would make a good short post to test this {distill} site that I just created!

Documentation

From the stack() function documentation:

“Stacking vectors concatenates multiple vectors into a single vector along with a factor indicating where each observation originated. Unstacking reverses this operation.”
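The "unstacking reverses this operation" part of the quote can be checked with a quick round trip. This is a minimal sketch; the list `groups` and its contents are made up for illustration and are not from the post.

```r
# Hypothetical input: two named character vectors of different lengths
groups <- list(a = c("x", "y"), b = "z")

# stack() gives a data.frame with columns `values` and `ind`
stacked <- stack(groups)

# unstack() splits `values` by `ind` again; because the groups have
# different lengths here, the result is a named list rather than a data.frame
restored <- unstack(stacked)
restored
```

When all groups happen to have the same length, unstack() returns a data.frame instead of a list, so it is worth checking the class of the result before relying on it.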

An example

Sometimes, I get a bunch of vectors. Maybe I had multiple files or outputs with various items in them that correspond to different groups. Often, I need to combine these and then check how many of the items exist across multiple groups.

For the purpose of illustration, here I will pretend that I read into R a set of gene names as a named list.

my_list <- list(test1 = c("KRAS","EGFR","ERBB2"),
                test2 = c("ERBB2","ERBB3","SPRY2","AR"),
                test3 = c("APC","BRAF"))

stack() makes a nice tidy data.frame, with the items in values and the name of the list element they came from in ind! (Note that this would also work if the input were a nested list of lists.)

stack(my_list)
  values   ind
1   KRAS test1
2   EGFR test1
3  ERBB2 test1
4  ERBB2 test2
5  ERBB3 test2
6  SPRY2 test2
7     AR test2
8    APC test3
9   BRAF test3
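The note about nested lists can be tried on a small input. This sketch assumes the simplest case, where each inner list holds single values; the object `nested` is hypothetical, and deeper or uneven nesting is not covered here.

```r
# Hypothetical nested input: each group is a list of single gene names
nested <- list(test1 = list("KRAS", "EGFR"),
               test2 = list("ERBB2"))

# Each inner list is flattened and labelled with the name of its outer element
stacked_nested <- stack(nested)
stacked_nested
```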

If you call table() on the result from stack(), you get a contingency table with one row per value and one column per group.

table(stack(my_list))
       ind
values  test1 test2 test3
  APC       0     0     1
  AR        0     1     0
  BRAF      0     0     1
  EGFR      1     0     0
  ERBB2     1     1     0
  ERBB3     0     1     0
  KRAS      1     0     0
  SPRY2     0     1     0
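To answer the original question (which items exist in more than one group), the table can be summarised by row. This is one possible sketch; the names `tab`, `n_groups`, and `shared` are my own.

```r
my_list <- list(test1 = c("KRAS", "EGFR", "ERBB2"),
                test2 = c("ERBB2", "ERBB3", "SPRY2", "AR"),
                test3 = c("APC", "BRAF"))

tab <- table(stack(my_list))

# Count, for each gene, the number of groups in which it appears at least once
n_groups <- rowSums(tab > 0)

# Keep the genes found in more than one group
shared <- names(n_groups)[n_groups > 1]
shared
```

Comparing with `> 0` before summing means a gene repeated within a single group is still counted once for that group.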

The resulting object is a table. You can convert it to a data.frame.

      test1 test2 test3
APC       0     0     1
AR        0     1     0
BRAF      0     0     1
EGFR      1     0     0
ERBB2     1     1     0
ERBB3     0     1     0
KRAS      1     0     0
SPRY2     0     1     0
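The call that produced the data.frame above is not shown. One call that gives this wide layout, with the genes as row names, is as.data.frame.matrix(); I am assuming that is what was used. For comparison, a plain as.data.frame() on a table returns a long format with a Freq column.

```r
my_list <- list(test1 = c("KRAS", "EGFR", "ERBB2"),
                test2 = c("ERBB2", "ERBB3", "SPRY2", "AR"),
                test3 = c("APC", "BRAF"))

tab <- table(stack(my_list))

# Wide: one row per gene, one column per group
wide <- as.data.frame.matrix(tab)

# Long: one row per gene/group combination, with the count in `Freq`
long <- as.data.frame(tab)

head(long)
```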

You can also convert the binary matrix to logical (TRUE/FALSE).

table(stack(my_list)) > 0
       ind
values  test1 test2 test3
  APC   FALSE FALSE  TRUE
  AR    FALSE  TRUE FALSE
  BRAF  FALSE FALSE  TRUE
  EGFR   TRUE FALSE FALSE
  ERBB2  TRUE  TRUE FALSE
  ERBB3 FALSE  TRUE FALSE
  KRAS   TRUE FALSE FALSE
  SPRY2 FALSE  TRUE FALSE

Now, imagine a case where the table has some values greater than 1 (because an item appeared in a list more than once). A handy trick is to compare with 0 to get a logical matrix, then apply a unary + to coerce it back to numeric 0/1.

my_list_w_repeats <- list(
  test1 = c("KRAS","EGFR","ERBB2"),
  test2 = c("ERBB2","ERBB3","SPRY2","AR"),
  test3 = c("APC","APC","APC","BRAF")) # APC is here 3 times

table(stack(my_list_w_repeats))
       ind
values  test1 test2 test3
  APC       0     0     3
  AR        0     1     0
  BRAF      0     0     1
  EGFR      1     0     0
  ERBB2     1     1     0
  ERBB3     0     1     0
  KRAS      1     0     0
  SPRY2     0     1     0

+(table(stack(my_list_w_repeats)) > 0)
       ind
values  test1 test2 test3
  APC       0     0     1
  AR        0     1     0
  BRAF      0     0     1
  EGFR      1     0     0
  ERBB2     1     1     0
  ERBB3     0     1     0
  KRAS      1     0     0
  SPRY2     0     1     0
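Here is the same trick split into steps, next to an equivalent spelling that multiplies by 1L. This is only a sketch; the intermediate names `is_present`, `binary`, and `alt` are mine.

```r
my_list_w_repeats <- list(
  test1 = c("KRAS", "EGFR", "ERBB2"),
  test2 = c("ERBB2", "ERBB3", "SPRY2", "AR"),
  test3 = c("APC", "APC", "APC", "BRAF"))

tab <- table(stack(my_list_w_repeats))

# Step 1: logical matrix, TRUE wherever a gene appears at least once
is_present <- tab > 0

# Step 2: unary plus coerces TRUE/FALSE back to 1/0 and keeps the dimnames
binary <- +is_present

# Equivalent alternative: arithmetic on a logical also yields numbers
alt <- is_present * 1L

binary["APC", "test3"]
```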

Summary

I forget about this function every once in a while and it is really useful. I also have a gist about this.

For fun, here is one way to do this with {dplyr} and {tidyr}. I would like to hear about other ways because I don’t find this as intuitive.

library(dplyr, quietly = TRUE)

lapply(my_list, function(x) data.frame(genes = x)) %>% 
  bind_rows(.id = "names")
  names genes
1 test1  KRAS
2 test1  EGFR
3 test1 ERBB2
4 test2 ERBB2
5 test2 ERBB3
6 test2 SPRY2
7 test2    AR
8 test3   APC
9 test3  BRAF

Now to make the binary matrix, count each names/genes pair and pivot to wide format.

lapply(my_list, function(x) data.frame(genes = x)) %>% 
  bind_rows(.id = "names") %>%
  count(names, genes) %>%
  tidyr::pivot_wider(names_from = "names",
                     values_from = "n",
                     values_fill = 0)
# A tibble: 8 x 4
  genes test1 test2 test3
  <chr> <int> <int> <int>
1 EGFR      1     0     0
2 ERBB2     1     1     0
3 KRAS      1     0     0
4 AR        0     1     0
5 ERBB3     0     1     0
6 SPRY2     0     1     0
7 APC       0     0     1
8 BRAF      0     0     1
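As one more possible way to do the first step, tibble::enframe() followed by tidyr::unnest() also turns the named list into a long table. This is a sketch that assumes {tibble} and {tidyr} are installed; whether it is more intuitive than lapply() plus bind_rows() is a matter of taste.

```r
library(dplyr, quietly = TRUE)

my_list <- list(test1 = c("KRAS", "EGFR", "ERBB2"),
                test2 = c("ERBB2", "ERBB3", "SPRY2", "AR"),
                test3 = c("APC", "BRAF"))

# enframe() makes one row per list element, with the vectors in a list-column;
# unnest() then expands that list-column to one row per gene
long <- tibble::enframe(my_list, name = "names", value = "genes") %>%
  tidyr::unnest(cols = genes)

long
```

From here, the same count() and pivot_wider() steps shown above apply unchanged.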

sessionInfo

R version 4.0.5 (2021-03-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] dplyr_1.0.5

loaded via a namespace (and not attached):
 [1] rstudioapi_0.13   knitr_1.37        magrittr_2.0.1   
 [4] tidyselect_1.1.0  downlit_0.4.0     R6_2.5.0         
 [7] rlang_0.4.10      fastmap_1.1.0     fansi_0.4.2      
[10] stringr_1.4.0     tools_4.0.5       xfun_0.30        
[13] utf8_1.2.1        cli_3.1.0         DBI_1.1.1        
[16] jquerylib_0.1.4   htmltools_0.5.1.1 ellipsis_0.3.1   
[19] assertthat_0.2.1  yaml_2.2.1        digest_0.6.29    
[22] tibble_3.1.0      lifecycle_1.0.0   crayon_1.4.1     
[25] tidyr_1.1.3       purrr_0.3.4       sass_0.4.0       
[28] vctrs_0.3.7       distill_1.3       memoise_2.0.0    
[31] glue_1.4.2        cachem_1.0.5      evaluate_0.14    
[34] rmarkdown_2.11    stringi_1.5.3     compiler_4.0.5   
[37] bslib_0.2.5.1     pillar_1.6.0      generics_0.1.0   
[40] jsonlite_1.7.2    pkgconfig_2.0.3  

Citation

For attribution, please cite this work as

Alice (2022, March 13). Alice Walsh: Stacking vectors. Retrieved from https://awalsh17.github.io/posts/2022-03-13-stacking-in-base-r/

BibTeX citation

@misc{alice2022stacking,
  author = {Alice},
  title = {Alice Walsh: Stacking vectors},
  url = {https://awalsh17.github.io/posts/2022-03-13-stacking-in-base-r/},
  year = {2022}
}