Don’t forget to use stack().
I have recently found a couple of great use cases for the `stack()` function from {utils}.
Because I want to remind my future self about this, I thought it would make a good short post to test this {distill} site that I just created!
From the stack() function documentation:
“Stacking vectors concatenates multiple vectors into a single vector along with a factor indicating where each observation originated. Unstacking reverses this operation.”
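To make that sentence concrete, here is a rough, hand-rolled version of stacking built from `rep()` and `lengths()`, next to the real `utils::stack()` and `utils::unstack()`. This is an approximation for illustration, not the actual source of `stack()`, and the list `x` is hypothetical.

```r
# Hypothetical input: a named list of character vectors
x <- list(a = c("p", "q"), b = "r")

# Hand-rolled stacking: concatenate all values and record where each one came from
manual <- data.frame(
  values = unlist(x, use.names = FALSE),
  ind = factor(rep(names(x), lengths(x)), levels = names(x)),
  stringsAsFactors = FALSE
)

# The real function from {utils}
stacked <- utils::stack(x)

# unstack() reverses the operation; groups of unequal length come back as a list
restored <- utils::unstack(stacked)

print(manual)
print(stacked)
print(restored)
```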
Sometimes I end up with a bunch of vectors. Maybe I had multiple files or outputs, each containing items that belong to a different group. Often I need to combine these and then check which items appear in more than one group.
For the purpose of illustration, here I will pretend that I read into R a set of gene names as a named list.
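A definition of `my_list` that is consistent with the `stack()` output below (and with `my_list_w_repeats` later on) would be the following. Treat it as a reconstruction from the printed results, not necessarily the original code.

```r
# Reconstructed example input: gene names grouped by test
# (element order inferred from the stacked output)
my_list <- list(
  test1 = c("KRAS", "EGFR", "ERBB2"),
  test2 = c("ERBB2", "ERBB3", "SPRY2", "AR"),
  test3 = c("APC", "BRAF")
)
str(my_list)
```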
`stack()` makes a nice tidy data.frame! (Note that this would also work if the input were a nested list of lists.)
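A quick sketch of the list-of-lists case, using a hypothetical `nested` list. It assumes each inner element is a single value; with longer inner vectors the group labels may no longer line up with the values.

```r
# Hypothetical nested input: each group is a list of single strings
nested <- list(
  test1 = list("KRAS", "EGFR"),
  test2 = list("AR")
)

# The inner lists are flattened into one `values` column, labelled by group
s <- utils::stack(nested)
print(s)
```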
```r
stack(my_list)
```

```
  values   ind
1   KRAS test1
2   EGFR test1
3  ERBB2 test1
4  ERBB2 test2
5  ERBB3 test2
6  SPRY2 test2
7     AR test2
8    APC test3
9   BRAF test3
```
If you `table()` the result from `stack()`, you get a nice matrix of the values in each group.
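A minimal, self-contained sketch of that step. It re-creates `my_list` from the stacked output above so the snippet runs on its own; that definition is a reconstruction.

```r
# Example list (reconstructed from the stacked output above)
my_list <- list(
  test1 = c("KRAS", "EGFR", "ERBB2"),
  test2 = c("ERBB2", "ERBB3", "SPRY2", "AR"),
  test3 = c("APC", "BRAF")
)

# Cross-tabulate: rows are the distinct values, columns are the groups (`ind`)
tab <- table(utils::stack(my_list))
print(tab)
```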
```
       ind
values  test1 test2 test3
  APC       0     0     1
  AR        0     1     0
  BRAF      0     0     1
  EGFR      1     0     0
  ERBB2     1     1     0
  ERBB3     0     1     0
  KRAS      1     0     0
  SPRY2     0     1     0
```
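With the matrix in hand, answering the original question (which items show up in more than one group?) takes one row sum. A sketch, again with `my_list` re-created from the output above:

```r
my_list <- list(
  test1 = c("KRAS", "EGFR", "ERBB2"),
  test2 = c("ERBB2", "ERBB3", "SPRY2", "AR"),
  test3 = c("APC", "BRAF")
)
tab <- table(utils::stack(my_list))

# For each value, count the groups it appears in at least once
n_groups <- rowSums(tab > 0)

# Values shared by more than one group
shared <- names(n_groups)[n_groups > 1]
print(n_groups)
print(shared)
```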
The result is an object of class `table`, which you can convert to a data.frame.
```r
as.data.frame.array(table(stack(my_list)))
```

```
      test1 test2 test3
APC       0     0     1
AR        0     1     0
BRAF      0     0     1
EGFR      1     0     0
ERBB2     1     1     0
ERBB3     0     1     0
KRAS      1     0     0
SPRY2     0     1     0
```
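Why `as.data.frame.array()` rather than plain `as.data.frame()`? Because the object has class `table`, the generic dispatches to the `table` method, which returns a long data.frame with one row per cell and a `Freq` column. A small sketch contrasting the two, with `my_list` re-created from the output above:

```r
my_list <- list(
  test1 = c("KRAS", "EGFR", "ERBB2"),
  test2 = c("ERBB2", "ERBB3", "SPRY2", "AR"),
  test3 = c("APC", "BRAF")
)
tab <- table(utils::stack(my_list))

# Generic as.data.frame() uses the table method: long format,
# one row per (value, group) combination plus a Freq column
long <- as.data.frame(tab)

# Calling the array method keeps the wide layout, with values as row names
wide <- as.data.frame.array(tab)

print(head(long))
print(wide)
```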
You can also convert the binary matrix to logical (TRUE/FALSE).
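One straightforward way to do this (a sketch, not necessarily the code behind the output that follows) is to compare the counts with zero:

```r
my_list <- list(
  test1 = c("KRAS", "EGFR", "ERBB2"),
  test2 = c("ERBB2", "ERBB3", "SPRY2", "AR"),
  test3 = c("APC", "BRAF")
)

# TRUE wherever a value occurs at least once in a group
present <- table(utils::stack(my_list)) > 0
print(present)
```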
```
       ind
values  test1 test2 test3
  APC   FALSE FALSE  TRUE
  AR    FALSE  TRUE FALSE
  BRAF  FALSE FALSE  TRUE
  EGFR   TRUE FALSE FALSE
  ERBB2  TRUE  TRUE FALSE
  ERBB3 FALSE  TRUE FALSE
  KRAS   TRUE FALSE FALSE
  SPRY2 FALSE  TRUE FALSE
```
Now imagine that some values in the table are greater than 1 because an item appeared in a group more than once. A quick trick is to convert to logical and then back to numeric 0/1.
```r
my_list_w_repeats <- list(
  test1 = c("KRAS", "EGFR", "ERBB2"),
  test2 = c("ERBB2", "ERBB3", "SPRY2", "AR"),
  test3 = c("APC", "APC", "APC", "BRAF"))  # APC appears 3 times
table(stack(my_list_w_repeats))
```
```
       ind
values  test1 test2 test3
  APC       0     0     3
  AR        0     1     0
  BRAF      0     0     1
  EGFR      1     0     0
  ERBB2     1     1     0
  ERBB3     0     1     0
  KRAS      1     0     0
  SPRY2     0     1     0
```
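One way to do that round trip (a sketch, not necessarily the exact trick used for the output that follows) is to test `> 0`, which gives TRUE/FALSE, and then apply unary `+`, which coerces the logicals back to 1/0. Multiplying by 1 would work too.

```r
my_list_w_repeats <- list(
  test1 = c("KRAS", "EGFR", "ERBB2"),
  test2 = c("ERBB2", "ERBB3", "SPRY2", "AR"),
  test3 = c("APC", "APC", "APC", "BRAF")  # APC appears 3 times
)
counts <- table(utils::stack(my_list_w_repeats))

# counts > 0 is logical; unary + turns TRUE/FALSE back into 1/0
binary <- +(counts > 0)
print(binary)
```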
```
       ind
values  test1 test2 test3
  APC       0     0     1
  AR        0     1     0
  BRAF      0     0     1
  EGFR      1     0     0
  ERBB2     1     1     0
  ERBB3     0     1     0
  KRAS      1     0     0
  SPRY2     0     1     0
```
I forget about this function every once in a while, even though it is really useful. I also have a gist about this.
For fun, here is one way to do this with {dplyr} and {tidyr}. I would like to hear about other ways, because I don't find this as intuitive as `stack()`.
```r
library(dplyr, quietly = TRUE)
# One small data.frame per group, bound together with the list names as an id column
lapply(my_list, function(x) data.frame(genes = x)) %>%
  bind_rows(.id = "names")
```

```
  names genes
1 test1  KRAS
2 test1  EGFR
3 test1 ERBB2
4 test2 ERBB2
5 test2 ERBB3
6 test2 SPRY2
7 test2    AR
8 test3   APC
9 test3  BRAF
```
Now, to make the binary matrix:
```r
lapply(my_list, function(x) data.frame(genes = x)) %>%
  bind_rows(.id = "names") %>%
  # Count each gene within each group
  count(names, genes) %>%
  # Spread the groups into columns, filling absent combinations with 0
  tidyr::pivot_wider(names_from = "names",
                     values_from = "n",
                     values_fill = 0)
```

```
# A tibble: 8 x 4
  genes test1 test2 test3
  <chr> <int> <int> <int>
1 EGFR      1     0     0
2 ERBB2     1     1     0
3 KRAS      1     0     0
4 AR        0     1     0
5 ERBB3     0     1     0
6 SPRY2     0     1     0
7 APC       0     0     1
8 BRAF      0     0     1
```
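For comparison, base R's `stats::xtabs()` builds the same count matrix straight from the stacked data.frame using a formula. This is a swapped-in alternative rather than anything from the original post, and `my_list` is re-created here from the output above.

```r
my_list <- list(
  test1 = c("KRAS", "EGFR", "ERBB2"),
  test2 = c("ERBB2", "ERBB3", "SPRY2", "AR"),
  test3 = c("APC", "BRAF")
)
stacked <- utils::stack(my_list)

# Formula interface: cross-classify `values` by group `ind`
counts <- stats::xtabs(~ values + ind, data = stacked)
print(counts)
```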
```
R version 4.0.5 (2021-03-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base

other attached packages:
[1] dplyr_1.0.5

loaded via a namespace (and not attached):
[1] rstudioapi_0.13 knitr_1.37 magrittr_2.0.1
[4] tidyselect_1.1.0 downlit_0.4.0 R6_2.5.0
[7] rlang_0.4.10 fastmap_1.1.0 fansi_0.4.2
[10] stringr_1.4.0 tools_4.0.5 xfun_0.30
[13] utf8_1.2.1 cli_3.1.0 DBI_1.1.1
[16] jquerylib_0.1.4 htmltools_0.5.1.1 ellipsis_0.3.1
[19] assertthat_0.2.1 yaml_2.2.1 digest_0.6.29
[22] tibble_3.1.0 lifecycle_1.0.0 crayon_1.4.1
[25] tidyr_1.1.3 purrr_0.3.4 sass_0.4.0
[28] vctrs_0.3.7 distill_1.3 memoise_2.0.0
[31] glue_1.4.2 cachem_1.0.5 evaluate_0.14
[34] rmarkdown_2.11 stringi_1.5.3 compiler_4.0.5
[37] bslib_0.2.5.1 pillar_1.6.0 generics_0.1.0
[40] jsonlite_1.7.2 pkgconfig_2.0.3
```
For attribution, please cite this work as
Alice (2022, March 13). Alice Walsh: Stacking vectors. Retrieved from https://awalsh17.github.io/posts/2022-03-13-stacking-in-base-r/
BibTeX citation
@misc{alice2022stacking, author = {Alice, }, title = {Alice Walsh: Stacking vectors}, url = {https://awalsh17.github.io/posts/2022-03-13-stacking-in-base-r/}, year = {2022} }