I have a large amount of annual data in a data frame that will only get larger. I would like to organize it, grouping columns according to the year, which is included in the column names.
Base data:
dput(dat)
structure(list(id = 1:2, quantity = 3:4, avg_2002 = 5:6, avg_2003 = 7:8,
avg_2020 = 9:10, rev_2002 = c(15L, 24L), rev_2003 = c(21L,
32L), rev_2020 = c(27L, 40L)), row.names = c(NA, -2L), class = "data.frame")
What I would like to do is have all of the columns with, say, "2002" in them organized together, followed by the "2003" columns and so on...I know that relocate
in dplyr
is a good way to do it so I did the following:
dat <- tibble(dat)
dat <- dat %>%
relocate(grep("2002$", colnames(dat), value = TRUE),
.before = grep("2003$", colnames(dat), value = TRUE)) %>%
relocate(grep("2003$", colnames(dat), value = TRUE),
.after = grep("2002$", colnames(dat), value = TRUE))
which produces the desired result for my toy dataset:
id quantity avg_2002 rev_2002 avg_2003 rev_2003 avg_2020 rev_2020
<int> <int> <int> <int> <int> <int> <int> <int>
1 1 3 5 15 7 21 9 27
2 2 4 6 24 8 32 10 40
My question is this:
- How do I generalize the code above so that I don't have to keep adding
relocate
statements ad nauseum? - Is there a better way to do this task without using
dplyr::relocate
?
Any suggestions are much appreciated. Thanks!
CodePudding user response:
We may use select
- extract the numeric part of the column names, order
it and use that index in select
to reorder
library(dplyr)
dat %>%
select(id, quantity, order(readr::parse_number(names(.)[-(1:2)])) 2)
-output
# A tibble: 2 × 8
id quantity avg_2002 rev_2002 avg_2003 rev_2003 avg_2020 rev_2020
<int> <int> <int> <int> <int> <int> <int> <int>
1 1 3 5 15 7 21 9 27
2 2 4 6 24 8 32 10 40