I'm having trouble getting R's dplyr::arrange() to sort properly when used in a for loop. I found many posts discussing related issues (like ex.1 with .by_group=TRUE and using desc() better, ex.2 with lists, and ex.3 with filter_all() and %in%). Yet I still don't understand why arrange() works when I use the column name directly but not when I refer to the column by its index position within a vector of column names, which I later want to use in a loop to extract data from a larger data frame.
Here is a reproducible toy dataset to demonstrate:
library(dplyr)
set.seed(1)
toy <- data.frame(a=rep(sample(letters[1:5], 4, TRUE)), tf=sample(c("T","F"), 100, TRUE), n1=sample(1:100, 100, TRUE), n2=1:100)
get_it <- colnames(toy)[3:4]
My initial approach works with the indexed vector in the select() step, but fails to sort in arrange(), even with the .by_group option. I also tried calling it explicitly as dplyr::arrange(), but nothing changed.
j=1 # pretending this is the 1st pass in the loop
toy %>%
select(a, tf, get_it[j]) %>%
group_by(a) %>%
arrange(desc(get_it[j]), .by_group=TRUE)
a tf n1
<chr> <chr> <int>
a T 21
a T 17
a F 87
a T 90
a T 64
example output truncated
However, I get the intended sorted results when I replace the indexed vector in arrange() with the column name itself (select() still works fine):
j=1 # pretending this is the 1st pass through the loop
toy %>%
select(a, tf, get_it[j]) %>%
group_by(a) %>%
arrange(desc(n1), .by_group=TRUE)
a tf n1
<chr> <chr> <int>
a F 99
a F 98
a F 96
a F 95
a T 93
example output truncated
Why does the second version work, but not the first? What should I change so that I can loop this through many columns?
Thanks in advance! I appreciate your time!
CodePudding user response:
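This falls under "programming with dplyr". arrange() uses data masking, so get_it[j] is evaluated as an ordinary R expression and yields the string "n1", not the column n1. desc() of that single constant is the same for every row, so the rows are not reordered within each group. select() uses tidy selection, which accepts column names as strings, which is why that step works. Use the .data pronoun to reference a column by a string: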
toy %>%
select(a, tf, get_it[j]) %>%
group_by(a) %>%
arrange(desc(.data[[ get_it[j] ]]), .by_group=TRUE)
# # A tibble: 100 x 3
# # Groups: a [3]
# a tf n1
# <chr> <chr> <int>
# 1 a F 99
# 2 a F 98
# 3 a F 96
# 4 a F 95
# 5 a T 93
# 6 a T 92
# 7 a T 92
# 8 a T 90
# 9 a F 87
# 10 a F 86
# # ... with 90 more rows
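To loop this over many columns, as the question asks, the same .data pattern can be wrapped in a for loop. The following is only a minimal sketch under a few assumptions: the names `results` and `col` are hypothetical and not from the original post, each sorted table is simply stored in a named list, and all_of() is used in select() to make the string-based selection explicit.

```r
library(dplyr)

# Same toy data as in the question
set.seed(1)
toy <- data.frame(a=rep(sample(letters[1:5], 4, TRUE)), tf=sample(c("T","F"), 100, TRUE), n1=sample(1:100, 100, TRUE), n2=1:100)
get_it <- colnames(toy)[3:4]

# `results` is a hypothetical container: one sorted table per column name
results <- vector("list", length(get_it))
names(results) <- get_it

for (j in seq_along(get_it)) {
  col <- get_it[j]  # column name as a string, e.g. "n1"
  results[[col]] <- toy %>%
    select(a, tf, all_of(col)) %>%   # tidy selection accepts strings
    group_by(a) %>%
    arrange(desc(.data[[col]]), .by_group = TRUE) %>%  # data masking needs .data
    ungroup()
}
```

Each element can then be retrieved by name, for example results[["n1"]] or results$n2. Whether a list is the right container depends on what the extracted data is used for afterwards.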