I have a dataframe including a column of factors that I would like to subset to select every nth row, after grouping by factor level. For example,
my_df <- data.frame(col1 = c(1:12), col2 = rep(c("A","B", "C"), 4))
my_df
col1 col2
1 1 A
2 2 B
3 3 C
4 4 A
5 5 B
6 6 C
7 7 A
8 8 B
9 9 C
10 10 A
11 11 B
12 12 C
Subsetting to select every 2nd row should yield my_new_df as,
col1 col2
1 4 A
2 10 A
3 5 B
4 11 B
5 6 C
6 12 C
I tried in dplyr:
my_df %>% group_by(col2) %>%
my_df[seq(2, nrow(my_df), 2), ] -> my_new_df
I get an error:
Error: Can't subset columns that don't exist.
x Locations 4, 6, 8, 10, and 12 don't exist.
ℹ There are only 2 columns.
To see if the nrow function was a problem, I tried using the number directly. So,
my_df %>% group_by(col2) %>%
my_df[seq(2, 4, 2), ] -> my_new_df
Also gave an error,
Error: Can't subset columns that don't exist.
x Location 4 doesn't exist.
ℹ There are only 2 columns.
Run `rlang::last_error()` to see where the error occurred.
My expectation was that it would run the subsetting on each group of data and then combine them into 'my_new_df'. My understanding of how group_by works is clearly wrong but I am stuck on how to move past this error. Any help would much appreciated.
CodePudding user response:
Try:
my_df %>%
group_by(col2)%>%
slice(seq(from = 2, to = n(), by = 2))
# A tibble: 6 x 2
# Groups: col2 [3]
col1 col2
<int> <chr>
1 4 A
2 10 A
3 5 B
4 11 B
5 6 C
6 12 C
You might want to ungroup after slicing if you want to do other operations not based on col2.
CodePudding user response:
Here is a data.table
option:
library(data.table)
data <- as.data.table(my_df)
data[(rowid(col2) %% 2) == 0]
col1 col2
1: 4 A
2: 5 B
3: 6 C
4: 10 A
5: 11 B
6: 12 C
Or base R:
my_df[as.logical(with(my_df, ave(col1, col2, FUN = function(x)
seq_along(x) %% 2 == 0))), ]
col1 col2
4 4 A
5 5 B
6 6 C
10 10 A
11 11 B
12 12 C