I would like to extract the first observations of different groups from a dataframe. The number of observations comes from another dataframe.
Here is an example: In dataframe "x" there are different groups of animals with their names.
x <- data.frame(animals = c("Mice", "Mice", "Cats", "Cats", "Dogs", "Dogs", "Birds", "Birds"),
names = c("Jerry", "Speedy", "Tom", "Garfield", "Snoopy", "Pluto", "Tweety", "Donald"))
In dataframe "y", on the other hand, the group of animals is also shown, plus the number of observations I would like to have for the respective group.
y <- data.frame(animals = c("Mice", "Cats", "Dogs", "Birds"),
firstObs = c(1, 2, 1, 2))
For the mice I would like to have the first observation, for the cats the first two (here all), for the dogs again only the first observation and for the birds finally again the first two. This result should be stored in data frame "z".
z <- data.frame(animals = c("Mice", "Cats", "Cats", "Dogs", "Birds", "Birds"),
names = c("Jerry", "Tom", "Garfield", "Snoopy", "Tweety", "Donald"))
I had thought of using dplyr, grouping by "animals" and then using the head() function. But unfortunately I don't know how to pass different values for head(n=) according to the groups....
I had then thought of duplicating the rows in "y" and applying a left-join.
y.2 <- data.frame(animals = c("Mice", "Cats", "Cats", "Dogs", "Birds", "Birds"))
y.2 %>% left_join(x, by = "animals")
However, then not the first or first two observations are joined but y.2 is extended.
I would therefore like to ask you if you have a solution to my problem. I would really appreciate it and thank you very much in advance!
CodePudding user response:
Create logical condition with row_number()
on the 'first_Obs' after joining the two datasets
library(dplyr)
left_join(x, y) %>%
group_by(animals) %>%
filter(row_number() <= first(firstObs)) %>%
ungroup %>%
select(-firstObs)
-output
# A tibble: 6 × 2
animals names
<chr> <chr>
1 Mice Jerry
2 Cats Tom
3 Cats Garfield
4 Dogs Snoopy
5 Birds Tweety
6 Birds Donald
Or use match
with slice
x %>%
group_by(animals) %>%
slice(seq_len(y$firstObs[match(cur_group()$animals, y$animals)])) %>%
ungroup
# A tibble: 6 × 2
animals names
<chr> <chr>
1 Birds Tweety
2 Birds Donald
3 Cats Tom
4 Cats Garfield
5 Dogs Snoopy
6 Mice Jerry