I hope you can help me with this problem, I have the following data like this:
ID,colour
1,base_yellow
1,blue
1,base_red
1,blue
1,pink
1,blue
1,base_yellow
2,base_yellow
2,blue
2,base_red
2,blue
2,pink
2,blue
2,base_yellow
3,base_yellow
3,blue
3,pink
3,blue
3,base_yellow
4,base_yellow
4,blue
4,green
4,blue
4,green
4,blue
4,pink
4,blue
4,base_yellow
Every time meet with base (base_yellow, base_red), it creates new group, the output that is expected as shown below, which gives a new variable:
ID,colour
1,base_yellow; blue; base_red
1,base_red; blue; pink;blue;base_yellow
2,base_yellow; blue; base_red
2,base_red; blue; pink;blue; base_yellow
3,base_yellow;blue;pinkblue;base_yellow
4,base_yellow; blue;green;blue;green;blue;pink;blue;base_yellow
CodePudding user response:
Try this:
library(tidyverse)
# Read data
mydata <- tibble::tribble(~ID,~colour,
1,"base_yellow",
1,"blue",
1,"base_red",
1,"blue",
1,"pink",
1,"blue",
1,"base_yellow",
2,"base_yellow",
2,"blue",
2,"base_red",
2,"blue",
2,"pink",
2,"blue",
2,"base_yellow",
3,"base_yellow",
3,"blue",
3,"pink",
3,"blue",
3,"base_yellow",
4,"base_yellow",
4,"blue",
4,"green",
4,"blue",
4,"green",
4,"blue",
4,"pink",
4,"blue",
4,"base_yellow")
# Add column to group by words starting with "base_"
mydata <- mydata %>%
mutate(base = str_starts(colour, "base_")) %>%
mutate(base = ifelse(base, colour, NA)) %>%
fill(base, .direction = "down")
# Group by ID and words starting with "base_" and paste words
mydata <- mydata %>%
group_by(ID, base) %>%
summarise(colour = paste(colour, collapse = ";")) %>%
select(-base)
Results:
> mydata
# A tibble: 6 × 2
# Groups: ID [4]
ID colour
<dbl> <chr>
1 1 base_red;blue;pink;blue
2 1 base_yellow;blue;base_yellow
3 2 base_red;blue;pink;blue
4 2 base_yellow;blue;base_yellow
5 3 base_yellow;blue;pink;blue;base_yellow
6 4 base_yellow;blue;green;blue;green;blue;pink;blue;base_yellow
CodePudding user response:
This is something you might be able to adapt for your needs.
First, create a vector vec
that includes row positions where colour
starts with "base".
Then, you can use map2_dfr
from purrr
that will provide colour
that ranges from start to end positions based on vec
. This will help with situations where the same colour
is used in more than one row in the end. A grouping variable group
is also created in this step.
After grouping by group
, you can keep only colour
groups that have more than one colour
and str_c
to collapse them together for the same group
.
library(tidyverse)
vec <- which(grepl("^base", df$colour))
map2_dfr(
vec[-length(vec)],
vec[-1],
~df[.x:.y, ],
.id = "group"
) %>%
group_by(group) %>%
filter(n_distinct(colour) > 1) %>%
summarise(ID = first(ID), colour = str_c(colour, collapse = "; ")) %>%
select(-group)
Output
ID colour
<int> <chr>
1 1 base_yellow; blue; base_red
2 1 base_red; blue; pink; blue; base_yellow
3 2 base_yellow; blue; base_red
4 2 base_red; blue; pink; blue; base_yellow
5 3 base_yellow; blue; pink; blue; base_yellow
6 4 base_yellow; blue; green; blue; green; blue; pink; blue; base_yellow