I have a data like this :
Response Prompt Prompt.Label Comments
1 Friend who si3 3_a
2 1 was si4_en 3_a
3 1 was si4_nag 3_a
4 Family who si3 3_b
5 1 was si4_nag 3_b
6 1 was si4_en 3_b
7 other who si3 4_a
8 1 was si4_sp 4_a
9 other who si3 4_b
10 1 was si4_en 4_b
11 Friend who si3 5_a
12 1 was si4_en 5_a
13 Family who si3 5_b
14 1 was si4_en 5_b
data<-structure(list(Response = c("Friend", "1", "1", "Family", "1",
"1", "other", "1", "other", "1", "Friend", "1", "Family", "1"
), Prompt = c("who", "was", "was", "who", "was", "was", "who",
"was", "who", "was", "who", "was", "who", "was"), Prompt.Label = c("si3",
"si4_en", "si4_nag", "si3", "si4_nag", "si4_en", "si3", "si4_sp",
"si3", "si4_en", "si3", "si4_en", "si3", "si4_en"), Comments = c("3_a",
"3_a", "3_a", "3_b", "3_b", "3_b", "4_a", "4_a", "4_b", "4_b",
"5_a", "5_a", "5_b", "5_b")), class = "data.frame", row.names = c(NA,
-14L))
I am looking to reshape the data to have two columns, including Source, and Target. "Source" should have the values from "Response" when the "Prompt"=="who", and "Target" should be the values of "Prompt.Label" if "Prompt"= "was" for each value of the "Comments" columns. For instance, for the first row, Friend is going for "Source" , and si4_en is for "Target".
The desire data should be :
Source Target
1 Friend si4_en
2 Friend si4_nag
3 Family si4_nag
4 Family si4_en
5 other si4_sp
6 other si4_en
7 Friend si4_en
8 Family si4_en
CodePudding user response:
Here is one way to get the desired output:
library(dplyr)
data %>%
group_by(cumsum(Response!=1)) %>%
mutate(Response = ifelse(Response==1, first(Response), Response)) %>%
filter(row_number() !=1) %>%
ungroup() %>%
select(Response, Target=Prompt.Label)
Response Target
<chr> <chr>
1 Friend si4_en
2 Friend si4_nag
3 Family si4_nag
4 Family si4_en
5 other si4_sp
6 other si4_en
7 Friend si4_en
8 Family si4_en
CodePudding user response:
I feel we can do it with some very simple tidyverse
functions, as long as we do some basic data wrangling beforehand. This data.frame is not tidy. The Response
variable is not self-explicable, and needs some contextual clues for understanding.
I feel we may be better off by first tidying it up with tidyr::fill
.
Then we can do some simple select/filter
.
data %>%
mutate(Response = na_if(Response, 1)) %>%
fill(Response) %>%
filter(Prompt == 'was') %>%
select(Source = Response, Target = Prompt.Label)
Source Target
1 Friend si4_en
2 Friend si4_nag
3 Family si4_nag
4 Family si4_en
5 other si4_sp
6 other si4_en
7 Friend si4_en
8 Family si4_en