I have a data frame with two columns, title and text.
The data frame looks like this:
title | text |
---|---|
foreign keys | A foreign key is a column or group of columns... |
Week 2 | In Week 2 the encoding... |
comments | colection of comments about Week 2 |
Statistics | Statistics is the discipline... |
comments | collection of comments about Statistics |
In this data frame, the comments on a topic always appear in the row directly below it. I want to link the two, so that if I give the name of a topic (title), I get back its corresponding comments (text). In this example, the first topic (foreign keys) has no comments, so I don't need it. In this way, I also want to reduce the size of my data frame by keeping only the topics that have comments.
So far, I have only managed the following:
library(dplyr)

df %>%
  filter(title == "Week 2") %>%
  pull(text)
This returns the text of the Week 2 row itself (as expected), not the comments about Week 2 in the row below it. Also, topics that have no comments below them should be dropped.
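For comparison, the single-topic lookup described above can be sketched in base R by position: find the topic's row and return the text of the row directly below it, but only if that row is titled "comments". This is a minimal sketch, assuming every comments row sits immediately below its topic; `get_comments()` is a hypothetical helper, not part of any package.

```r
# Example data from the question (stringsAsFactors = FALSE keeps the columns
# as character vectors in older R versions as well)
df <- data.frame(
  title = c("foreign keys", "Week 2", "comments", "Statistics", "comments"),
  text = c("A foreign key is a column or group of columns...",
           "In Week 2 the encoding...",
           "colection of comments about Week 2",
           "Statistics is the discipline...",
           "collection of comments about Statistics"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the text of the row directly below `topic`,
# or NA if the topic is missing, is the last row, or is not followed by a
# "comments" row
get_comments <- function(df, topic) {
  i <- match(topic, df$title)  # first row whose title equals `topic`
  if (is.na(i) || i == nrow(df) || df$title[i + 1] != "comments") {
    return(NA_character_)
  }
  df$text[i + 1]
}

get_comments(df, "Week 2")        # the comments about Week 2
get_comments(df, "foreign keys")  # NA: no comments row below it
```

Because `||` short-circuits, `df$title[i + 1]` is only evaluated when the topic exists and is not the last row.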
Answer:
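One option is to create a grouping column that puts each 'Topic' row in the same group as the 'review' rows below it, and then filter to keep only the topics that have a review. Once the data is subset this way, it is easy to pull the 'text' column or to create a named vector of 'text' with 'title' as the names.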
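library(dplyr)
library(stringr)

df1 %>%
  # each "Topic" row starts a new group that also holds the review rows below it
  group_by(grp = cumsum(str_detect(title, '^Topic'))) %>%
  # keep groups that contain a review, and within them only the topic row
  filter(any(str_detect(title, 'review')) & str_detect(text, 'text')) %>%
  ungroup()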
Output:
# A tibble: 2 × 3
title text grp
<chr> <chr> <int>
1 Topic 2 text of Topic 2 2
2 Topic 3 text of Topic 3 3
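The grouping column is the key step. The sketch below reproduces the same logic in base R for df1 so the intermediate values can be inspected: `cumsum()` over a logical "is this a Topic row?" vector gives each topic and the review rows following it the same group number, and `ave(..., FUN = any)` marks every row of a group that contains a review. It is an illustrative translation, not the answer's dplyr code, and it selects topic rows by title rather than by matching 'text'; the names `is_topic`, `group_has_review`, and `subset_df1` are hypothetical.

```r
df1 <- data.frame(
  title = c("Topic 1", "Topic 2", "review 2", "Topic 3", "review 3"),
  text = c("text of Topic 1", "text of Topic 2", "review of Topic 2",
           "text of Topic 3", "review of Topic 3"),
  stringsAsFactors = FALSE
)

# Every "Topic" row starts a new group, so the running count of topic rows
# labels each topic together with the review rows that follow it
is_topic <- grepl("^Topic", df1$title)
grp <- cumsum(is_topic)  # grp is 1 2 2 3 3

# TRUE for every row whose group contains at least one review row
group_has_review <- ave(grepl("review", df1$title), grp, FUN = any)

# Keep only the topic rows of groups that have a review
subset_df1 <- df1[is_topic & group_has_review, ]
subset_df1
```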
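For the updated data (df2), the grouping column instead starts a new group at every row except a 'comments' row that directly follows a non-comments row, so each topic is grouped with the comments beneath it: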
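df2 %>%
  # diff() is 1 only where a "comments" row follows a topic row; every other row starts a new group
  group_by(grp = cumsum(c(TRUE, diff(str_detect(title, 'comments')) != 1))) %>%
  # keep groups that contain comments, and drop the comments rows themselves
  filter(any(str_detect(title, 'comments')) & title != 'comments') %>%
  ungroup()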
Output:
# A tibble: 2 × 3
title text grp
<chr> <chr> <int>
1 Week 2 In Week 2 the encoding... 2
2 Statistics Statistics is the discipline... 3
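The `diff()` trick used for df2 can be unpacked the same way. This base-R sketch assumes the comments rows are titled exactly "comments" (it uses `==` rather than the answer's substring match via `str_detect()`). On a logical vector, `diff()` works on 0/1 values and is 1 only where a comments row follows a non-comments row, so every other row starts a new group. The variable names are illustrative.

```r
df2 <- data.frame(
  title = c("foreign keys", "Week 2", "comments", "Statistics", "comments"),
  text = c("A foreign key is a column or group of columns...",
           "In Week 2 the encoding...",
           "colection of comments about Week 2",
           "Statistics is the discipline...",
           "collection of comments about Statistics"),
  stringsAsFactors = FALSE
)

is_comment <- df2$title == "comments"  # FALSE FALSE TRUE FALSE TRUE

# A new group starts everywhere except where a comments row directly
# follows a non-comments row (diff() == 1 there)
starts_group <- c(TRUE, diff(is_comment) != 1)
grp <- cumsum(starts_group)            # 1 2 2 3 3

# TRUE for every row whose group contains a comments row
group_has_comments <- ave(is_comment, grp, FUN = any)

# Keep the topic rows of those groups
topics <- df2[group_has_comments & !is_comment, ]
topics$title
```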
Data:
df1 <- structure(list(title = c("Topic 1", "Topic 2", "review 2", "Topic 3",
"review 3"), text = c("text of Topic 1", "text of Topic 2", "review of Topic 2",
"text of Topic 3", "review of Topic 3")), class = "data.frame",
row.names = c(NA, -5L))
df2 <- structure(list(title = c("foreign keys", "Week 2", "comments",
"Statistics", "comments"), text = c("A foreign key is a column or group of columns...",
"In Week 2 the encoding...", "colection of comments about Week 2",
"Statistics is the discipline...", "collection of comments about Statistics"
)), class = "data.frame", row.names = c(NA, -5L))
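The answer also suggests turning the result into a named vector of text keyed by title. A base-R sketch for the df2 data, assuming each topic's comments are in the next row and titled exactly "comments" (`has_comments` and `comments_by_topic` are illustrative names):

```r
df2 <- data.frame(
  title = c("foreign keys", "Week 2", "comments", "Statistics", "comments"),
  text = c("A foreign key is a column or group of columns...",
           "In Week 2 the encoding...",
           "colection of comments about Week 2",
           "Statistics is the discipline...",
           "collection of comments about Statistics"),
  stringsAsFactors = FALSE
)

n <- nrow(df2)
# TRUE for topic rows whose next row is a "comments" row
# (the last row can never have comments below it, hence the trailing FALSE)
has_comments <- c(df2$title[-n] != "comments" & df2$title[-1] == "comments",
                  FALSE)

# Named character vector: names are topics, values are the comments below them
comments_by_topic <- setNames(df2$text[which(has_comments) + 1],
                              df2$title[has_comments])

comments_by_topic[["Week 2"]]
names(comments_by_topic)
```

Indexing by name then returns the comments for a topic, and topics without comments (here foreign keys) are simply absent from the vector.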