Linking/Mapping columns in a dataframe in R

I have a data frame with two columns, title and text.

The data frame looks like this:

title	text
foreign keys	A foreign key is a column or group of columns...
Week 2	In Week 2 the encoding...
comments	colection of comments about Week 2
Statistics	Statistics is the discipline...
comments	collection of comments about Statistics

In this data frame, the comments about a topic are always in the row directly below that topic. I want to link/map the two so that, given the name of a topic (title), I can retrieve its corresponding comments (text). In this example, foreign keys has no comments, so I don't need it. In other words, I want to shrink the data frame by keeping only the topics that have comments.

So far, I have only managed the following:

library(dplyr)

df %>%
  filter(title == "Week 2") %>%
  pull(text)

This returns the text of the Week 2 row itself (as expected), not the comments about Week 2 in the row below it. Topics that have no comments below them should be dropped.
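For illustration, a minimal base-R sketch of the lookup described above. It assumes, as in the example, that a topic's comments sit in the single row directly below it and that this row's title is exactly "comments". The helper name get_comments() is hypothetical and not part of any package.

```r
# Data as shown in the question (text copied verbatim)
df <- data.frame(
  title = c("foreign keys", "Week 2", "comments", "Statistics", "comments"),
  text = c("A foreign key is a column or group of columns...",
           "In Week 2 the encoding...",
           "colection of comments about Week 2",
           "Statistics is the discipline...",
           "collection of comments about Statistics"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the text of the "comments" row directly below
# the given topic, or NA if the topic is missing or has no comments row below it.
get_comments <- function(data, topic) {
  i <- match(topic, data$title)              # first row with this title
  if (is.na(i) || i == nrow(data)) return(NA_character_)
  if (data$title[i + 1] != "comments") return(NA_character_)
  data$text[i + 1]
}

get_comments(df, "Week 2")        # returns "colection of comments about Week 2"
get_comments(df, "foreign keys")  # returns NA (no comments row below it)
```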

Answer:

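For the original version of the data (df1, with 'Topic' and 'review' rows), we can create a grouping column so that each topic and the review row below it fall into the same group, then keep only the topics whose group contains a review. Once the data are subset, it is easy to pull the 'text' or create a named vector of 'text' with the titles as names.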

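library(dplyr)
library(stringr)
df1 %>%
  # start a new group at every "Topic" row, so each review shares its topic's group
  group_by(grp = cumsum(str_detect(title, '^Topic'))) %>%
  # keep groups that contain a review, and within them only the topic row
  # (the only row whose text contains 'text')
  filter(any(str_detect(title, 'review')) & str_detect(text, 'text')) %>%
  ungroup()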

-output

# A tibble: 2 × 3
  title   text              grp
  <chr>   <chr>           <int>
1 Topic 2 text of Topic 2     2
2 Topic 3 text of Topic 3     3
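To go one step further and build the named vector mentioned above, one possible sketch keeps the same grouping but collapses each group into a single topic/review pair. It assumes every review row's title starts with "review"; the names pairs and review_lookup are illustrative only.

```r
library(dplyr)
library(stringr)

df1 <- data.frame(
  title = c("Topic 1", "Topic 2", "review 2", "Topic 3", "review 3"),
  text = c("text of Topic 1", "text of Topic 2", "review of Topic 2",
           "text of Topic 3", "review of Topic 3"),
  stringsAsFactors = FALSE
)

pairs <- df1 %>%
  # each "Topic" row starts a new group; its review row falls in the same group
  group_by(grp = cumsum(str_detect(title, "^Topic"))) %>%
  # drop topics that have no review row
  filter(any(str_detect(title, "^review"))) %>%
  # one row per topic: the topic title and the text of its first review row
  summarise(topic = first(title),
            review = text[str_detect(title, "^review")][1],
            .groups = "drop")

# Named vector: names are topics, values are their review text
review_lookup <- setNames(pairs$review, pairs$topic)
review_lookup[["Topic 2"]]  # "review of Topic 2"
```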

For the updated data (df2):

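df2 %>%
  # a new group starts on every row except a "comments" row that directly
  # follows a non-comments row, so each topic shares a group with its comments
  group_by(grp = cumsum(c(TRUE, diff(str_detect(title, 'comments')) != 1))) %>%
  # keep groups that contain comments, and within them only the topic row
  filter(any(str_detect(title, 'comments')) & title != 'comments') %>%
  ungroup()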

-output

# A tibble: 2 × 3
  title      text                              grp
  <chr>      <chr>                           <int>
1 Week 2     In Week 2 the encoding...           2
2 Statistics Statistics is the discipline...     3
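An alternative that avoids the grouping column is to compare each row with the one directly below it using dplyr's lead(). This sketch assumes each topic has at most one "comments" row immediately beneath it; comments_by_topic and lookup are illustrative names.

```r
library(dplyr)

df2 <- data.frame(
  title = c("foreign keys", "Week 2", "comments", "Statistics", "comments"),
  text = c("A foreign key is a column or group of columns...",
           "In Week 2 the encoding...",
           "colection of comments about Week 2",
           "Statistics is the discipline...",
           "collection of comments about Statistics"),
  stringsAsFactors = FALSE
)

comments_by_topic <- df2 %>%
  # pair every row with the title and text of the row directly below it
  mutate(next_title = lead(title), next_text = lead(text)) %>%
  # keep topic rows whose next row is a comments row
  # (%in% treats the NA produced by lead() on the last row as FALSE)
  filter(title != "comments", next_title %in% "comments") %>%
  select(title, comments = next_text)

# Named lookup vector: topic title -> its comments text
lookup <- setNames(comments_by_topic$comments, comments_by_topic$title)
lookup[["Week 2"]]  # "colection of comments about Week 2"
```

Because lead() only looks one row ahead, a topic followed by several consecutive comments rows would pick up only the first of them.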

data

df1 <- structure(list(title = c("Topic 1", "Topic 2", "review 2", "Topic 3", 
"review 3"), text = c("text of Topic 1", "text of Topic 2", "review of Topic 2", 
"text of Topic 3", "review of Topic 3")), class = "data.frame", row.names = c(NA, -5L))

df2 <- structure(list(title = c("foreign keys", "Week 2", "comments", 
"Statistics", "comments"), text = c("A foreign key is a column or group of columns...", 
"In Week 2 the encoding...", "colection of comments about Week 2", 
"Statistics is the discipline...", "collection of comments about Statistics"
)), class = "data.frame", row.names = c(NA, -5L))