I'm using R to extract the sentences that contain specific words ("everything", "along", "wrote") from a song's lyrics. Here is the song, "Yellow" by Coldplay:
Look at the stars Look how they shine for you And everything you do Yeah, they were all yellow I came along I wrote a song for you And all the things you do And it was called Yellow So, then I took my turn What a thing to've done And it was all yellow Your skin Oh yeah, your skin and bones Turn in to something beautiful Do you know You know I love you so You know I love you so I swam across I jumped across for you What a thing to do 'Cause you were all yellow I drew a line
I tried to create a vector with the lyrics, but my code doesn't run.
Here's an option using the tidyverse. It's not perfect, and you'll have to adapt it to your specific use case:
```r
library(tidyverse)
lyrics <- data.frame(yellow = "Look at the stars Look how they shine for you And everything you do Yeah, they were all yellow I came along I wrote a song for you And all the things you do And it was called Yellow So, then I took my turn What a thing to've done And it was all yellow Your skin Oh yeah, your skin and bones Turn in to something beautiful Do you know You know I love you so You know I love you so I swam across I jumped across for you What a thing to do 'Cause you were all yellow I drew a line")
lyrics %>%
  # insert a "<>" marker before every upper-case letter
  mutate(yellow = gsub('([[:upper:]])', '<>\\1', yellow)) %>%
  # split into one row per marked chunk ("sentence")
  separate_rows(yellow, sep = "<>") %>%
  # flag rows that mention any of the target words
  mutate(flag = str_detect(yellow, "everything|along|wrote")) %>%
  filter(flag)
```
This gives us:
```
# A tibble: 3 x 2
  yellow                    flag
  <chr>                     <lgl>
1 "And everything you do "  TRUE
2 "I came along "           TRUE
3 "I wrote a song for you " TRUE
```
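If you'd rather not load the tidyverse, the same capital-letter rule can be sketched in base R. This is a minimal sketch, not part of the original answer: `gsub()` inserts the same `<>` marker, `strsplit()` splits on it, and `grep(value = TRUE)` keeps the matching pieces. The variable names `yellow`, `marked`, `sentences` and `hits` are illustrative.

```r
# Base-R sketch of the capital-letter rule (no tidyverse needed).
# paste() joins the chunks with single spaces, reproducing the lyrics string.
yellow <- paste(
  "Look at the stars Look how they shine for you And everything you do",
  "Yeah, they were all yellow I came along I wrote a song for you",
  "And all the things you do And it was called Yellow",
  "So, then I took my turn What a thing to've done And it was all yellow",
  "Your skin Oh yeah, your skin and bones Turn in to something beautiful",
  "Do you know You know I love you so You know I love you so",
  "I swam across I jumped across for you What a thing to do",
  "'Cause you were all yellow I drew a line"
)

# Mark each upper-case letter with "<>" and split on the marker,
# mirroring the gsub() / separate_rows() steps above
marked <- gsub("([[:upper:]])", "<>\\1", yellow)
sentences <- strsplit(marked, "<>", fixed = TRUE)[[1]]

# Keep only the pieces that mention one of the target words
hits <- grep("everything|along|wrote", sentences, value = TRUE)
hits
# c("And everything you do ", "I came along ", "I wrote a song for you ")
```

Because each split happens just before a capital, every piece keeps its trailing space, as in the tibble output; wrap the result in `trimws()` if you don't want it. The first element of `sentences` is an empty string (from the marker before "Look"), which `grep()` simply skips.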
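You'll have to decide what counts as a sentence. Here I started a new sentence at every capital letter; as a side effect, a capitalized word in mid-line becomes its own sentence, so "And it was called Yellow" is split into "And it was called " and "Yellow ".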
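Whichever sentence rule you pick, the pattern `"everything|along|wrote"` matches substrings, so a sentence containing "alongside" would also be returned. Below is a minimal base-R sketch, assuming you want whole-word matches only. `extract_sentences()` is a hypothetical helper, not from any package; it reuses the capital-letter split from the answer and wraps the target words in `\\b` word boundaries.

```r
# Hypothetical helper (not from any package): split `text` on capital
# letters, as in the answer, then keep pieces containing any of `words`
# as a whole word.
extract_sentences <- function(text, words) {
  marked <- gsub("([[:upper:]])", "<>\\1", text)
  pieces <- trimws(strsplit(marked, "<>", fixed = TRUE)[[1]])
  pieces <- pieces[nzchar(pieces)]  # drop the empty piece before the first capital
  # \\b anchors each word, so "along" does not match inside "alongside"
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  grep(pattern, pieces, value = TRUE, perl = TRUE)
}

extract_sentences("I came along We walked alongside I wrote a song",
                  c("along", "wrote"))
# returns c("I came along", "I wrote a song")
```

With the plain `"along|wrote"` pattern, "We walked alongside" would also have been kept. Add `ignore.case = TRUE` to the `grep()` call if capitalization of the target words shouldn't matter.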