# Read the State of the Union file
speech <- readLines("stateoftheunion1790-2012.txt")
head(speech)
What does the code below do after it reads the file? I was told it gives a list where each entry is the text between consecutive *** markers, but what does that mean?
x <- grep("^\\*{3}", speech)
list.speeches <- list()
for (i in 1:length(x)) {
  if (i == 1) {
    list.speeches[[i]] <- paste(speech[1:x[1]], collapse = " ")
  } else {
    list.speeches[[i]] <- paste(speech[x[i-1]:x[i]], collapse = " ")
  }
}
Answer:
It looks like you're new to SO; welcome to the community! As @Allan Cameron pointed out, whenever you ask a question, especially if you want good answers quickly, it's best to make it reproducible. That includes sample data, such as the output of dput() or reprex::reprex(). Check it out: making R reproducible questions.
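To illustrate what dput() gives you, here is a minimal sketch. It assumes a small stand-in vector (the hypothetical name sample.lines) holding the first three lines of the file, the same ones head(speech) prints further down. dput() prints R code that rebuilds the object exactly, so pasting that printed code into a question lets others run it without downloading the file.

```r
# Hypothetical stand-in: the first three lines of the State of the Union file
sample.lines <- c("The Project Gutenberg EBook of Complete State of the Union Addresses,",
                  "from 1790 to the Present",
                  "")

# Print R code that recreates sample.lines; this is what you would paste into a question
dput(sample.lines)

# Check that running the printed code really gives back an identical object
dput.text <- capture.output(dput(sample.lines))
recreated <- eval(parse(text = dput.text))
identical(recreated, sample.lines)  # TRUE
```

For real questions, dput(head(speech)) works the same way on the actual data.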
I've annotated each part of the code with comments. Feel free to ask if you have any questions.
speech <- readLines("https://raw.githubusercontent.com/jdwilson4/Intro-Data-Science/master/Data/stateoftheunion1790-2012.txt")
head(speech) # print the first 6 elements (lines of the file) of the character vector speech
# [1] "The Project Gutenberg EBook of Complete State of the Union Addresses,"
# [2] "from 1790 to the Present"
# [3] ""
# [4] "Character set encoding: UTF8"
# [5] ""
# [6] "The addresses are separated by three asterisks"
x <- grep("^\\*{3}", speech)
# returns the indices of the lines in speech that start with three asterisks (***),
# i.e. the separator lines between the addresses
list.speeches <- list() # create a list to store the results
for (i in 1:length(x)) { # for each *** separator line
  if (i == 1) { # if it's the first set of asterisks ***
    list.speeches[[i]] <- paste(speech[1:x[1]], collapse = " ")
    # capture all lines from the start of the file up to and including the first ***
    # (the file information and the list of who gave each of the speeches)
  } else {
    list.speeches[[i]] <- paste(speech[x[i-1]:x[i]], collapse = " ")
    # capture the lines from the previous *** through the current *** (both included),
    # join all rows of that speech into one string separated by spaces,
    # and place each complete speech in a different list position
  }
}
# note: any lines after the last *** are not captured
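To make "each entry is the text between consecutive ***" concrete, here is a small sketch that runs the same grep() and loop logic on a made-up vector (toy, toy.x and toy.speeches are hypothetical names, chosen so they don't overwrite your real x and list.speeches). It shows that the first entry is everything up to the first ***, each later entry runs from one *** to the next with both separators included, and a trailing line after the last *** is dropped.

```r
# Made-up stand-in for speech: a header, two "speeches", and a trailing footer line
toy <- c("header line 1", "header line 2", "***",
         "Speech A, line 1", "Speech A, line 2", "***",
         "Speech B, line 1", "***",
         "footer text")

toy.x <- grep("^\\*{3}", toy)  # positions of the *** lines: 3, 6, 8

toy.speeches <- list()
# seq_along() behaves like 1:length() here, but also avoids looping over c(1, 0)
# if no *** lines are found
for (i in seq_along(toy.x)) {
  if (i == 1) {
    toy.speeches[[i]] <- paste(toy[1:toy.x[1]], collapse = " ")
  } else {
    toy.speeches[[i]] <- paste(toy[toy.x[i - 1]:toy.x[i]], collapse = " ")
  }
}

toy.speeches[[1]]  # "header line 1 header line 2 ***"
toy.speeches[[2]]  # "*** Speech A, line 1 Speech A, line 2 ***"
toy.speeches[[3]]  # "*** Speech B, line 1 ***"
# "footer text" comes after the last *** and appears in no entry
```

So in the real data, list.speeches[[1]] is the file header and each later element is one address collapsed into a single string, still wrapped in its *** markers.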