I can't upload the file to Stack Overflow, but I have a PDF containing a table that spans 3 pages. After loading pdftools and calling pdf_text(), I get a character vector of length 3, where each element is one long string holding all the text from one page.
```r
library(pdftools)
# pdf_text() returns a character vector with one string per page
df <- pdf_text("file.pdf")
```
The data I need is on the 2nd page. Printing that element gives:
```r
df[2]
```
```
All Households 19,015 10,030 8,985 3,635 585 3,055 19.1 5.8 34.0\n\nHousing above standards 12,365 8,225 4,145 0 0 0 0.0 0.0 0.0\n\nBelow one or more housing standards 6,650 1,805 4,845 3,640 585 3,055 54.7 32.4 63.1\n\nBelow affordability standard12 4,885 1,230 3,660 3,125 535 2,590 64.0 43.5 70.8\n\nBelow adequacy standard13 1,360 555 810 425 75 350 31.2 13.5 43.2\n\n\n\n\n
```
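I want to isolate the row "Below one or more housing standards" and, within it, the 8th column (counting the row label as the first), which contains the value "54.7".
I believe the next steps are to split the long string into lines on the line-break character "\n", find the matching line, split that line into words, and select the value. Because the label itself is six words long, 54.7 is actually the 13th word, or the 7th number on the line.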
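A minimal base-R sketch of those steps, assuming the page text looks exactly like the output above. `page2` is a hypothetical variable holding that text (in practice it would be `df[2]`), and both the label match and the "7th number" index are assumptions tied to this particular table layout:

```r
# Hypothetical variable: page-2 text copied from the pdf_text() output above
page2 <- paste0(
  "All Households 19,015 10,030 8,985 3,635 585 3,055 19.1 5.8 34.0\n\n",
  "Housing above standards 12,365 8,225 4,145 0 0 0 0.0 0.0 0.0\n\n",
  "Below one or more housing standards 6,650 1,805 4,845 3,640 585 3,055 54.7 32.4 63.1\n\n",
  "Below affordability standard12 4,885 1,230 3,660 3,125 535 2,590 64.0 43.5 70.8\n\n",
  "Below adequacy standard13 1,360 555 810 425 75 350 31.2 13.5 43.2\n\n\n\n\n"
)

# 1. Split the page into lines; strsplit() returns a list, so take [[1]]
lines <- strsplit(page2, "\n", fixed = TRUE)[[1]]
lines <- trimws(lines)
lines <- lines[lines != ""]  # drop the empty strings left by "\n\n"

# 2. Find the line that starts with the row label
target <- lines[startsWith(lines, "Below one or more housing standards")]

# 3. Split that line on whitespace into "words"
words <- strsplit(target, "\\s+")[[1]]

# 4. The label spans several words, so keep only the numeric tokens
nums <- words[grepl("^[0-9.,]+$", words)]

# 54.7 is the 7th number (the 8th column once the label is counted);
# strip thousands separators before converting
value <- as.numeric(gsub(",", "", nums[7]))
value
```

Filtering on the numeric pattern sidesteps the multi-word row label, so the index no longer depends on how many words the label contains.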
I've tried splitting into lines using:
```r
library(stringr)
lines <- df[2] %>% str_split("\n")
```
It returns a list of length 1 (str() shows "List of 1"), and I'm not sure how to work with it. Any suggestions on the syntax?
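The "List of 1" comes from `str_split()` vectorising over its input: it returns one list element per input string, and `df[2]` is a single string. Below is a sketch continuing with stringr, again using a hypothetical `page2` variable that holds the page-2 text shown above. `[[1]]` (or `unlist()`) turns that list into a plain character vector of lines; the number regex is an assumption that fits this table's formatting:

```r
library(stringr)

# Hypothetical variable: page-2 text copied from the pdf_text() output above
page2 <- paste0(
  "All Households 19,015 10,030 8,985 3,635 585 3,055 19.1 5.8 34.0\n\n",
  "Housing above standards 12,365 8,225 4,145 0 0 0 0.0 0.0 0.0\n\n",
  "Below one or more housing standards 6,650 1,805 4,845 3,640 585 3,055 54.7 32.4 63.1\n\n",
  "Below affordability standard12 4,885 1,230 3,660 3,125 535 2,590 64.0 43.5 70.8\n\n",
  "Below adequacy standard13 1,360 555 810 425 75 350 31.2 13.5 43.2\n\n\n\n\n"
)

# str_split() returns a list with one element per input string;
# [[1]] extracts the character vector of lines for our single string
lines <- str_split(page2, "\n")[[1]]

# Keep only the line that starts with the row label
target <- str_subset(lines, "^Below one or more housing standards")

# Pull out every number on that line (str_extract_all() also returns a list)
nums <- str_extract_all(target, "[0-9][0-9,]*(\\.[0-9]+)?")[[1]]

# 7th number = 8th column; remove thousands separators before converting
value <- as.numeric(str_remove_all(nums[7], ","))
value
```

One caveat for other rows: footnote markers fused to labels, such as "standard12", would also match the number pattern, so for those rows the index would shift by one.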