I have 100 PDF medical reports for different people, and I put each report into a list in R as a data frame. Each one has two columns with many different pieces of information, but I only want the reports that include gallbladder tissue. So I want to create an ID for the whole report, not only for the rows that contain the word "gallbladder". Then I want to keep only the gallbladder reports so I can extract further information from them. This is what each element of the list looks like (the real reports contain much more information):
list[[1]]
report text text_2
1 name andres
1 tissue gallbladder
1 rut 11455698
list[[2]]
report text text_2
2 name ana
2 tissue liver
2 rut 5556678
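To make the example reproducible, the two reports shown above can be rebuilt as a list of data frames. This is a minimal sketch that assumes each PDF has already been parsed into a data frame with the columns report, text and text_2; the name reports is hypothetical (the question calls the list list), and only the three rows shown are included.

```r
# Hypothetical rebuild of the two example reports: one data frame per report.
# Real reports contain many more rows; only the rows shown in the question are here.
reports <- list(
  data.frame(report = 1,
             text   = c("name", "tissue", "rut"),
             text_2 = c("andres", "gallbladder", "11455698"),
             stringsAsFactors = FALSE),
  data.frame(report = 2,
             text   = c("name", "tissue", "rut"),
             text_2 = c("ana", "liver", "5556678"),
             stringsAsFactors = FALSE)
)

# Inspect the structure: a list of two data frames with three rows each
str(reports)
```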
I want to create the ID according to the tissue: every row of a report gets ID 1 if its tissue is gallbladder, and 0 otherwise:
list[[1]]
report text text_2 ID
1 name andres 1
1 tissue gallbladder 1
1 rut 11455698 1
list[[2]]
report text text_2 ID
2 name ana 0
2 tissue liver 0
2 rut 5556678 0
Then I want to keep only the reports where ID == 1.
I have tried many approaches, but I only get the ID on the matching row, not on the whole report:
list[[1]]
report text text_2 ID
1 name andres 0
1 tissue gallbladder 1
1 rut 11455698 0
list[[2]]
report text text_2 ID
2 name ana 0
2 tissue liver 0
2 rut 5556678 0
Maybe you have some ideas! Thank you!
Answer:
We may loop over the list with lapply, then create the 'ID' column by checking whether any value in the 'text_2' column is "gallbladder". Using any ensures that a single TRUE/FALSE is returned, which gets recycled over all the rows of that report's data frame, and this logical column is coerced to binary (0/1) with as.integer.
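The difference between a per-row ID and a per-report ID can be seen on a single report. The comparison text_2 == "gallbladder" returns one TRUE/FALSE per row, whereas any() collapses it to a single value that transform() then recycles to every row. The data frame df below is a hypothetical copy of the first example report.

```r
# Hypothetical copy of the first example report
df <- data.frame(report = 1,
                 text   = c("name", "tissue", "rut"),
                 text_2 = c("andres", "gallbladder", "11455698"),
                 stringsAsFactors = FALSE)

# Element-wise comparison: one value per row, so only the tissue row is flagged
row_id <- as.integer(df$text_2 == "gallbladder")
print(row_id)   # 0 1 0

# any() returns a single TRUE/FALSE for the whole report;
# transform() recycles that length-one value to every row
df <- transform(df, ID = as.integer(any(text_2 == "gallbladder")))
print(df$ID)    # 1 1 1
```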
# ID is 1 on every row of a report whose text_2 contains "gallbladder", 0 otherwise
list2 <- lapply(list, function(x)
  transform(x, ID = as.integer(any(text_2 == "gallbladder"))))
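The answer stops at creating the ID column; the second step in the question, keeping only the reports with ID == 1, can be done with base R's Filter(). This is a minimal sketch, assuming the hypothetical reports list from the example data (rebuilt here so the block runs on its own) and that the tissue is always written exactly as "gallbladder" in text_2. Because ID is constant within a report, checking its first value is enough.

```r
# Hypothetical example data: the two reports from the question
reports <- list(
  data.frame(report = 1, text = c("name", "tissue", "rut"),
             text_2 = c("andres", "gallbladder", "11455698"),
             stringsAsFactors = FALSE),
  data.frame(report = 2, text = c("name", "tissue", "rut"),
             text_2 = c("ana", "liver", "5556678"),
             stringsAsFactors = FALSE)
)

# Step 1: add a report-level ID (1 = gallbladder report, 0 = otherwise)
with_id <- lapply(reports, function(x)
  transform(x, ID = as.integer(any(text_2 == "gallbladder"))))

# Step 2: keep only the reports whose ID is 1
# (ID is the same on every row, so the first value is representative)
gallbladder <- Filter(function(x) x$ID[1] == 1, with_id)

length(gallbladder)          # 1
gallbladder[[1]]$text_2[1]   # "andres"
```

If the ID column itself is not needed, Filter(function(x) any(x$text_2 == "gallbladder"), reports) selects the same reports in one step. If the extracted PDF text may vary in case (for example "Gallbladder"), replacing the equality test with grepl("gallbladder", text_2, ignore.case = TRUE) inside any() is more tolerant, at the cost of also matching longer strings that merely contain the word.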