I have just shifted from Python to R. I want to extract multiple tables based on some texts on the 1st column (if any of the texts are present on the 1st column, then extract those tables and concatenate them).
my_text <- c("cash investments", "debt investments")
tabs <- NULL
tabs[[1]] <- tibble(x = sample(c("investments", "trash")), y = 1)
tabs[[2]] <- tibble(x = sample(c(my_text, "trash")), y = 2)
tabs[[3]] <- tibble(x = sample(c(my_text)), y = 3)
print(tabs)
A tibble: 2 × 2
x y
<chr> <dbl>
investments 1
trash 1
A tibble: 3 × 2
x y
<chr> <dbl>
cash investments 2
trash 2
debt investments 2
A tibble: 2 × 2
x y
<chr> <dbl>
debt investments 3
cash investments 3
Here, only the 2nd and 3rd tables contain text from my_text
and I want both of them as a combined dataframe. I was trying the following script:
find_table <- function(table)
{
my_text <- c("cash investments", "debt investments")
for (table_num in 1:length(table))
{
for (char in my_text)
{
if (char %in% tolower(table[[table_num]]$x))
{
dataframe <- data.frame(rbind(table[[table_num]]))
return(dataframe)
}}}}
Desired outcome
A data.frame: 5 × 2
x y
<chr> <dbl>
cash investments 2
trash 2
debt investments 2
debt investments 3
cash investments 3
In Python, we used to create an empty dataset ie. df = []
, check if a column contains any of the text, and then concatenate them (if there are duplicate tables, them remove them too). How this can be done in R with based on many texts?
CodePudding user response:
Updated based on your updates:
You can just do:
library(purrr)
library(dplyr)
tabs |>
keep(~any(.x$x %in% my_text)) |>
bind_rows() |>
distinct()
(Head) Output is:
# A tibble: 5 × 2
x y
<chr> <dbl>
1 debt investments 2
2 trash 2
3 cash investments 2
4 debt investments 3
5 cash investments 3