R using grepl across multiple dataframes in a list

Time:04-11

I have a list of dataframes that all share the same column layout. In one of the columns, there are multiple rows that contain just "[]". My goal is to replace these instances with a blank.

I've attempted to do so via the map function and grepl. The code runs, but the output does not change. Am I going in the right direction here?

Please note that I differentiate between "[]" and "[value]".

I only want to replace the empty brackets with blanks.

My code below:

first_column <- c("1", "2", "3","4")
second_column <- c("value1", "value2","[]","[value]")
first_column_2 <- c("5", "6", "7","8")
second_column_2 <- c("value1", "[]","[]","[value2]")
first_column_3<- c("9", "10", "11","12")
second_column_3 <- c("[]", "[value2]","[]","[]")
df_1 <- data.frame(first_column,second_column)
df_2 <- data.frame(first_column_2,second_column_2)
df_3 <- data.frame(first_column_3,second_column_3)
df_list <- list(df_1,df_2,df_3)

var <- c(2)
df_list <- map(df_list, ~.x[!grepl("[[]",var),])

Thanks!

CodePudding user response:

We can use lapply and gsub to accomplish this. grepl returns a logical vector indicating which elements match a pattern, whereas gsub lets you replace matches with something else. Note that instead of specifying an empty string (''), you could just as easily specify NA; which one to use depends on your definition of "blank".

Here I use base R's lapply, which in this case is equivalent to purrr::map (even the syntax is interchangeable here).

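library(dplyr)  # provides %>%, mutate() and across()

# Replace "[]" with "" in every character column of each data frame
data <- lapply(df_list, function(x) {
  x %>% 
    mutate(across(where(is.character), ~gsub('\\[\\]', '', .x)))
})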

[[1]]
  first_column second_column
1            1        value1
2            2        value2
3            3              
4            4       [value]

[[2]]
  first_column_2 second_column_2
1              5          value1
2              6                
3              7                
4              8        [value2]

[[3]]
  first_column_3 second_column_3
1              9                
2             10        [value2]
3             11                
4             12                
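
To make the difference between grepl and gsub concrete, here is a minimal base R sketch. The vector x is made up for illustration (the "a[]b" element is not in the question's data), and the anchored pattern with ^ and $ is an optional tightening, not part of the answer above. It shows that an unanchored pattern also strips "[]" from the middle of a longer string, while an anchored one only touches elements that are exactly "[]".

```r
# Illustrative vector; "a[]b" is a made-up edge case
x <- c("value1", "[]", "[value]", "a[]b")

# grepl only reports which elements match; it never changes them
is_empty <- grepl("^\\[\\]$", x)

# Unanchored pattern: removes "[]" wherever it occurs in a string
loose <- gsub("\\[\\]", "", x)

# Anchored pattern: replaces only elements that are exactly "[]"
strict <- sub("^\\[\\]$", "", x)

print(is_empty)  # FALSE  TRUE FALSE FALSE
print(loose)     # "value1" ""  "[value]" "ab"
print(strict)    # "value1" ""  "[value]" "a[]b"
```

On the data in the question both patterns give the same result, because "[]" never appears inside a longer value there.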

CodePudding user response:

You've got a few issues:

  • (a) You say you want to replace "[]" with "", but your code tries to drop those rows completely rather than replace the values. Use sub instead of grepl for replacing, or even better, since you are matching a whole string, don't use regex at all.
  • (b) You are running grepl on the number 2: you have var <- 2 and your command is grepl("[[]", var), which is grepl("[[]", 2), which is always FALSE because the string "2" doesn't contain a bracket.
  • (c) Your grepl pattern searches for any string that contains a [ anywhere in it, so it would also match "[value]".
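
Points (b) and (c) can be checked directly. The sketch below uses a small made-up data frame (df, id and val are illustrative names, not from the question) and base R only. It shows why the original call runs without error yet changes nothing: the single FALSE from grepl is negated to TRUE and recycled across all rows.

```r
var <- c(2)

# (b) grepl is applied to the number 2, coerced to the string "2"
on_var <- grepl("[[]", var)
print(on_var)  # FALSE

# Illustrative data frame; names are made up for this sketch
df <- data.frame(id = c("1", "2", "3"),
                 val = c("a", "[]", "[b]"),
                 stringsAsFactors = FALSE)

# !FALSE is TRUE, which is recycled, so every row is kept
kept <- df[!grepl("[[]", var), ]
print(nrow(kept) == nrow(df))  # TRUE

# (c) applied to the actual column, the pattern flags anything containing "["
on_col <- grepl("[[]", df$val)
print(on_col)  # FALSE  TRUE  TRUE
```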

As I said in (a), when you're matching a full string, you don't need regex at all. I'd do it like this:

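library(purrr)  # provides map()

# In column `var` of each data frame, overwrite exact "[]" entries with ""
df_list <- map(df_list, ~ {
  .x[[var]][.x[[var]] == "[]"] <- ""
  .x
})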
df_list
# [[1]]
#   first_column second_column
# 1            1        value1
# 2            2        value2
# 3            3              
# 4            4       [value]
# 
# [[2]]
#   first_column_2 second_column_2
# 1              5          value1
# 2              6                
# 3              7                
# 4              8        [value2]
# 
# [[3]]
#   first_column_3 second_column_3
# 1              9                
# 2             10        [value2]
# 3             11                
# 4             12              
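
For anyone who prefers to avoid the purrr dependency, the same exact-match idea can be sketched in base R. The helper name blank_empty_brackets and its col argument are hypothetical, introduced only for this example. The is.na guard and stringsAsFactors = FALSE are assumptions added so the sketch behaves the same on older R versions and on columns with missing values.

```r
# First two data frames from the question, rebuilt so the sketch is self-contained
df_1 <- data.frame(first_column = c("1", "2", "3", "4"),
                   second_column = c("value1", "value2", "[]", "[value]"),
                   stringsAsFactors = FALSE)
df_2 <- data.frame(first_column_2 = c("5", "6", "7", "8"),
                   second_column_2 = c("value1", "[]", "[]", "[value2]"),
                   stringsAsFactors = FALSE)
df_list <- list(df_1, df_2)

# Hypothetical helper: blank out entries that are exactly "[]" in one column
blank_empty_brackets <- function(df, col = 2) {
  hit <- !is.na(df[[col]]) & df[[col]] == "[]"
  df[[col]][hit] <- ""
  df
}

cleaned <- lapply(df_list, blank_empty_brackets)
print(cleaned[[1]]$second_column)    # "value1" "value2" ""  "[value]"
print(cleaned[[2]]$second_column_2)  # "value1" ""  ""  "[value2]"
```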