Home > database >  If Many Columns Contains String then enter value for that row
If Many Columns Contains String then enter value for that row

Time:04-07

I have a dataframe called bd that looks like that:

The dataframe is this Link Basedatos.

   FAC_1                                       FAC_2                     FAC_3 FAC_4 FAC_5 FAC_6 FAC_7 FAC_8 FAC_9 FAC_10 FAC_11
   <chr>                                       <chr>                     <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>  <chr> 
 1 Substrato de mala calidad (muy meteorizado) "  Naturaleza del suelo ~ "  M~  NA    NA    NA   NA    NA    NA    NA     NA    
 2 Substrato de mala calidad (muy meteorizado) "  Alternancia de rocas ~ "  N~ "  M~ "  M~  NA   NA    NA    NA    NA     NA    
 3 NA                                           NA                        NA    NA    NA    NA   NA    NA    NA    NA     NA    
 4 Substrato de mala calidad (muy meteorizado) "  Alternancia de rocas ~ "  R~ "  O~ "  P~ "  M~ NA    NA    NA    NA     NA    
 5 Substrato de mala calidad (muy meteorizado) "  Alternancia de rocas ~ "  R~ "  O~ "  N~ "  P~ NA    NA    NA    NA     NA    
 6 Substrato de mala calidad (muy meteorizado) "  Alternancia de rocas ~ "  R~ "  P~ "  M~  NA   NA    NA    NA    NA     NA    
 7 Substrato de mala calidad (muy meteorizado) "  Alternancia de rocas ~ "  R~ "  N~ "  P~ "  M~ NA    NA    NA    NA     NA    
 8 Substrato de mala calidad (muy meteorizado) "  Alternancia de rocas ~ "  M~ "  P~ "  M~  NA   NA    NA    NA    NA     NA    
 9 Substrato de mala calidad (muy meteorizado) "  Alternancia de rocas ~ "  R~ "  N~ "  P~ "  M~ NA    NA    NA    NA     NA    
10 Substrato de mala calidad (muy meteorizado) "  Rocas muy fracturadas~ "  O~ "  N~ "  M~ "  P~ NA    NA    NA    NA     NA   

So I need to create 11 new columns, the first column have to search all the values that contains "Substrato" in the eleven FAC_ variables, then replace with "1" if contains the string and "0" if not contains the string; the second column have to search all the values that contains "Alternancia" in the eleven FAC_ variables, then replace with "1" if contains the string and "0" if not contains the string, and the same for all the rest of the columns.

To achieve this my code is the following:

The vector containers is the string that I have to find in the dataframe bd .

vect <- 1:11 #index vector
variables <- paste("FAC", vect, sep = "_") # variables names
containers <- c("Substrato","Alternancia", "Presencia", "fracturadas","desfavorable",
                "Naturaleza", "Material", "Pendiente", "Morfología", "escacez", "Otro") # strings to find

bd$var1 <- character() #empty new column

# Create the first new column with "Substrato" like string:

for (i in 1:length(vect)){
  out <- if_else(grepl(containers[1], bd[ ,i]), "1", "0") 
  bd$var1<- c(var1, out)
}

#For the next columns I changed containers[1] by containers[2],[3],[4],....[11] and var1 by var2,3,4,..11.

But I have a problem in the code that not create the variables, I checked many times but I can not solve the problem.

I am going to feel so happy is someone can help me. Thank so much!!

CodePudding user response:

1. If you want a single number in var (1 or 0)

1 = the corresponding string from containers was found at least once in all of the eleven FAC_ variables in the same row

Using the tidyverse:

library(tidyverse)
for (i in seq_along(containers)){
  bd <- bd %>%
    mutate(!!sym(paste0("var",i)) := apply(bd[,1:11], 1, function(vec){str_detect(vec, containers[i]) %>% any(na.rm=T) %>% as.numeric}))
}

Output

> bd
# A tibble: 35,279 x 22
   FAC_1    FAC_2    FAC_3    FAC_4    FAC_5   FAC_6  FAC_7 FAC_8 FAC_9 FAC_10 FAC_11  var1  var2  var3  var4  var5  var6  var7  var8  var9 var10 var11
   <chr>    <chr>    <chr>    <chr>    <chr>   <chr>  <lgl> <lgl> <lgl> <lgl>  <lgl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 Substra~ Natural~ "Morfol~  NA       NA      NA    NA    NA    NA    NA     NA         1     0     0     0     0     1     0     0     0     0     0
 2 Substra~ Alterna~ "Natura~ "Materi~ "Morfo~  NA    NA    NA    NA    NA     NA         1     1     0     0     0     1     1     0     0     0     0
 3 NA       NA        NA       NA       NA      NA    NA    NA    NA    NA     NA         0     0     0     0     0     0     0     0     0     0     0
 4 Substra~ Alterna~ "Rocas ~ "Orient~ "Pendi~ "Morf~ NA    NA    NA    NA     NA         1     1     0     1     1     0     0     1     0     0     0
 5 Substra~ Alterna~ "Rocas ~ "Orient~ "Natur~ "Pend~ NA    NA    NA    NA     NA         1     1     0     1     1     1     0     1     0     0     0
 6 Substra~ Alterna~ "Rocas ~ "Pendie~ "Morfo~  NA    NA    NA    NA    NA     NA         1     1     0     1     0     0     0     1     0     0     0
 7 Substra~ Alterna~ "Rocas ~ "Natura~ "Pendi~ "Morf~ NA    NA    NA    NA     NA         1     1     0     1     0     1     0     1     0     0     0
 8 Substra~ Alterna~ "Materi~ "Pendie~ "Morfo~  NA    NA    NA    NA    NA     NA         1     1     0     0     0     0     1     1     0     0     0
 9 Substra~ Alterna~ "Rocas ~ "Natura~ "Pendi~ "Morf~ NA    NA    NA    NA     NA         1     1     0     1     0     1     0     1     0     0     0
10 Substra~ Rocas m~ "Orient~ "Natura~ "Mater~ "Pend~ NA    NA    NA    NA     NA         1     0     0     1     1     1     1     1     0     0     0
# ... with 35,269 more rows

2. If you want a 1 or 0 for each of the eleven FAC_, and this in every var

I put the 1 and 0 in a list, so var1 is a vector of lists. Each element is a list with 11 numbers.

for (i in seq_along(containers)){
  bd <- bd %>%
    mutate(!!sym(paste0("var",i)) := apply(bd[,1:11], 1, function(vec){map(vec, ~grepl(containers[i], .) %>% as.numeric)}))
}

The calculation takes a bit of time but I do find

Output

> bd
# A tibble: 35,279 x 22
   FAC_1   FAC_2   FAC_3   FAC_4   FAC_5   FAC_6  FAC_7 FAC_8 FAC_9 FAC_10 FAC_11 var1   var2   var3   var4   var5  var6  var7  var8  var9  var10 var11
   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  <lgl> <lgl> <lgl> <lgl>  <lgl>  <list> <list> <list> <list> <lis> <lis> <lis> <lis> <lis> <lis> <lis>
 1 Substr~ Natura~ "Morfo~  NA      NA      NA    NA    NA    NA    NA     NA     <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
 2 Substr~ Altern~ "Natur~ "Mater~ "Morfo~  NA    NA    NA    NA    NA     NA     <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
 3 NA      NA       NA      NA      NA      NA    NA    NA    NA    NA     NA     <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
 4 Substr~ Altern~ "Rocas~ "Orien~ "Pendi~ "Morf~ NA    NA    NA    NA     NA     <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
 5 Substr~ Altern~ "Rocas~ "Orien~ "Natur~ "Pend~ NA    NA    NA    NA     NA     <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
 6 Substr~ Altern~ "Rocas~ "Pendi~ "Morfo~  NA    NA    NA    NA    NA     NA     <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
 7 Substr~ Altern~ "Rocas~ "Natur~ "Pendi~ "Morf~ NA    NA    NA    NA     NA     <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
 8 Substr~ Altern~ "Mater~ "Pendi~ "Morfo~  NA    NA    NA    NA    NA     NA     <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
 9 Substr~ Altern~ "Rocas~ "Natur~ "Pendi~ "Morf~ NA    NA    NA    NA     NA     <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
10 Substr~ Rocas ~ "Orien~ "Natur~ "Mater~ "Pend~ NA    NA    NA    NA     NA     <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
# ... with 35,269 more rows
> bd$var1[1]
[[1]]
[[1]]$FAC_1
[1] 1

[[1]]$FAC_2
[1] 0

[[1]]$FAC_3
[1] 0

[[1]]$FAC_4
[1] 0

[[1]]$FAC_5
[1] 0

[[1]]$FAC_6
[1] 0

[[1]]$FAC_7
[1] 0

[[1]]$FAC_8
[1] 0

[[1]]$FAC_9
[1] 0

[[1]]$FAC_10
[1] 0

[[1]]$FAC_11
[1] 0
  • Related