I have a dataframe called bd that looks like that:
The dataframe is this Link Basedatos.
FAC_1 FAC_2 FAC_3 FAC_4 FAC_5 FAC_6 FAC_7 FAC_8 FAC_9 FAC_10 FAC_11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Substrato de mala calidad (muy meteorizado) " Naturaleza del suelo ~ " M~ NA NA NA NA NA NA NA NA
2 Substrato de mala calidad (muy meteorizado) " Alternancia de rocas ~ " N~ " M~ " M~ NA NA NA NA NA NA
3 NA NA NA NA NA NA NA NA NA NA NA
4 Substrato de mala calidad (muy meteorizado) " Alternancia de rocas ~ " R~ " O~ " P~ " M~ NA NA NA NA NA
5 Substrato de mala calidad (muy meteorizado) " Alternancia de rocas ~ " R~ " O~ " N~ " P~ NA NA NA NA NA
6 Substrato de mala calidad (muy meteorizado) " Alternancia de rocas ~ " R~ " P~ " M~ NA NA NA NA NA NA
7 Substrato de mala calidad (muy meteorizado) " Alternancia de rocas ~ " R~ " N~ " P~ " M~ NA NA NA NA NA
8 Substrato de mala calidad (muy meteorizado) " Alternancia de rocas ~ " M~ " P~ " M~ NA NA NA NA NA NA
9 Substrato de mala calidad (muy meteorizado) " Alternancia de rocas ~ " R~ " N~ " P~ " M~ NA NA NA NA NA
10 Substrato de mala calidad (muy meteorizado) " Rocas muy fracturadas~ " O~ " N~ " M~ " P~ NA NA NA NA NA
So I need to create 11 new columns, the first column have to search all the values that contains "Substrato" in the eleven FAC_ variables, then replace with "1" if contains the string and "0" if not contains the string; the second column have to search all the values that contains "Alternancia" in the eleven FAC_ variables, then replace with "1" if contains the string and "0" if not contains the string, and the same for all the rest of the columns.
To achieve this my code is the following:
The vector containers is the string that I have to find in the dataframe bd .
vect <- 1:11 #index vector
variables <- paste("FAC", vect, sep = "_") # variables names
containers <- c("Substrato","Alternancia", "Presencia", "fracturadas","desfavorable",
"Naturaleza", "Material", "Pendiente", "Morfología", "escacez", "Otro") # strings to find
bd$var1 <- character() #empty new column
# Create the first new column with "Substrato" like string:
for (i in 1:length(vect)){
out <- if_else(grepl(containers[1], bd[ ,i]), "1", "0")
bd$var1<- c(var1, out)
}
#For the next columns I changed containers[1] by containers[2],[3],[4],....[11] and var1 by var2,3,4,..11.
But I have a problem in the code that not create the variables, I checked many times but I can not solve the problem.
I am going to feel so happy is someone can help me. Thank so much!!
CodePudding user response:
1. If you want a single number in var (1 or 0)
1 = the corresponding string from containers was found at least once in all of the eleven FAC_ variables in the same row
Using the tidyverse
:
library(tidyverse)
for (i in seq_along(containers)){
bd <- bd %>%
mutate(!!sym(paste0("var",i)) := apply(bd[,1:11], 1, function(vec){str_detect(vec, containers[i]) %>% any(na.rm=T) %>% as.numeric}))
}
Output
> bd
# A tibble: 35,279 x 22
FAC_1 FAC_2 FAC_3 FAC_4 FAC_5 FAC_6 FAC_7 FAC_8 FAC_9 FAC_10 FAC_11 var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11
<chr> <chr> <chr> <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Substra~ Natural~ "Morfol~ NA NA NA NA NA NA NA NA 1 0 0 0 0 1 0 0 0 0 0
2 Substra~ Alterna~ "Natura~ "Materi~ "Morfo~ NA NA NA NA NA NA 1 1 0 0 0 1 1 0 0 0 0
3 NA NA NA NA NA NA NA NA NA NA NA 0 0 0 0 0 0 0 0 0 0 0
4 Substra~ Alterna~ "Rocas ~ "Orient~ "Pendi~ "Morf~ NA NA NA NA NA 1 1 0 1 1 0 0 1 0 0 0
5 Substra~ Alterna~ "Rocas ~ "Orient~ "Natur~ "Pend~ NA NA NA NA NA 1 1 0 1 1 1 0 1 0 0 0
6 Substra~ Alterna~ "Rocas ~ "Pendie~ "Morfo~ NA NA NA NA NA NA 1 1 0 1 0 0 0 1 0 0 0
7 Substra~ Alterna~ "Rocas ~ "Natura~ "Pendi~ "Morf~ NA NA NA NA NA 1 1 0 1 0 1 0 1 0 0 0
8 Substra~ Alterna~ "Materi~ "Pendie~ "Morfo~ NA NA NA NA NA NA 1 1 0 0 0 0 1 1 0 0 0
9 Substra~ Alterna~ "Rocas ~ "Natura~ "Pendi~ "Morf~ NA NA NA NA NA 1 1 0 1 0 1 0 1 0 0 0
10 Substra~ Rocas m~ "Orient~ "Natura~ "Mater~ "Pend~ NA NA NA NA NA 1 0 0 1 1 1 1 1 0 0 0
# ... with 35,269 more rows
2. If you want a 1 or 0 for each of the eleven FAC_, and this in every var
I put the 1 and 0 in a list, so var1 is a vector of lists. Each element is a list with 11 numbers.
for (i in seq_along(containers)){
bd <- bd %>%
mutate(!!sym(paste0("var",i)) := apply(bd[,1:11], 1, function(vec){map(vec, ~grepl(containers[i], .) %>% as.numeric)}))
}
The calculation takes a bit of time but I do find
Output
> bd
# A tibble: 35,279 x 22
FAC_1 FAC_2 FAC_3 FAC_4 FAC_5 FAC_6 FAC_7 FAC_8 FAC_9 FAC_10 FAC_11 var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11
<chr> <chr> <chr> <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl> <list> <list> <list> <list> <lis> <lis> <lis> <lis> <lis> <lis> <lis>
1 Substr~ Natura~ "Morfo~ NA NA NA NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
2 Substr~ Altern~ "Natur~ "Mater~ "Morfo~ NA NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
3 NA NA NA NA NA NA NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
4 Substr~ Altern~ "Rocas~ "Orien~ "Pendi~ "Morf~ NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
5 Substr~ Altern~ "Rocas~ "Orien~ "Natur~ "Pend~ NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
6 Substr~ Altern~ "Rocas~ "Pendi~ "Morfo~ NA NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
7 Substr~ Altern~ "Rocas~ "Natur~ "Pendi~ "Morf~ NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
8 Substr~ Altern~ "Mater~ "Pendi~ "Morfo~ NA NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
9 Substr~ Altern~ "Rocas~ "Natur~ "Pendi~ "Morf~ NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
10 Substr~ Rocas ~ "Orien~ "Natur~ "Mater~ "Pend~ NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
# ... with 35,269 more rows
> bd$var1[1]
[[1]]
[[1]]$FAC_1
[1] 1
[[1]]$FAC_2
[1] 0
[[1]]$FAC_3
[1] 0
[[1]]$FAC_4
[1] 0
[[1]]$FAC_5
[1] 0
[[1]]$FAC_6
[1] 0
[[1]]$FAC_7
[1] 0
[[1]]$FAC_8
[1] 0
[[1]]$FAC_9
[1] 0
[[1]]$FAC_10
[1] 0
[[1]]$FAC_11
[1] 0