I am working with a data set of 500 columns and 20,000 rows, and I want to replace the NA values with the statistical mode of their column instead of dropping them. Here is a small example data set to show the code I am running:
library(tidyverse)
temp <- c(20.37, 18.56, NA, 21.96, 29.53, 28.16,
          36.38, 36.62, 40.03, 27.59, 22.15, 19.85)
humedad <- c(88, 86, 81, 79, 80, 78,
             71, NA, 78, 82, 85, 83)
precipitaciones <- c(72, 33.9, 37.5, 36.6, 31.0, 16.6,
                     1.2, 6.8, 36.8, 30.8, 38.5, 22.7)
precipitaciones2 <- c(72, NA, 6.8, 36.6, 31.0, 16.6,
                      1.2, 6.8, 36.8, 6.8, 38.5, 22.7)
precipitaciones3 <- c(72, NA, 37.5, 36, 2, 16.6,
                      1.2, 8, 0.8, NA, 38.5, 8)
mes <- c("enero", "febrero", "marzo", "abril", "mayo", "junio",
         "julio", "agosto", "septiembre", "octubre", "noviembre", "diciembre")
datos <- data.frame(mes = mes, temperatura = temp, humedad = humedad,
                    precipitaciones = precipitaciones,
                    precipitaciones2 = precipitaciones2,
                    precipitaciones3 = precipitaciones3)
I want to replace the NA values with the statistical mode in a much larger data set, so the code needs to work for any data frame. I have the following code:
#mode
mode = getmoda <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
reemplazar <- function(y) {
  i = 2
  lista_vacia1 <- list()
  lista_vacia2 <- list()
  a <- ""
  while (i <= 5) {
    lista_vacia1 <- y[, i] #select the column to filter
    lista_vacia2 <- lista_vacia1[!is.na(lista_vacia1)] #remove the NA data
    a <- mode(lista_vacia2) #get the mode of the column
    y <- y %>% mutate_at(i, ~replace_na(., a))
    a <- ""
    lista_vacia1 <- list()
    lista_vacia2 <- list()
  }
}
When I run the program it goes into an infinite loop: it never finishes and it does not show any message. Could you help me understand why this happens, or suggest a different way to write the code?
CodePudding user response:
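The loop never ends because i is never changed inside while, so the condition stays true forever. After each iteration, i should be increased by 1. The function also needs return(y) after the loop: a while loop evaluates to NULL, so without an explicit return the modified data frame is never handed back. Moreover, your data frame has six columns, so I'm not sure why you have specified while(i<=5). It should be while(i<=6).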
reemplazar <- function(y) {
  i = 2
  lista_vacia1 <- list()
  lista_vacia2 <- list()
  a <- ""
  while (i <= 6) {
    lista_vacia1 <- y[, i] #select the column to filter
    lista_vacia2 <- lista_vacia1[!is.na(lista_vacia1)] #remove the NA data
    a <- mode(lista_vacia2) #get the mode of the column
    y <- y %>% mutate_at(i, ~replace_na(., a))
    a <- ""
    lista_vacia1 <- list()
    lista_vacia2 <- list()
    i <- i + 1 # increment of i
  }
  return(y) # Specifying the output object
}
The output is
> reemplazar(datos)
mes temperatura humedad precipitaciones precipitaciones2 precipitaciones3
1 enero 20.37 88 72.0 72.0 72.0
2 febrero 18.56 86 33.9 6.8 8.0
3 marzo 20.37 81 37.5 6.8 37.5
4 abril 21.96 79 36.6 36.6 36.0
5 mayo 29.53 80 31.0 31.0 2.0
6 junio 28.16 78 16.6 16.6 16.6
7 julio 36.38 71 1.2 1.2 1.2
8 agosto 36.62 78 6.8 6.8 8.0
9 septiembre 40.03 78 36.8 36.8 0.8
10 octubre 27.59 82 30.8 6.8 8.0
11 noviembre 22.15 85 38.5 38.5 38.5
12 diciembre 19.85 83 22.7 22.7 8.0
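As a minimal sketch of the same idea without a manual counter: a for loop over seq_along(y) visits each column exactly once, so it cannot run forever, and it adapts to any number of columns. The names moda and reemplazar_for and the small ejemplo data frame are hypothetical, made up for illustration; this is not the code from the answer above.

```r
# Hypothetical helper: statistical mode of a vector, ignoring NA values.
moda <- function(v) {
  v <- v[!is.na(v)]
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Hypothetical variant of reemplazar() that uses a for loop over the columns.
reemplazar_for <- function(y) {
  for (i in seq_along(y)) {            # one pass per column, no counter to update
    faltantes <- is.na(y[[i]])
    # Skip columns with no NA, and columns that are entirely NA (no mode exists)
    if (any(faltantes) && !all(faltantes)) {
      y[[i]][faltantes] <- moda(y[[i]])
    }
  }
  y                                    # last expression is the return value
}

# Small made-up data frame, only for illustration
ejemplo <- data.frame(a = c(1, NA, 2, 2),
                      b = c("x", "y", NA, "y"),
                      stringsAsFactors = FALSE)
resultado <- reemplazar_for(ejemplo)
print(resultado)
```

Because the loop runs over seq_along(y), the same function should work unchanged on a data frame with 500 columns.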
CodePudding user response:
There is no need for a while loop nor any other type of loop, mutate/across can do it much more simply. I have changed the function Mode to have an argument na.rm. I have also changed its name, since ?mode in R has a different meaning.
library(tidyverse)
Mode <- getmoda <- function(v, na.rm = FALSE){
  if(na.rm) v <- v[!is.na(v)]
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
reemplazar <- function(y){
  y %>%
    mutate(across(everything(), ~replace_na(., Mode(., na.rm = TRUE))))
}
reemplazar(datos)
#> mes temperatura humedad precipitaciones precipitaciones2 precipitaciones3
#> 1 enero 20.37 88 72.0 72.0 72.0
#> 2 febrero 18.56 86 33.9 6.8 8.0
#> 3 marzo 20.37 81 37.5 6.8 37.5
#> 4 abril 21.96 79 36.6 36.6 36.0
#> 5 mayo 29.53 80 31.0 31.0 2.0
#> 6 junio 28.16 78 16.6 16.6 16.6
#> 7 julio 36.38 71 1.2 1.2 1.2
#> 8 agosto 36.62 78 6.8 6.8 8.0
#> 9 septiembre 40.03 78 36.8 36.8 0.8
#> 10 octubre 27.59 82 30.8 6.8 8.0
#> 11 noviembre 22.15 85 38.5 38.5 38.5
#> 12 diciembre 19.85 83 22.7 22.7 8.0
Created on 2022-05-19 by the reprex package (v2.0.1)
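Two details of the output may be worth checking by hand. The short sketch below uses a hypothetical function name, moda_estadistica, which repeats the logic of Mode so the block runs on its own. It shows what base R's mode() returns, and how ties are resolved when no value repeats, which is why the NA in temperatura became 20.37.

```r
# Base R's mode() reports how an object is stored, not its most frequent value.
mode(c(2, 2, 5))   # "numeric"

# Hypothetical name; same logic as the Mode() function above.
moda_estadistica <- function(v, na.rm = FALSE) {
  if (na.rm) v <- v[!is.na(v)]
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

moda_estadistica(c(2, 2, 5))   # 2

# When every value occurs once, which.max() returns the first position,
# so the "mode" is simply the first non-NA value of the column.
moda_estadistica(c(20.37, 18.56, NA, 21.96), na.rm = TRUE)   # 20.37
```

For continuous measurements such as temperatura, where values rarely repeat, this tie-breaking means the replacement is effectively the first observed value, so it may be worth deciding whether the mode is the right fill value for such columns.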