Empty data change function by statistical mode

Time:05-20

I am working with a data set of 500 columns and 20,000 rows, and I want to replace the NA values in each column with that column's statistical mode, so that I don't have to drop those values. Here is a small example data set to show the code I am running:

library(tidyverse)
temp <- c(20.37, 18.56, NA, 21.96, 29.53, 28.16,
          36.38, 36.62, 40.03, 27.59, 22.15, 19.85)
humedad <- c(88, 86, 81, 79, 80, 78,
             71, NA, 78, 82, 85, 83)
precipitaciones <- c(72, 33.9, 37.5, 36.6, 31.0, 16.6,
                     1.2, 6.8, 36.8, 30.8, 38.5, 22.7)
precipitaciones2 <- c(72,NA, 6.8, 36.6, 31.0, 16.6,
                      1.2, 6.8, 36.8, 6.8, 38.5, 22.7)
precipitaciones3 <- c(72,NA, 37.5, 36, 2, 16.6,
                      1.2, 8, 0.8, NA, 38.5, 8)
mes <- c("enero", "febrero", "marzo", "abril", "mayo", "junio",
         "julio", "agosto", "septiembre", "octubre", "noviembre", "diciembre")

datos <- data.frame(mes = mes, temperatura = temp, humedad = humedad,
                    precipitaciones = precipitaciones,
                    precipitaciones2 = precipitaciones2,
                    precipitaciones3 = precipitaciones3)

I want to replace the NA values with the statistical mode in a much larger data set, so the code needs to work for any data frame. This is what I have so far:

#mode
mode=getmoda<-function(v){
  uniqv<-unique(v)
  uniqv[which.max(tabulate(match(v,uniqv)))]
}


reemplazar<-function(y){
  i=2
  lista_vacia1 <- list()
  lista_vacia2<-list()
  a<-""
  while(i<=5){
    lista_vacia1<-y[,i]                                  #select the column to filter
    lista_vacia2<-lista_vacia1[!is.na(lista_vacia1)]     #remove the NA data
    a<-mode(lista_vacia2)                                #get the mode of the column
    y<-y %>% mutate_at(i,~replace_na(.,a))
    
    a<- ""
    lista_vacia1 <- list()
    lista_vacia2<-list()
    
  }
}

When I run the function it goes into an infinite loop: it never finishes and it shows no message. Can you help me understand why this happens, or suggest how to change the code?

CodePudding user response:

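The loop never ends because i is never incremented inside it, so the condition i<=5 stays true forever; i should be increased by 1 at the end of each iteration. Moreover, your data frame has six columns, so to reach the last one the condition should be while(i<=6) rather than while(i<=5). Finally, the function has to return the modified data frame: a while loop itself evaluates to NULL, so the function should end with return(y).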

reemplazar<-function(y){
  i=2
  lista_vacia1 <- list()
  lista_vacia2<-list()
  a<-""
  while(i<=6){
    lista_vacia1<-y[,i]                                  #select the column to filter
    lista_vacia2<-lista_vacia1[!is.na(lista_vacia1)]     #remove the NA data
    a<-mode(lista_vacia2)                                #get the mode of the column
    y<-y %>% mutate_at(i,~replace_na(.,a))
    
    a<- ""
    lista_vacia1 <- list()
    lista_vacia2<-list()
    i <- i + 1          # increment i so the loop terminates
  }
  return(y)            # Specifying the output object
}

The output is

> reemplazar(datos)
          mes temperatura humedad precipitaciones precipitaciones2 precipitaciones3
1       enero       20.37      88            72.0             72.0             72.0
2     febrero       18.56      86            33.9              6.8              8.0
3       marzo       20.37      81            37.5              6.8             37.5
4       abril       21.96      79            36.6             36.6             36.0
5        mayo       29.53      80            31.0             31.0              2.0
6       junio       28.16      78            16.6             16.6             16.6
7       julio       36.38      71             1.2              1.2              1.2
8      agosto       36.62      78             6.8              6.8              8.0
9  septiembre       40.03      78            36.8             36.8              0.8
10    octubre       27.59      82            30.8              6.8              8.0
11  noviembre       22.15      85            38.5             38.5             38.5
12  diciembre       19.85      83            22.7             22.7              8.0
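
Since the real data has 500 columns, hard-coding the upper bound of the loop is fragile. Below is a minimal base R sketch of the same idea that visits every column with seq_along() instead. The names moda_sin_na(), reemplazar_todas() and ejemplo are hypothetical, made up for this illustration, and the guard for all-NA columns is an assumption about how such columns should be handled (they are left untouched).

```r
# Hypothetical helper: mode of a vector, ignoring NA values.
moda_sin_na <- function(v) {
  v <- v[!is.na(v)]                            # drop NA before counting
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]  # most frequent value (first one on ties)
}

# Hypothetical wrapper: fill NA in every column with that column's mode.
reemplazar_todas <- function(y) {
  for (i in seq_along(y)) {                    # one pass per column, whatever ncol(y) is
    faltantes <- is.na(y[[i]])
    # Skip columns with no NA, and columns that are entirely NA (no mode exists).
    if (any(faltantes) && !all(faltantes)) {
      y[[i]][faltantes] <- moda_sin_na(y[[i]])
    }
  }
  y                                            # last expression is the return value
}

# Small made-up data frame to try it on.
ejemplo <- data.frame(a = c(1, 2, 2, NA),
                      b = c("x", NA, "y", "y"),
                      stringsAsFactors = FALSE)
resultado <- reemplazar_todas(ejemplo)
```

Because a for loop over seq_along(y) advances by itself, there is no counter to forget, which is the mistake that caused the infinite loop in the question.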

CodePudding user response:

There is no need for a while loop or any other kind of loop; mutate() with across() does this much more simply. I have added an na.rm argument to the mode function and renamed it to Mode, because mode() is already a base R function with a different meaning (see ?mode).

library(tidyverse)

Mode <- getmoda <- function(v, na.rm = FALSE){
  if(na.rm) v <- v[!is.na(v)]
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

reemplazar <- function(y){
  y %>%
    mutate(across(everything(), ~replace_na(., Mode(., na.rm = TRUE))))
}

reemplazar(datos)
#>           mes temperatura humedad precipitaciones precipitaciones2 precipitaciones3
#> 1       enero       20.37      88            72.0             72.0             72.0
#> 2     febrero       18.56      86            33.9              6.8              8.0
#> 3       marzo       20.37      81            37.5              6.8             37.5
#> 4       abril       21.96      79            36.6             36.6             36.0
#> 5        mayo       29.53      80            31.0             31.0              2.0
#> 6       junio       28.16      78            16.6             16.6             16.6
#> 7       julio       36.38      71             1.2              1.2              1.2
#> 8      agosto       36.62      78             6.8              6.8              8.0
#> 9  septiembre       40.03      78            36.8             36.8              0.8
#> 10    octubre       27.59      82            30.8              6.8              8.0
#> 11  noviembre       22.15      85            38.5             38.5             38.5
#> 12  diciembre       19.85      83            22.7             22.7              8.0

Created on 2022-05-19 by the reprex package (v2.0.1)
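
One thing worth knowing about this Mode function is how it behaves when there is no single most frequent value. which.max() returns the position of the first maximum, so on a tie, or when every value occurs only once, the first value seen wins. That is why the NA in temperatura is replaced by 20.37 in the output above. The short sketch below illustrates this, and also why na.rm = TRUE matters; the vectors sin_repetidos and con_empate are made up for the illustration.

```r
Mode <- function(v, na.rm = FALSE) {
  if (na.rm) v <- v[!is.na(v)]
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

sin_repetidos <- c(20.37, 18.56, NA, 21.96)  # every non-NA value occurs once
con_empate    <- c(5, 7, 7, 5, NA)           # 5 and 7 both occur twice

moda_1 <- Mode(sin_repetidos, na.rm = TRUE)  # 20.37, the first value
moda_2 <- Mode(con_empate, na.rm = TRUE)     # 5, the first of the tied values

# Without na.rm = TRUE, NA is counted like any other value and can win.
moda_3 <- Mode(c(NA, NA, 1))                 # NA
```

If a different tie-breaking rule is needed for the real data (for example the smallest of the tied values), the function would have to be adapted.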
