Warning regarding NAs while transforming data in the column with R

Time:01-13

There is a dataset called data with many columns, but right now I'm interested in the column with the age of the shop. That column contains "-", NA, or numbers stored as character. Some values are not ages (e.g. 2, 5, 10) but the year the shop opened (e.g. 2015, 2018), so I need to convert these to the shop's age (2022 - year). I tried this R code:

data$age[as.numeric(data$age) %in% 1700:2022] <- 2022-as.numeric(data$age)

The rewritten numeric values look fine, but I got this warning:

NAs introduced by coercion 

Additionally, I noticed that the number of NA values increased slightly after running this code (from 111165 to 111557). How can I solve this in my case?
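A minimal reproduction with hypothetical toy values (not the real dataset) suggests where the extra NAs come from: the left-hand side selects only the year rows, while the right-hand side has one value per row, so R assigns the first few right-hand values, which belong to other rows, to the selected positions.

```r
# Hypothetical toy data; stringsAsFactors = FALSE keeps the column character in older R
data <- data.frame(age = c("-", "5", "2015", NA, "2018"), stringsAsFactors = FALSE)

# Rows that hold an opening year: FALSE FALSE TRUE FALSE TRUE
idx <- as.numeric(data$age) %in% 1700:2022   # warns: NAs introduced by coercion

# One value per row: NA 2017 7 NA 4
rhs <- 2022 - as.numeric(data$age)            # warns again for "-"

# 2 targets but 5 values: R uses rhs[1:2] (NA, 2017), not rhs[idx] (7, 4),
# and warns that the replacement length is not a multiple
data$age[idx] <- rhs
data$age   # "-" "5" NA NA "2017": one more NA than before, and wrong ages
```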

Answer:

You could fix it with a simple for loop.

First, convert to numeric; this turns non-numeric entries such as "-" into NA (that is what the warning reports).

Then iterate over the values and, if a value is greater than 1000 (or whatever threshold you prefer), subtract it from 2022 to get the age.

data <- data.frame(age = c("1", "10", "-", NA, "2019"))
data
   age
1    1
2   10
3    -
4 <NA>
5 2019

data$age <- as.numeric(data$age)
data
   age
1    1
2   10
3   NA
4   NA
5 2019

for (i in seq_along(data$age)) {
  # NAs are left as they are; values above 1000 are treated as opening years
  if (!is.na(data$age[i]) && data$age[i] > 1000) {
    data$age[i] <- 2022 - data$age[i]
  }
}

data
  age
1   1
2  10
3  NA
4  NA
5   3
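The same logic can also be written without the loop. This is a sketch, not part of the original answer: it builds one logical index (is_year, a name chosen here for illustration) and uses it on both sides of the assignment. The !is.na() part matters, because R refuses an NA subscript in an assignment with more than one replacement value.

```r
data <- data.frame(age = c("1", "10", "-", NA, "2019"), stringsAsFactors = FALSE)
data$age <- as.numeric(data$age)   # "-" becomes NA (coercion warning)

# Non-NA values above 1000 are treated as opening years, as in the loop
is_year <- !is.na(data$age) & data$age > 1000
data$age[is_year] <- 2022 - data$age[is_year]
data$age   # 1 10 NA NA 3
```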

Answer:

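If you are going to use a logical index on the LHS of an assignment, then you also need to make sure that the RHS has the same length as the number of selected elements, or length 1. Otherwise R fills the selected rows with the first elements of the RHS, which belong to other rows. One way to do that is to use the same logical index on both sides of the assignment operator.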

dfrm <- data.frame(age=c("1750", 1800, "-", 130))
dfrm$age[ as.numeric(dfrm$age) %in% 1700:2000] <- 2022-as.numeric(dfrm$age)[ as.numeric(dfrm$age) %in% 1700:2000]
#Warning messages:
#1: NAs introduced by coercion 
#2: In as.numeric(dfrm$age) %in% 1700:2000 : NAs introduced by coercion
#3: In as.numeric(dfrm$age) %in% 1700:2000 : NAs introduced by coercion

dfrm
  age
1 272
2 222
3   -
4 130

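This is much more efficient than a for-loop approach. You shouldn't be bothered by the warning: it only reports that non-numeric entries such as "-" became NA inside as.numeric(), and because NA %in% 1700:2000 is FALSE, those rows are left unchanged.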
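A variant of this answer (a sketch, assuming the 1700 to 2022 year range from the question) coerces the column only once and silences the expected warning with base R's suppressWarnings(). It swaps %in% 1700:2022 for a range comparison, which also accepts non-integer values such as 1999.5; age_num and is_year are illustrative names, not from the original.

```r
dfrm <- data.frame(age = c("1750", 1800, "-", 130), stringsAsFactors = FALSE)

# Coerce once; "-" becomes NA without printing the expected coercion warning
age_num <- suppressWarnings(as.numeric(dfrm$age))

# One index, used on both sides, so the lengths always match
is_year <- !is.na(age_num) & age_num >= 1700 & age_num <= 2022
dfrm$age[is_year] <- 2022 - age_num[is_year]
dfrm$age   # "272" "222" "-" "130"
```

Before silencing the warning on real data, it is worth checking which entries became NA, e.g. with unique(dfrm$age[is.na(age_num)]), so that unexpected formats are not hidden.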
