Change and delete the outliers in R by specific conditions

Time:09-03

I have data as follows:

data<-data.frame(id=c(1,2,3,4,5,6,7,8,9,10),
                 Wt=c(91,92,85,205,285,43,95,75,76,NA),
                 Ht=c(185,182,173,171,600,650,NA,890,NA,NA))

Wt is the weight in kilograms and Ht is the height in centimeters. In this example, I want to treat Wt values greater than 200 as outliers and change them to specific numbers. I also want to treat Ht values greater than 250 as outliers and change them to NA. In my actual data there are only a few outliers in Wt but many in Ht, so I could find the Wt outliers with the code below:

a1<-na.omit(data$Wt)  # weights with missing values dropped
b1<-a1[a1>200]        # weights above the 200 kg threshold
b1  #205,285

I want to change 205 to 80 and 285 to 90. (My actual data has only a few Wt outliers, so I can change them individually.) I also want to set the Ht values greater than 250 to NA. My expected output is as follows:

data<-data.frame(id=c(1,2,3,4,5,6,7,8,9,10),
                 Wt=c(91,92,85,80,90,43,95,75,76,NA),
                 Ht=c(185,182,173,171,NA,NA,NA,NA,NA,NA))

Answer:

Do it by reference using data.table:

library(data.table)
setDT(data)

data[Ht > 250, Ht := NA]
data[Wt == 205, Wt := 80]
data[Wt == 285, Wt := 90]
data
    id Wt  Ht
 1:  1 91 185
 2:  2 92 182
 3:  3 85 173
 4:  4 80 171
 5:  5 90  NA
 6:  6 43  NA
 7:  7 95  NA
 8:  8 75  NA
 9:  9 76  NA
10: 10 NA  NA
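
If you would rather not load data.table, the same three replacements can be sketched in base R. This is only one possible version; it rebuilds the example data from the question and wraps each condition in which() so that rows where the comparison is NA are simply skipped.

```r
# Rebuild the example data from the question
data <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                   Wt = c(91, 92, 85, 205, 285, 43, 95, 75, 76, NA),
                   Ht = c(185, 182, 173, 171, 600, 650, NA, 890, NA, NA))

# which() returns only the positions where the condition is TRUE,
# so NA comparisons never select a row
data$Ht[which(data$Ht > 250)] <- NA   # heights above 250 cm become NA
data$Wt[which(data$Wt == 205)] <- 80  # individual weight corrections
data$Wt[which(data$Wt == 285)] <- 90

data
```

Unlike the data.table version, this works on a copy-on-modify data.frame rather than by reference, which should not matter for a small table like this one.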

For more info, see: Introduction to data.table.
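
If the list of individual Wt corrections grows beyond two or three values, one line per value gets repetitive. A possible alternative, shown here as a base R sketch, is to keep the old and new values in two parallel vectors and apply them in one pass with match(). The names old_wt, new_wt, idx and hit are made up for this illustration.

```r
# Example data from the question
data <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                   Wt = c(91, 92, 85, 205, 285, 43, 95, 75, 76, NA),
                   Ht = c(185, 182, 173, 171, 600, 650, NA, 890, NA, NA))

# Hypothetical correction table: old_wt[i] is replaced by new_wt[i]
old_wt <- c(205, 285)
new_wt <- c(80, 90)

# match() gives the position of each Wt in old_wt, or NA when there is no match
idx <- match(data$Wt, old_wt)
hit <- !is.na(idx)                  # rows that have a correction
data$Wt[hit] <- new_wt[idx[hit]]    # apply all corrections at once

data$Wt
```

Values that do not appear in old_wt, including the missing weight in row 10, are left unchanged.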
