I have a data frame as follows:
data<-data.frame(id=c(1,2,3,4,5,6,7,8,9,10),
Wt=c(91,92,85,205,285,43,95,75,76,NA),
Ht=c(185,182,173,171,600,650,NA,890,NA,NA))
Wt represents the weight in kilograms and Ht represents the height in centimeters. In this example, I want to treat values of Wt greater than 200 as outliers and replace them with specific numbers.
Also, I want to treat values of Ht greater than 250 as outliers and set them to NA.
In my actual data, there are few outliers in Wt and many outliers in Ht.
I could find the outliers for Wt by using the code below:
a1<-data$Wt
a1<-data.frame(a1)
a1<-na.omit(a1)
b1<-a1[a1$a1>200, ]
b1 #205,285
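The same values can probably be found more directly with which(), which skips NA comparisons, so the na.omit() step is not needed. This is only a sketch on the example data; the names wt_outlier_rows and wt_outliers are made up for illustration.

```r
data <- data.frame(id = 1:10,
                   Wt = c(91, 92, 85, 205, 285, 43, 95, 75, 76, NA),
                   Ht = c(185, 182, 173, 171, 600, 650, NA, 890, NA, NA))

# which() returns only the positions where the comparison is TRUE,
# so rows where Wt is NA are dropped automatically
wt_outlier_rows <- which(data$Wt > 200)

# The outlying weights themselves
wt_outliers <- data$Wt[wt_outlier_rows]
wt_outliers
```

Keeping the row positions as well as the values makes it easier to check each outlier before changing it.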
I want to change 205 to 80 and 285 to 90. (In my actual data there are only a few outliers for Wt, so I can change them individually.)
Also, I want to set values of Ht greater than 250 to NA. So my expected output is as follows:
data<-data.frame(id=c(1,2,3,4,5,6,7,8,9,10),
Wt=c(91,92,85,80,90,43,95,75,76,NA),
Ht=c(185,182,173,171,NA,NA,NA,NA,NA,NA))
Answer:
You can do this by reference using data.table:
library(data.table)
setDT(data)
data[Ht > 250, Ht := NA]
data[Wt == 205, Wt := 80]
data[Wt == 285, Wt := 90]
data
    id Wt  Ht
 1:  1 91 185
 2:  2 92 182
 3:  3 85 173
 4:  4 80 171
 5:  5 90  NA
 6:  6 43  NA
 7:  7 95  NA
 8:  8 75  NA
 9:  9 76  NA
10: 10 NA  NA
For more information, see the Introduction to data.table vignette.
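If you would rather not add a dependency, the same replacements can be sketched in base R. This is an alternative to the data.table approach, not part of it, and it assumes the example data from the question. Wrapping each condition in which() keeps the rows with NA out of the assignment.

```r
data <- data.frame(id = 1:10,
                   Wt = c(91, 92, 85, 205, 285, 43, 95, 75, 76, NA),
                   Ht = c(185, 182, 173, 171, 600, 650, NA, 890, NA, NA))

# Set heights above 250 cm to NA; rows where Ht is already NA are skipped
data$Ht[which(data$Ht > 250)] <- NA

# Replace the individual weight outliers one by one
data$Wt[which(data$Wt == 205)] <- 80
data$Wt[which(data$Wt == 285)] <- 90

data
```

Unlike the := updates, these assignments modify a copy of the column rather than updating by reference, which should only matter for very large data.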