While transforming data in a dataframe in R (RStudio), I wanted to assign NA to the entries of a specified column whose values appear in a list. This list (I believe it is a list) comes from boxplot.stats(x)$out.
So this is what I did to get a variable with a list of the numbers from the boxplot:
age_outofrange <- boxplot.stats(census$age)$out
And this is what I coded. I used unique(x) because some ages were repeated:
census["age"][census["age"] == unique(age_outofrange), ] <- NA
census -> Dataframe
age -> The target column
This is an example of my current dataframe:
index|age
1|34
2|79
3|80
4|23
5|650
6|44
7|560
8|12
9|65
10|79
This is what I am expecting (when I write a new csv, nothing has changed):
index|age
1|34
2|NA
3|NA
4|23
5|NA
6|44
7|NA
8|12
9|65
10|NA
That is, the values 79, 80, 650, and 560, which are the values in age_outofrange, should be replaced with NA. I also tried something like the following code, but nothing happened (or at least that is what the csv showed me). A few values were changed, but the vast majority weren't:
df <- df$column[-listvalue, ]
Do you know how to code it right? Thank you for your answers!
CodePudding user response:
We may need to use [[ to extract the column as a vector. In addition, == can be replaced with %in% if the length of unique elements in 'age_outofrange' is more than 1
census[["age"]][census[["age"]] %in% unique(age_outofrange)] <- NA
-output
> census
index age
1 1 34
2 2 NA
3 3 NA
4 4 23
5 5 NA
6 6 44
7 7 NA
8 8 12
9 9 65
10 10 NA
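To see why == may leave most values untouched, here is a minimal sketch on plain vectors (the names age, out_vals, eq_mask and in_mask are only illustrative). == compares element by element and recycles the shorter vector, so a value only matches if it happens to line up with the same position in the recycled vector, while %in% checks each age against the whole set.

```r
# Sample ages copied from the data in this answer
age <- c(34L, 79L, 80L, 23L, 650L, 44L, 560L, 12L, 65L, 79L)
out_vals <- c(79, 80, 650, 560)  # stands in for unique(age_outofrange)

# `==` works position by position; out_vals is recycled to length 10
# (R also warns here because 10 is not a multiple of 4)
eq_mask <- suppressWarnings(age == out_vals)

# `%in%` asks, for every age, whether it occurs anywhere in out_vals
in_mask <- age %in% out_vals

which(eq_mask)  # integer(0): no age lines up with the recycled values
which(in_mask)  # 2 3 5 7 10
```

With the full data, a few values may line up by chance, which would explain why only some of them changed.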
data
census <- structure(list(index = 1:10, age = c(34L, 79L, 80L, 23L, 650L,
44L, 560L, 12L, 65L, 79L)), class = "data.frame", row.names = c(NA,
-10L))
age_outofrange <- c(79, 80, 650, 560)
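As a side note on the "is it a list?" doubt: boxplot.stats(x)$out is an atomic vector, not a list, so it can be passed to %in% directly. The sketch below checks this on the sample data and does the replacement in one go. It assumes only the ten sample rows, so the values flagged here depend on that sample and need not be the same four as in the full census data.

```r
# Rebuild the sample data from this answer
census <- data.frame(
  index = 1:10,
  age = c(34L, 79L, 80L, 23L, 650L, 44L, 560L, 12L, 65L, 79L)
)

# Outliers as reported by the boxplot statistics (grDevices, loaded by default)
age_outofrange <- boxplot.stats(census$age)$out

is.list(age_outofrange)    # FALSE
is.atomic(age_outofrange)  # TRUE

# Replace every age that appears among the outliers with NA;
# %in% handles repeated values, so unique() is optional here
census$age[census$age %in% age_outofrange] <- NA

census
```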