How to solve this problem of removing rows based on a condition when there are NA values

Time:03-29

I have this data

     col1   col2
#1   2012   a
#2   1995   b
#3   1998   a
#4   2001   d
#5   2021   c
#6   2021   a
#7   NA     b
#8   NA     d

I want to remove the rows with 2021.

First there's this one:

df <- subset(df, col1 != 2021)

Problem: also removes #7 and #8

Then there's this one:

df <- filter(df, col1 != 2021)

Problem: Gives error: Error in initialize(...) : attempt to use zero-length variable name

Then there's this one:

df <- df[df$col1 != 2021, ] 

Problem: creates this result:

     col1   col2
#1   2012   a
#2   1995   b
#3   1998   a
#4   2001   d
#NA  NA     NA
#NA  NA     NA
#NA  NA     NA
#NA  NA     NA

My goal is to get this:

     col1   col2
#1   2012   a
#2   1995   b
#3   1998   a
#4   2001   d
#5   NA     b
#6   NA     d 

CodePudding user response:

It seems like you are trying to filter out a certain year but want to keep the rows where the year is missing. Try this.

df[is.na(df$col1) | df$col1 != 2021, ] 
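To see why this works, here is a minimal sketch that rebuilds the question's data (with the #1..#8 labels as default row names, which is an assumption about how the original df was stored). Comparing NA with a number gives NA rather than TRUE or FALSE, and is.na() turns those positions into TRUE so the rows survive the subset.

```r
# Rebuild the example data from the question; rows get default names 1..8.
df <- data.frame(
  col1 = c(2012, 1995, 1998, 2001, 2021, 2021, NA, NA),
  col2 = c("a", "b", "a", "d", "c", "a", "b", "d")
)

# The plain comparison is NA wherever col1 is NA.
cmp <- df$col1 != 2021
print(cmp)

# TRUE | NA evaluates to TRUE, so the NA rows are kept;
# only the rows where col1 equals 2021 come out FALSE.
keep <- is.na(df$col1) | df$col1 != 2021
result <- df[keep, ]
print(result)
```

The result holds the original rows 1 to 4 plus rows 7 and 8.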

CodePudding user response:

A dplyr approach (replace_na() comes from tidyr, so load both dplyr and tidyr)

df <- df %>% filter((col1 != 2021) %>% replace_na(TRUE))

Output

> df
  num col1 col2
1  #1 2012    a
2  #2 1995    b
3  #3 1998    a
4  #4 2001    d
5  #7   NA    b
6  #8   NA    d
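If you would rather not depend on tidyr, the same idea can probably be written with dplyr alone. This is only a sketch: it assumes dplyr is installed, and the num column is my guess at how the row labels were stored, based on the output above.

```r
library(dplyr)

# Example data; the num column is an assumed reconstruction of the row labels.
df <- data.frame(
  num  = paste0("#", 1:8),
  col1 = c(2012, 1995, 1998, 2001, 2021, 2021, NA, NA),
  col2 = c("a", "b", "a", "d", "c", "a", "b", "d")
)

# filter() drops rows whose condition evaluates to NA,
# so the NA rows have to be kept explicitly with is.na().
result <- df %>% filter(is.na(col1) | col1 != 2021)
print(result)
```

Putting is.na() inside the condition keeps the intent visible in one place, at the cost of naming col1 twice.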

CodePudding user response:

Try

df = data.frame(col1 = c(2010:2022,NA, NA), col2 = c(NA, NA, rnorm(13)))

   col1       col2
1  2010         NA
2  2011         NA
3  2012  0.4247744
4  2013 -1.6378778
5  2014 -0.9633402
6  2015  1.0030133
7  2016  0.1063912
8  2017  2.2983095
9  2018 -1.0941622
10 2019  0.3604223
11 2020  0.9171499
12 2021  1.3803499
13 2022 -0.5693971
14   NA  1.1911385
15   NA  0.4741301

EDIT


# Proposed fix: 
df[-which(df$col1 == 2021),]

   col1       col2
1  2010         NA
2  2011         NA
3  2012  0.4247744
4  2013 -1.6378778
5  2014 -0.9633402
6  2015  1.0030133
7  2016  0.1063912
8  2017  2.2983095
9  2018 -1.0941622
10 2019  0.3604223
11 2020  0.9171499
13 2022 -0.5693971
14   NA  1.1911385
15   NA  0.4741301

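Only the rows where col1 is 2021 are removed; rows with NA in either column are kept. This works because which() returns only the positions where the comparison is TRUE and skips the NAs.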
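One caveat with negative indexing through which(): if no row matches, which() returns integer(0), and df[-integer(0), ] selects zero rows instead of all of them. The sketch below uses a small made-up data frame and a year (1900) that is deliberately absent, then shows two possible ways around it.

```r
# Small illustrative data frame (values are made up for this sketch).
df <- data.frame(col1 = c(2019, 2020, NA), col2 = c("a", "b", "c"))

# No row has col1 == 1900, so which() returns integer(0).
idx <- which(df$col1 == 1900)

# -integer(0) is still integer(0), so this drops every row.
dropped_everything <- df[-idx, ]
print(nrow(dropped_everything))

# Option 1: only apply the negative index when something matched.
safe <- if (length(idx) > 0) df[-idx, ] else df

# Option 2: %in% never returns NA, so negating it keeps the NA rows
# and also behaves sensibly when nothing matches.
safe_in <- df[!(df$col1 %in% 1900), ]
print(safe_in)
```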
