I have this data
col1 col2
#1 2012 a
#2 1995 b
#3 1998 a
#4 2001 d
#5 2021 c
#6 2021 a
#7 NA b
#8 NA d
I want to remove the rows with 2021.
First there's this one:
df <- subset(df, col1 != 2021)
Problem: this also removes rows #7 and #8, where col1 is NA.
Then there's this one:
df <- filter(df, col1 != 2021)
Problem: Gives error: Error in initialize(...) : attempt to use zero-length variable name
Then there's this one:
df <- df[df$col1 != 2021, ]
Problem: creates this result:
col1 col2
#1 2012 a
#2 1995 b
#3 1998 a
#4 2001 d
#NA NA NA
#NA NA NA
#NA NA NA
#NA NA NA
My goal is to get this:
col1 col2
#1 2012 a
#2 1995 b
#3 1998 a
#4 2001 d
#5 NA b
#6 NA d
CodePudding user response:
It seems like you are trying to filter out a certain year but want to keep the rows where the year is missing. Try this.
df[is.na(df$col1) | df$col1 != 2021, ]
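A short sketch of why this works, using a data frame rebuilt by hand from the table in the question (the construction below is an assumption about how the data is stored). Comparing NA with a value gives NA rather than TRUE or FALSE. subset() treats NA as FALSE and drops those rows, while plain `[` indexing with an NA returns an all-NA row. OR-ing with is.na() turns the NA entries into TRUE, so the rows are kept.

```r
# Sample data copied by hand from the question
df <- data.frame(
  col1 = c(2012, 1995, 1998, 2001, 2021, 2021, NA, NA),
  col2 = c("a", "b", "a", "d", "c", "a", "b", "d")
)

# Comparing NA with anything yields NA, not TRUE or FALSE
NA != 2021

# The raw comparison therefore has NA in the last two positions
df$col1 != 2021

# TRUE | NA is TRUE, so OR-ing with is.na() keeps the NA rows
keep <- is.na(df$col1) | df$col1 != 2021
result <- df[keep, ]
result
```

The row names of the result stay at their original values (1, 2, 3, 4, 7, 8); run `rownames(result) <- NULL` if they should be renumbered.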
CodePudding user response:
A dplyr approach (replace_na() comes from tidyr, so both packages need to be loaded):
library(dplyr)
library(tidyr)
df <- df %>% filter((col1 != 2021) %>% replace_na(TRUE))
Output
> df
num col1 col2
1 #1 2012 a
2 #2 1995 b
3 #3 1998 a
4 #4 2001 d
5 #7 NA b
6 #8 NA d
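If tidyr is not available, the same idea can probably be written with dplyr alone by spelling out the NA case in the condition. This is a sketch that assumes dplyr is installed and uses data typed in from the question. filter() drops rows where the condition evaluates to NA, which is why the is.na() term is needed.

```r
library(dplyr)

# Sample data copied by hand from the question
df <- data.frame(
  col1 = c(2012, 1995, 1998, 2001, 2021, 2021, NA, NA),
  col2 = c("a", "b", "a", "d", "c", "a", "b", "d")
)

# Keep rows where col1 is missing OR differs from 2021
result <- df %>% filter(is.na(col1) | col1 != 2021)
result
```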
CodePudding user response:
Try the following. First, some example data with NAs in both columns:
df = data.frame(col1 = c(2010:2022,NA, NA), col2 = c(NA, NA, rnorm(13)))
col1 col2
1 2010 NA
2 2011 NA
3 2012 0.4247744
4 2013 -1.6378778
5 2014 -0.9633402
6 2015 1.0030133
7 2016 0.1063912
8 2017 2.2983095
9 2018 -1.0941622
10 2019 0.3604223
11 2020 0.9171499
12 2021 1.3803499
13 2022 -0.5693971
14 NA 1.1911385
15 NA 0.4741301
EDIT
# Proposed fix:
df[-which(df$col1 == 2021),]
col1 col2
1 2010 NA
2 2011 NA
3 2012 0.4247744
4 2013 -1.6378778
5 2014 -0.9633402
6 2015 1.0030133
7 2016 0.1063912
8 2017 2.2983095
9 2018 -1.0941622
10 2019 0.3604223
11 2020 0.9171499
13 2022 -0.5693971
14 NA 1.1911385
15 NA 0.4741301
Only the rows where col1 is 2021 are removed; rows with NA in either column are kept.
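One caveat with the -which() idiom: if no row matches, which() returns an empty vector, and negative indexing with an empty vector drops every row. The sketch below shows this on a small made-up data frame (the values are illustrative only) along with a %in% variant that avoids the problem, since %in% never returns NA.

```r
# Small illustrative data frame with no 2021 in col1
df <- data.frame(col1 = c(2010, 2022, NA), col2 = c("x", "y", "z"))

# No row has col1 == 2021, so which() returns integer(0)
idx <- which(df$col1 == 2021)

# Negating an empty index selects nothing: every row is dropped
dropped_all <- df[-idx, ]
nrow(dropped_all)

# %in% returns FALSE for NA, so negating it keeps the NA rows
# and still works when there are no matches
kept <- df[!(df$col1 %in% 2021), ]
nrow(kept)
```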