Please look at the following data frame as a reproducible example:
df <- data.frame(Last_year = c('2013', '2020', '2017', '2015', '2016', '2021'),
                 year      = c('2021', '2020', '2019', '2018', '2017', '2016'))
I want to compare the values in the two columns and discard a row if the values differ and Last_year < year.
This is the code I came up with:
for (i in 1:nrow(df)) {
  if ((df$Last_year[i] != df$year[i] && df$Last_year[i] < df$year[i]) |
      is.na(df$year[i])) {
    df <- df[-i, ]
  } else {
    next
  }
}
I cannot understand why this code does not eliminate all the rows where Last_year < year. Can you spot the reason?
The final dataframe I wish to obtain is:
df <- data.frame(Last_year = c('2020', '2021'),
year = c('2020', '2016'))
These correspond to the second and the last rows, the ones that satisfy my condition: Last_year >= year.
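A likely reason is that df[-i, ] renumbers the remaining rows while the loop index keeps advancing. 1:nrow(df) is evaluated once, before the loop starts; after row 1 is deleted, the old row 2 becomes row 1, and the next iteration (i = 2) looks at what used to be row 3, so some rows are never checked. The sketch below (assuming R 4.0 or later; to_drop and result are illustrative names, not from the question) keeps the question's condition but records the rows to discard first and removes them in a single step after the loop:

```r
# Question's data: both columns are character strings
df <- data.frame(Last_year = c('2013', '2020', '2017', '2015', '2016', '2021'),
                 year      = c('2021', '2020', '2019', '2018', '2017', '2016'),
                 stringsAsFactors = FALSE)

# Mark rows to drop without modifying df inside the loop,
# so the row positions stay stable for every iteration
to_drop <- logical(nrow(df))
for (i in seq_len(nrow(df))) {
  to_drop[i] <- is.na(df$year[i]) ||
    (df$Last_year[i] != df$year[i] && df$Last_year[i] < df$year[i])
}

# Drop all marked rows at once, after the loop has finished
result <- df[!to_drop, ]
result
```

The vectorised one-liners in the answers do the same thing without an explicit loop.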
Answer:
Similar to @Maël's contribution, it seems you want to keep the rows in which Last_year is greater than or equal to year:
df[df$Last_year >= df$year, ]
If you want to avoid rows full of NAs when year is missing, you can use which():
df[which(df$Last_year >= df$year), ]
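To see why which() helps, here is a small sketch using the question's data plus a hypothetical seventh row whose year is missing (NA). The names keep, with_na and without_na are illustrative only:

```r
# Question's data plus one hypothetical row with a missing year
df <- data.frame(Last_year = c("2013", "2020", "2017", "2015", "2016", "2021", "2022"),
                 year      = c("2021", "2020", "2019", "2018", "2017", "2016", NA),
                 stringsAsFactors = FALSE)

keep <- df$Last_year >= df$year
keep  # FALSE TRUE FALSE FALSE FALSE TRUE NA

# Plain logical indexing turns the NA into an extra row filled with NAs
with_na <- df[keep, ]

# which() returns only the positions that are TRUE, so the NA row is dropped
without_na <- df[which(keep), ]
```

dplyr's filter() and base R's subset() behave like the which() version here: rows where the condition evaluates to NA are dropped.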
Answer:
You don't need a for loop at all. You can replace it with a simple filter statement from dplyr, or use base R as shown in the other answers.
library(dplyr)
df %>%
  filter(Last_year >= year & !is.na(year))
Or use subset from base R:
subset(df, Last_year >= year & !is.na(year))
Output
Last_year year
1 2020 2020
2 2021 2016
Data
df <- structure(list(Last_year = c("2013", "2020", "2017", "2015",
"2016", "2021", "2022"), year = c("2021", "2020", "2019", "2018",
"2017", "2016", NA)), class = "data.frame", row.names = c(NA,
-7L))
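One more caveat: both columns are stored as character strings, so >= compares them alphabetically rather than numerically. That gives the right answer for four-digit years but not in general. A minimal sketch, assuming the years can be safely converted with as.integer() (result is an illustrative name), that compares them as numbers instead:

```r
# Character comparison is lexicographic: "9" sorts after "2"
"999" >= "2016"   # TRUE
# Numeric comparison gives the intended result
999 >= 2016       # FALSE

# The answer's data, with the missing year written as a real NA
df <- data.frame(Last_year = c("2013", "2020", "2017", "2015", "2016", "2021", "2022"),
                 year      = c("2021", "2020", "2019", "2018", "2017", "2016", NA),
                 stringsAsFactors = FALSE)

# as.integer() keeps missing values as NA; a non-numeric string such as "NA"
# would also become NA, but with a coercion warning
df$Last_year <- as.integer(df$Last_year)
df$year      <- as.integer(df$year)

# subset() treats an NA condition as FALSE, so the row with a missing year is dropped
result <- subset(df, Last_year >= year)
result
```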