Compare values in two columns in R

Time:04-26

Please consider the following data frame as a reproducible example:

df <- data.frame(Last_year = c('2013', '2020', '2017', '2015', '2016', '2021'), 
year = c('2021', '2020', '2019', '2018', '2017', '2016'))

I want to compare the values in the two columns and discard a row if the values differ and Last_year < year.

This is the code I came up with:

for (i in 1:nrow(df)) {
    if ((df$Last_year[i] != df$year[i] && df$Last_year[i] < df$year[i]) |
        is.na(df$year[i])) {
        df <- df[-i, ]
    } else {
        next
    }
}

I cannot understand why this code does not remove every row where Last_year < year. Can you spot the reason?

The final dataframe I wish to obtain is:

df <- data.frame(Last_year = c('2020', '2021'), 
year = c('2020', '2016'))

which corresponds to the second and the last rows, the ones that satisfy my condition: Last_year >= year

CodePudding user response:

Similar to @Maël's contribution: it seems you want to keep the rows in which Last_year is greater than or equal to year:

df[df$Last_year >= df$year, ]

If the data may contain NAs and you want to drop those rows, wrap the condition in which():

df[which(df$Last_year >= df$year), ]
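To see what which() changes, here is a small sketch on a hypothetical data frame, df_na (not part of the question), that has a missing year. Plain logical indexing turns an NA in the condition into an all-NA row, whereas which() returns only the positions that are TRUE.

```r
# Hypothetical data with one missing value, for illustration only
df_na <- data.frame(Last_year = c("2013", "2020", "2021", "2022"),
                    year      = c("2021", "2020", "2016", NA),
                    stringsAsFactors = FALSE)

# The comparison is NA wherever either side is NA
keep <- df_na$Last_year >= df_na$year

# Logical indexing: the NA entry produces an extra row filled with NA
with_na <- df_na[keep, ]

# which() keeps only the TRUE positions, so the NA row disappears
without_na <- df_na[which(keep), ]
```

Here with_na has three rows (one of them all NA) and without_na has two.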

CodePudding user response:

You don't need a for loop at all. You can replace it with a single filter() call from dplyr, or use base R as shown in the other answers.

library(dplyr)

df %>%
  filter(Last_year >= year & !is.na(year))

Or use subset from base R:

subset(df, Last_year >= year & !is.na(year))

Output

  Last_year year
1      2020 2020
2      2021 2016
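One caveat: the columns in this example are character strings, so >= compares them lexicographically. For four-digit years that agrees with numeric order, but it would not in general. A minimal sketch that converts to numeric first, using the question's data under the hypothetical name df_num:

```r
# Question's data; df_num is a hypothetical name used only in this sketch
df_num <- data.frame(Last_year = c("2013", "2020", "2017", "2015", "2016", "2021"),
                     year      = c("2021", "2020", "2019", "2018", "2017", "2016"),
                     stringsAsFactors = FALSE)

# Convert every column from character to numeric before comparing
df_num[] <- lapply(df_num, as.numeric)

result <- subset(df_num, Last_year >= year)

# Why it matters: string order and numeric order can disagree
string_cmp  <- "9" < "10"   # compared character by character
numeric_cmp <- 9 < 10       # compared as numbers
```

In this sketch string_cmp is FALSE while numeric_cmp is TRUE, which is the kind of mismatch the conversion avoids.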

Data

df <- structure(list(Last_year = c("2013", "2020", "2017", "2015", 
"2016", "2021", "2022"), year = c("2021", "2020", "2019", "2018", 
"2017", "2016", NA)), class = "data.frame", row.names = c(NA, 
-7L))
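
As for why the original loop leaves rows behind: every df <- df[-i, ] shifts the remaining rows up by one, but i still moves forward, so the row that slides into position i is never checked. The sketch below reproduces this on the question's six-row data (named df_q here, a hypothetical name, to avoid clashing with the df above) and shows one loop-based workaround, iterating from the last row to the first. The vectorised filters above remain the simpler option.

```r
# Question's data; df_q is a hypothetical name used only in this sketch
df_q <- data.frame(Last_year = c("2013", "2020", "2017", "2015", "2016", "2021"),
                   year      = c("2021", "2020", "2019", "2018", "2017", "2016"),
                   stringsAsFactors = FALSE)

# Forward loop: deleting row i shifts later rows up, so some are skipped
forward <- df_q
for (i in 1:nrow(df_q)) {
  # isTRUE() guards against the NA returned once i exceeds the shrunken frame
  if (isTRUE(forward$Last_year[i] < forward$year[i])) {
    forward <- forward[-i, ]
  }
}

# Backward loop: deleting row i only moves rows that were already visited
backward <- df_q
for (i in rev(seq_len(nrow(backward)))) {
  if (isTRUE(backward$Last_year[i] < backward$year[i])) {
    backward <- backward[-i, ]
  }
}
```

The forward version ends with three rows and still contains the 2015/2018 row; the backward version ends with the two rows the question asks for.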