R: only keep rows whose values differ from values in another column

Time:04-22

I want to keep only the rows where the last two letters (the state abbreviation) in column 1 differ from the last two letters in column 3.

  countyname            fipscounty   neighborname            fipsneighbor
1 Archuleta County, CO  8007         Rio Grande County, CO   8105
2 Archuleta County, CO  8007         Rio Arriba County, NM   35039
3 Archuleta County, CO  8007         San Juan County, NM     35045

In row 1, both counties are in Colorado. In rows 2 and 3, the first county is in CO, and the second county is in NM. I only want to keep rows 2 and 3 so that it looks like this:

  countyname            fipscounty   neighborname            fipsneighbor
2 Archuleta County, CO  8007         Rio Arriba County, NM   35039
3 Archuleta County, CO  8007         San Juan County, NM     35045

How can I do this?

CodePudding user response:

We can compare the last 2 characters in each column using str_sub, and only return the rows where the state abbreviations do not match.

library(tidyverse)

df %>% 
  filter(str_sub(countyname, start = -2) != str_sub(neighborname, start = -2))

Output

            countyname fipscounty          neighborname fipsneighbor
1 Archuleta County, CO       8007 Rio Arriba County, NM        35039
2 Archuleta County, CO       8007   San Juan County, NM        35045

Or in base R, we can use sub to strip everything except the last 2 characters of each column, then filter the data frame on that comparison.

# '.*(?=.{2}$)' matches everything before the final two characters
df[sub('.*(?=.{2}$)', '', df$countyname, perl = TRUE) !=
     sub('.*(?=.{2}$)', '', df$neighborname, perl = TRUE), ]

Or another option using substr (though much more verbose):

df[substr(df$countyname, nchar(df$countyname) - 1, nchar(df$countyname)) !=
     substr(df$neighborname, nchar(df$neighborname) - 1, nchar(df$neighborname)), ]
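
If the names might carry trailing whitespace, comparing the last 2 characters could go wrong. As a rough sketch only, one alternative is to take whatever follows the last comma and trim it. The helper name state_of and the result name cross_state are made up for illustration, and this assumes every name ends in ", XX":

```r
# Hypothetical helper (not from the answer above): return the text after
# the last comma, with surrounding whitespace trimmed.
state_of <- function(x) trimws(sub(".*,", "", x))

# Same sample data as in the Data section.
df <- data.frame(
  countyname   = c("Archuleta County, CO", "Archuleta County, CO",
                   "Archuleta County, CO"),
  fipscounty   = c(8007L, 8007L, 8007L),
  neighborname = c("Rio Grande County, CO", "Rio Arriba County, NM",
                   "San Juan County, NM"),
  fipsneighbor = c(8105L, 35039L, 35045L)
)

# Keep rows where the two state abbreviations differ.
cross_state <- df[state_of(df$countyname) != state_of(df$neighborname), ]
cross_state
```

For the sample data this keeps the same two NM rows as the approaches above.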

Data

df <- structure(list(countyname = c("Archuleta County, CO", "Archuleta County, CO", 
"Archuleta County, CO"), fipscounty = c(8007L, 8007L, 8007L), 
    neighborname = c("Rio Grande County, CO", "Rio Arriba County, NM", 
    "San Juan County, NM"), fipsneighbor = c(8105L, 35039L, 35045L
    )), class = "data.frame", row.names = c(NA, -3L))