Home > other >  replace and remove duplicate rows using ifelse
replace and remove duplicate rows using ifelse

Time:04-07

I have a data frame of postcodes with a regional/metro classification assigned. In some instances, due to the datasource, the same postcode will occur with both a regional and metro classification.

  POSTCODE   REGON  
1     3000    METRO       
2     3000    REGIONAL      
3     3256    METRO     
4     3145    METRO     

I am wondering how to remove the duplicate row and replace the region with "SPLIT" in these instances.

I have tried using the below code however this reassignes the entire dataset with either "METRO" or "REGIONAL"

test <- within(PC_ACTM, REGION <- ifelse(duplicated("Postcode"), "SPLIT", REGION))

The desired output would be

  POSTCODE   REGON  
1     3000    SPLIT
2     3256    METRO     
3     3145    METRO

Example data:

dput(PC_ACTM)
structure(list(POSTCODE = c(3000L, 3000L, 3256L, 3145L), REGON = c("METRO", 
"REGIONAL", "METRO", "METRO")), class = "data.frame", row.names = c("1", 
"2", "3", "4"))

CodePudding user response:

Based on your title, you're looking for an ifelse() solution; perhaps this will suit?

PC_ACTM <- structure(list(POSTCODE = c(3000L, 3000L, 3256L, 3145L),
                          REGION = c("METRO", "REGIONAL", "METRO", "METRO")),
                     class = "data.frame",
                     row.names = c("1", "2", "3", "4"))

PC_ACTM$REGION <- ifelse(duplicated(PC_ACTM$POSTCODE), "SPLIT", PC_ACTM$REGION)
PC_ACTM[!duplicated(PC_ACTM$POSTCODE, fromLast = TRUE),]
#>   POSTCODE REGION
#> 2     3000  SPLIT
#> 3     3256  METRO
#> 4     3145  METRO

Created on 2022-04-07 by the reprex package (v2.0.1)

CodePudding user response:

Consider ave to sequential count by group and then subset the last but before use ifslse to replace needed value for any group counts over 1. Below uses new base R 4.1.0 pipe |>:

test <- within(
    PC_ACTM, {
        PC_SEQ <- ave(1:nrow(test), POSTCODE, FUN=seq_along)
        PC_COUNT <- ave(1:nrow(test), POSTCODE, FUN=length)
        REGION <- ifelse(
            (PC_SEQ == PC_COUNT) & (PC_COUNT > 1), "SPLIT", REGION
        )
    }
) |> subset(
    subset = PC_SEQ == PC_COUNT,   # SUBSET ROWS
    select = c(POSTCODE, REGION)   # SELECT COLUMNS
) |> `row.names<-`(NULL)           # RESET ROW NAMES
  • Related