replace and remove duplicate rows using ifelse-CodePudding

I have a data frame of postcodes with a regional/metro classification assigned. In some instances, due to the datasource, the same postcode will occur with both a regional and metro classification.

  POSTCODE   REGON  
1     3000    METRO       
2     3000    REGIONAL      
3     3256    METRO     
4     3145    METRO

I am wondering how to remove the duplicate row and replace the region with "SPLIT" in these instances.

I have tried using the below code however this reassignes the entire dataset with either "METRO" or "REGIONAL"

test <- within(PC_ACTM, REGION <- ifelse(duplicated("Postcode"), "SPLIT", REGION))

The desired output would be

  POSTCODE   REGON  
1     3000    SPLIT
2     3256    METRO     
3     3145    METRO

Example data:

dput(PC_ACTM)
structure(list(POSTCODE = c(3000L, 3000L, 3256L, 3145L), REGON = c("METRO", 
"REGIONAL", "METRO", "METRO")), class = "data.frame", row.names = c("1", 
"2", "3", "4"))

CodePudding user response：

Based on your title, you're looking for an ifelse() solution; perhaps this will suit?

PC_ACTM <- structure(list(POSTCODE = c(3000L, 3000L, 3256L, 3145L),
                          REGION = c("METRO", "REGIONAL", "METRO", "METRO")),
                     class = "data.frame",
                     row.names = c("1", "2", "3", "4"))

PC_ACTM$REGION <- ifelse(duplicated(PC_ACTM$POSTCODE), "SPLIT", PC_ACTM$REGION)
PC_ACTM[!duplicated(PC_ACTM$POSTCODE, fromLast = TRUE),]
#>   POSTCODE REGION
#> 2     3000  SPLIT
#> 3     3256  METRO
#> 4     3145  METRO

^{Created on 2022-04-07 by the reprex package (v2.0.1)}

CodePudding user response：

Consider ave to sequential count by group and then subset the last but before use ifslse to replace needed value for any group counts over 1. Below uses new base R 4.1.0 pipe |>:

test <- within(
    PC_ACTM, {
        PC_SEQ <- ave(1:nrow(test), POSTCODE, FUN=seq_along)
        PC_COUNT <- ave(1:nrow(test), POSTCODE, FUN=length)
        REGION <- ifelse(
            (PC_SEQ == PC_COUNT) & (PC_COUNT > 1), "SPLIT", REGION
        )
    }
) |> subset(
    subset = PC_SEQ == PC_COUNT,   # SUBSET ROWS
    select = c(POSTCODE, REGION)   # SELECT COLUMNS
) |> `row.names<-`(NULL)           # RESET ROW NAMES