I have a data frame of postcodes with a regional/metro classification assigned. In some instances, due to the datasource, the same postcode will occur with both a regional and metro classification.
POSTCODE REGON
1 3000 METRO
2 3000 REGIONAL
3 3256 METRO
4 3145 METRO
I am wondering how to remove the duplicate row and replace the region with "SPLIT" in these instances.
I have tried using the below code however this reassignes the entire dataset with either "METRO" or "REGIONAL"
test <- within(PC_ACTM, REGION <- ifelse(duplicated("Postcode"), "SPLIT", REGION))
The desired output would be
POSTCODE REGON
1 3000 SPLIT
2 3256 METRO
3 3145 METRO
Example data:
dput(PC_ACTM)
structure(list(POSTCODE = c(3000L, 3000L, 3256L, 3145L), REGON = c("METRO",
"REGIONAL", "METRO", "METRO")), class = "data.frame", row.names = c("1",
"2", "3", "4"))
CodePudding user response:
Based on your title, you're looking for an ifelse()
solution; perhaps this will suit?
PC_ACTM <- structure(list(POSTCODE = c(3000L, 3000L, 3256L, 3145L),
REGION = c("METRO", "REGIONAL", "METRO", "METRO")),
class = "data.frame",
row.names = c("1", "2", "3", "4"))
PC_ACTM$REGION <- ifelse(duplicated(PC_ACTM$POSTCODE), "SPLIT", PC_ACTM$REGION)
PC_ACTM[!duplicated(PC_ACTM$POSTCODE, fromLast = TRUE),]
#> POSTCODE REGION
#> 2 3000 SPLIT
#> 3 3256 METRO
#> 4 3145 METRO
Created on 2022-04-07 by the reprex package (v2.0.1)
CodePudding user response:
Consider ave
to sequential count by group and then subset
the last but before use ifslse
to replace needed value for any group counts over 1. Below uses new base R 4.1.0 pipe |>
:
test <- within(
PC_ACTM, {
PC_SEQ <- ave(1:nrow(test), POSTCODE, FUN=seq_along)
PC_COUNT <- ave(1:nrow(test), POSTCODE, FUN=length)
REGION <- ifelse(
(PC_SEQ == PC_COUNT) & (PC_COUNT > 1), "SPLIT", REGION
)
}
) |> subset(
subset = PC_SEQ == PC_COUNT, # SUBSET ROWS
select = c(POSTCODE, REGION) # SELECT COLUMNS
) |> `row.names<-`(NULL) # RESET ROW NAMES