Formatting UK Postcodes in R

Time:10-06

I am trying to format UK postcodes that arrive in R as a character vector of inconsistently formatted entries.

For example, I have the following postcodes:

postcodes <- c("IV41 8PW", "IV408BU", "kY11..4hJ", "KY1.1UU", "KY4    9RW", "G32-7EJ")

How do I write generic code that converts the entries of the above vector into:

c("IV41 8PW","IV40 8BU","KY11 4HJ","KY1 1UU","KY4 9RW","G32 7EJ")

That is, the first part of the postcode is separated from the second part by exactly one space, and all letters are capitals.

EDIT: the second part of the postcode is always the last 3 characters (a digit followed by two letters).

CodePudding user response:

I couldn't come up with a smart regex solution so here is a split-apply-combine approach.

sapply(strsplit(sub('^(.*?)(...)$', '\\1:\\2', postcodes), ':', fixed = TRUE), function(x) {
  paste0(toupper(trimws(x, whitespace = '[.\\s-]')), collapse = ' ')
})

#[1] "IV41 8PW" "IV40 8BU" "KY11 4HJ" "KY1 1UU"  "KY4 9RW"  "G32 7EJ" 

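The logic here is that we insert a : (or any other character that does not occur in the data) between the 1st and 2nd parts of the string. We then split the string on :, strip the unwanted characters (dots, whitespace, hyphens) from the ends of each part, convert to upper case, and join the two parts back into one string with a single space.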
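To make the intermediate values visible, the same approach can be unrolled into separate steps. This is only a sketch of the answer above: the variable names marked, parts and formatted are illustrative, vapply is swapped in for sapply so the result type is fixed, and the whitespace argument of trimws is assumed to be available (it was added in R 3.6.0, as far as I know).

```r
postcodes <- c("IV41 8PW", "IV408BU", "kY11..4hJ", "KY1.1UU", "KY4    9RW", "G32-7EJ")

# Step 1: mark the boundary before the last three characters with ':'
# (assumes ':' never occurs in the data)
marked <- sub('^(.*?)(...)$', '\\1:\\2', postcodes)
# marked[2] is "IV40:8BU", marked[3] is "kY11..:4hJ"

# Step 2: split each string on the marker into its two parts
parts <- strsplit(marked, ':', fixed = TRUE)

# Step 3: trim dots, whitespace and hyphens from the ends of each part,
# convert to upper case, and rejoin the parts with a single space
formatted <- vapply(parts, function(x) {
  paste(toupper(trimws(x, whitespace = '[.\\s-]')), collapse = ' ')
}, character(1))

formatted
```

Note that trimws only cleans the ends of each part, so a stray separator in the middle of the first part would survive.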

CodePudding user response:

One approach:

  1. Convert to uppercase

  2. Extract the alphanumeric characters

  3. Paste back together with a space before the last three characters

The code would then be:

library(stringr)

postcodes <- c("IV41 8PW", "IV408BU", "kY11..4hJ", "KY1.1UU", "KY4    9RW", "G32-7EJ")

postcodes <- str_to_upper(postcodes)
sapply(str_extract_all(postcodes, "[:alnum:]"), function(x) {
  # join everything except the last three characters, then the last three
  paste(paste0(head(x, -3), collapse = ""), paste0(tail(x, 3), collapse = ""))
})
# [1] "IV41 8PW" "IV40 8BU" "KY11 4HJ" "KY1 1UU"  "KY4 9RW"  "G32 7EJ"
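The same three steps can also be sketched in base R with vectorised regex calls, avoiding both stringr and the per-element sapply. This is an alternative to the code above, not a transcription of it; format_postcode is a hypothetical helper name, and the sketch assumes, as the question states, that the second part is always the last three alphanumeric characters.

```r
# Hypothetical helper: normalise UK postcodes to "OUTWARD INWARD" form
format_postcode <- function(x) {
  x <- toupper(x)                    # 1. convert to uppercase
  x <- gsub("[^[:alnum:]]", "", x)   # 2. keep only the alphanumeric characters
  sub("(.{3})$", " \\1", x)          # 3. insert a space before the last three
}

postcodes <- c("IV41 8PW", "IV408BU", "kY11..4hJ", "KY1.1UU", "KY4    9RW", "G32-7EJ")
format_postcode(postcodes)
```

Entries with fewer than three alphanumeric characters do not match the final pattern and are returned without a space, so they may be worth validating separately.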