How to remove data after certain characters

I need to know how to remove all characters from a value that come after the leading letter D and the two digits that follow it. I am not sure how to start.

I have a data frame with a column of type character.

The column is called "Eircode".

The postal codes go from D01 to D24 (these are Dublin postal codes).

The values are entered as full Eircodes, and everything after the district code (shown in red in the original screenshot) needs to be removed.

I need to be able to remove the characters after the last digit.

My dataframe is called "MainSchools"

So if the "Eircode" is D03P820, I need to have it as D03 after my change.

I would prefer to do this with the tidyverse if possible.

CodePudding user response：

You may use sub here:

df <- data.frame(Eircode=c("D15P820", "K78YD27", "D03P820"),
                 stringsAsFactors=FALSE)
df$Eircode <- sub("^(D(?:0[1-9]|1[0-9]|2[0-4])).*$", "\\1", df$Eircode)
df

  Eircode
1     D15
2 K78YD27
3     D03

The regex pattern used above matches and captures Dublin postal codes as follows:

D           match D
(?:
    0[1-9]  followed by 01-09
    |       OR
    1[0-9]  10-19
    |       OR
    2[0-4]  20-24
)

Then, we use \1 as the replacement in sub, leaving behind only the 3 character Dublin postal code.
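
As a quick sanity check of the range logic described above, here is a minimal sketch that runs the same pattern against a few made-up boundary values (the codes below are hypothetical, not from the asker's data). It passes perl = TRUE only so that the non-capturing group (?:...) is handled as explicit PCRE syntax; values that do not match the pattern come back unchanged.

```r
# Same pattern as in the answer above, stored in a variable for reuse.
dublin_pattern <- "^(D(?:0[1-9]|1[0-9]|2[0-4])).*$"

# Hypothetical test values: two in-range districts and two out-of-range ones.
codes <- c("D01X123", "D24AB12", "D00X123", "D25X123")

# perl = TRUE is an assumption made here for explicit PCRE behaviour.
trimmed <- sub(dublin_pattern, "\\1", codes, perl = TRUE)
print(trimmed)
# [1] "D01"     "D24"     "D00X123" "D25X123"
```

D00 and D25 fall outside the 01-24 range, so sub finds no match and leaves those strings as they were.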

CodePudding user response：

I like to use the stringr package for such operations.

library(dplyr)
library(stringr)

df %>% mutate(Eircode = str_extract(Eircode, '^[A-Z][0-9]{2}'))

Output with the data from @Tim Biegeleisen:

  Eircode
1     D15
2     K78
3     D03
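
Note that the stringr pattern above shortens every code to its first three characters, so the non-Dublin K78YD27 becomes K78. If only D01 to D24 should be trimmed, one possible tidyverse variant is to reuse the stricter regex from the first answer with str_replace. This is a sketch, and the MainSchools data frame built here is a hypothetical stand-in for the asker's real data.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the asker's MainSchools data frame.
MainSchools <- data.frame(Eircode = c("D15P820", "K78YD27", "D03P820"),
                          stringsAsFactors = FALSE)

# Keep only the captured district code (D01-D24); strings that do not
# match the pattern are returned unchanged by str_replace.
MainSchools <- MainSchools %>%
  mutate(Eircode = str_replace(Eircode,
                               "^(D(?:0[1-9]|1[0-9]|2[0-4])).*$",
                               "\\1"))

print(MainSchools$Eircode)
# [1] "D15"     "K78YD27" "D03"
```

Which variant fits depends on whether non-Dublin routing keys should also be reduced to three characters.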