I need to know how to remove all characters from a value that come after the leading letter D and the two digits that follow it. I am not sure how to start.
I have a data frame called "MainSchools" with a column of type character.
- The column is called "Eircode"
The postal codes go from D01 to D24 (these are Dublin postal codes).
The values are entered with extra characters after the postal code, and those extra characters are what needs to be removed.
So if the "Eircode" is D03P820, I need to have it as D03 after my change.
I would prefer to do this with the Tidyverse package if possible.
CodePudding user response:
You may use sub here:
df <- data.frame(Eircode=c("D15P820", "K78YD27", "D03P820"),
stringsAsFactors=FALSE)
df$Eircode <- sub("^(D(?:0[1-9]|1[0-9]|2[0-4])).*$", "\\1", df$Eircode)
df
Eircode
1 D15
2 K78YD27
3 D03
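The regex pattern used above matches and captures Dublin postal codes as follows:
^ start of the value
( start of capture group 1
D match D
(?:
0[1-9] followed by 01-09
| OR
1[0-9] 10-19
| OR
2[0-4] 20-24
)
) end of capture group 1
.*$ the rest of the value
Then, we use \1 as the replacement in sub, leaving behind only the 3 character Dublin postal code. Values that do not start with a Dublin postal code, such as K78YD27, do not match and are left unchanged.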
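To see where the pattern stops matching, here is a small sketch that runs it against the boundary values. The sample codes are made up for illustration, and perl = TRUE is an assumption added here so that the non-capturing group (?: ) is handled by the PCRE engine.

```r
# Same pattern as above: capture D01 to D24 at the start, drop the rest
pattern <- "^(D(?:0[1-9]|1[0-9]|2[0-4])).*$"

# Hypothetical test values: two valid Dublin codes, two out-of-range
# codes (D00, D25) and one non-Dublin code
codes <- c("D01X123", "D24AB12", "D00X123", "D25X123", "K78YD27")

# Values that do not match the pattern are returned unchanged by sub()
trimmed <- sub(pattern, "\\1", codes, perl = TRUE)
trimmed
# [1] "D01"     "D24"     "D00X123" "D25X123" "K78YD27"
```

Only D01 and D24 are shortened; D00, D25 and the non-Dublin code fall outside the alternation and pass through untouched.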
CodePudding user response:
I like to use the stringr package for such operations.
library(dplyr)
library(stringr)
# str_extract() returns the first match per value as a character vector
df %>% mutate(Eircode = str_extract(Eircode, '^[A-Z][0-9]{2}'))
Output with the data from @Tim Biegeleisen (note that this pattern also shortens non-Dublin codes such as K78YD27):
Eircode
1 D15
2 K78
3 D03
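If only the Dublin codes (D01 to D24) should be shortened inside a tidyverse pipeline, one possible sketch is to combine the stricter pattern from the first answer with stringr's str_replace. The MainSchools data frame below is sample data built for illustration, not the asker's real data, and dublin_pattern is a name introduced here.

```r
library(dplyr)
library(stringr)

# Sample data only; the real MainSchools data frame is the one from the question
MainSchools <- data.frame(
  Eircode = c("D15P820", "K78YD27", "D03P820"),
  stringsAsFactors = FALSE
)

# Capture D01 to D24 at the start of the value and match everything after it
dublin_pattern <- "^(D(?:0[1-9]|1[0-9]|2[0-4])).*$"

# Replace the whole value with the captured prefix;
# values that do not match are left as they are
MainSchools <- MainSchools %>%
  mutate(Eircode = str_replace(Eircode, dublin_pattern, "\\1"))

MainSchools$Eircode
# [1] "D15"     "K78YD27" "D03"
```

Unlike str_extract with '^[A-Z][0-9]{2}', this keeps non-Dublin values such as K78YD27 intact; which behaviour is wanted depends on the data.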