I have a column in my dataset with a bunch of numbers that are either five or six numbers long like below. However, all these need to be six numbers long. The dataset I have is from combining multiple excels together and the lead "0" in the five number long numbers disappeared in some of the excel but not in other, so I need to readd it to the five number long ones. So I need a function that finds any number that is five numbers long and then adds a 0 at the beginning so "49302" turns into 049302", but leaves the six number long observations as is. Any thoughts on how using gsub or another function to do this on a large dataset?
49302 038503 594030 39845 503932 334922 49403 84034
CodePudding user response:
x <- c(49302,038503,594030,39845,503932,334922,49403,84034)
df <-
tibble(x = x)
library(dplyr)
df %>%
mutate(x = if_else(nchar(x)==5,paste0(0,x),as.character(x)))
# A tibble: 8 x 1
x
<chr>
1 049302
2 038503
3 594030
4 039845
5 503932
6 334922
7 049403
8 084034
CodePudding user response:
We can use sprintf
df1$col1 <- sprintf('d', as.numeric(df1$col1))
-output
> df1
col1
1 049302
2 038503
3 594030
4 039845
5 503932
6 334922
7 049403
8 084034
Or if we need a regex, capture the ones having 5 digits from the start (^
) till the end ($
) and replace with 0 and its backreference (\\1
)
df1$col1 <- sub("^(.....)$", "0\\1", df1$col1)
df1$col1
[1] "049302" "038503" "594030" "039845" "503932" "334922" "049403" "084034"
data
df1 <- data.frame(col1 = c(49302, 38503, 594030, 39845, 503932, 334922, 49403, 84034))