I would like to standardize all of the _xxxxxx character strings in the V1 column to the xxxxxxH format.
V1 V2 V3
122223H 20 Test kits
122224H 23 Test kits
122225H 42 Test kits
122227H 31 Test kits
_122228 23 Test kits
_122229 57 Test kits
_122231 21 Test kits
122232H 33 Test kits
122234H 22 Test kits
....... .. .... ....
....... .. .... ....
....... .. .... ....
122250H 33 Test kits
I tried to solve this with the gsub function in R but couldn't produce the exact format I need. Any suggestions would be appreciated. Unix-based commands are also helpful.
df <- gsub("_","H",c(file$V1))
Output:
"H122228" "H122229" "H122231"
Desired output:
V1 V2 V3
122223H 20 Test kits
122224H 23 Test kits
122225H 42 Test kits
122227H 31 Test kits
122228H 23 Test kits
122229H 57 Test kits
122231H 21 Test kits
122232H 33 Test kits
122234H 22 Test kits
....... .. .... ....
....... .. .... ....
....... .. .... ....
122250H 33 Test kits
CodePudding user response:
Capture the digits that follow a leading underscore and replace the match with those digits followed by an H:
file <- data.frame(v1 = c("122227H", "_122231"))
file$v1 <- gsub("^_(\\d+)$", "\\1H", file$v1)
Output:
"122227H" "122231H"
CodePudding user response:
Try the following, though more elegant solutions may exist:
df <- data.frame(v1 = c("122223H","122224H","122225H","122227H","_122228","_122229"),
                 v2 = c(21,23,42,31,23,57),
                 v3 = rep("Test Kits", times = 6))
# Strip the underscore, then append "H" only where the string does not already contain one
df$newstring <- gsub("_", "", df$v1)
df$newstring <- ifelse(grepl("H", df$newstring, fixed = TRUE), df$newstring, paste0(df$newstring, "H"))
# > df
#        v1 v2        v3 newstring
# 1 122223H 21 Test Kits   122223H
# 2 122224H 23 Test Kits   122224H
# 3 122225H 42 Test Kits   122225H
# 4 122227H 31 Test Kits   122227H
# 5 _122228 23 Test Kits   122228H
# 6 _122229 57 Test Kits   122229H
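If you would rather avoid regular expressions altogether, one possible variant keys off the leading underscore instead of checking for an existing H. This is only a sketch: `standardize_id` is a hypothetical helper name, not something from the answers above, and it assumes every underscore-prefixed value should simply lose the underscore and gain a trailing H.

```r
# Hypothetical helper: converts "_xxxxxx" to "xxxxxxH", leaves everything else alone.
standardize_id <- function(x) {
  x <- as.character(x)
  # Positions of the entries that start with "_" (which() drops any NA results)
  idx <- which(startsWith(x, "_"))
  # Drop the first character (the underscore) and append "H"
  x[idx] <- paste0(substring(x[idx], 2), "H")
  x
}

ids <- c("122223H", "_122228", "_122229", "122250H")
standardize_id(ids)
# [1] "122223H" "122228H" "122229H" "122250H"
```

It could then be used to overwrite the column in place, for example `df$v1 <- standardize_id(df$v1)`, rather than adding a separate `newstring` column.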