I have a long list of CSV files. For example, they look like this:
names <- c("CHE1Q_S1001M1_20220615_025815_AM_Original.csv", "CHE2Q_S1002M1_20220615_030435_AM_Original.csv", "CHE6Q_S1053M2_20220615_033828_PM_Original.csv")
and I wish to batch-shorten them to: "CHE1Q_S1001M1.csv", "CHE2Q_S1002M1.csv", "CHE6Q_S1053M2.csv"
I have tried using the sub() function like this:
sub('_.*', '', names)
but it only returns "CHE1Q" "CHE2Q" "CHE6Q".
Or:
sub('_.*\\_', '', names)
which gave "CHE1QOriginal.csv" "CHE2QOriginal.csv" "CHE6QOriginal.csv"
I don't know how to make it ignore the first underscore but remove everything from the second underscore onward.
The best I can get is two steps:
names <- sub('_', '', names)
names <- sub('_.*', '', names)
This gets the right information but loses the underscore in the middle: "CHE1QS1001M1" "CHE2QS1002M1" "CHE6QS1053M2"
Answer:
You can use a regex lookahead to capture the part of each string before the second underscore.
Explanation:
^                 start of the string
.+?               one or more characters, as few as possible (lazy)
_                 followed by an underscore (the first one)
.+?               again one or more characters, as few as possible
(?=_)             lookahead: the next character must be an underscore, but it is not consumed (this is your second underscore)
(.+?_.+?(?=_))    everything after ^ is wrapped in a capture group (note the parentheses () surrounding it)
.*                matches any characters after the capture group to the end of the string
\\1.csv           in the replacement, calls back the text in the capture group and adds ".csv" after it
sub("^(.+?_.+?(?=_)).*", "\\1.csv", names, perl = TRUE)
[1] "CHE1Q_S1001M1.csv" "CHE2Q_S1002M1.csv" "CHE6Q_S1053M2.csv"
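If you would rather avoid the lookahead, a negated character class can do the same job without perl = TRUE. The following is only a sketch under a few assumptions: shorten_name() and demo_dir are names made up for this example, and the renaming step is run in a temporary folder on empty placeholder files rather than on your real data.

```r
# Same filenames as in the question.
names <- c("CHE1Q_S1001M1_20220615_025815_AM_Original.csv",
           "CHE2Q_S1002M1_20220615_030435_AM_Original.csv",
           "CHE6Q_S1053M2_20220615_033828_PM_Original.csv")

# shorten_name() is a hypothetical helper, not a base R function.
# [^_]+ matches a run of characters that are not underscores, so the
# capture group holds "field1_field2" and .* swallows everything after it.
shorten_name <- function(x) sub("^([^_]+_[^_]+).*", "\\1.csv", x)

short <- shorten_name(names)
print(short)

# Renaming on disk, demonstrated in a fresh temporary folder
# (assumption: your real files all sit in a single folder).
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
created <- file.create(file.path(demo_dir, names))  # empty placeholder files

# file.rename() returns TRUE for each file that was renamed successfully.
renamed <- file.rename(file.path(demo_dir, names),
                       file.path(demo_dir, short))
print(list.files(demo_dir))
```

A name that contains no underscore does not match the pattern and is returned unchanged. Before renaming real files, it may be worth checking anyDuplicated(short), since two long names that shorten to the same string would collide.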