How to truncate/modify filenames in batches in R?

I have a long list of CSV files. For example, their names look like this:

names <- c("CHE1Q_S1001M1_20220615_025815_AM_Original.csv", "CHE2Q_S1002M1_20220615_030435_AM_Original.csv", "CHE6Q_S1053M2_20220615_033828_PM_Original.csv")

and I would like to shorten them in one batch to: "CHE1Q_S1001M1.csv", "CHE2Q_S1002M1.csv", "CHE6Q_S1053M2.csv"

I have tried using the sub() function like this:

sub('_.*', '', names)

but it only returns "CHE1Q" "CHE2Q" "CHE6Q".

Or:

sub('_.*\\_', '', names)

which gave "CHE1QOriginal.csv" "CHE2QOriginal.csv" "CHE6QOriginal.csv"

I don't know how to make it ignore the first underscore and remove everything from the second underscore onward.

The best I can get is two steps:

names <- sub('_', '', names)
names <- sub('_.*', '', names)

This keeps the information I need but loses the underscore in the middle: "CHE1QS1001M1" "CHE2QS1002M1" "CHE6QS1053M2"

CodePudding user response:

You can use a regex lookahead to identify strings before the second underscore.

Explanation:

^ start of the string
.+? one or more characters, as few as possible (lazy match)
_ followed by an underscore
.+? again one or more characters, as few as possible
(?=_) lookahead: requires the next character to be _ (this is your second underscore) without consuming it; this needs perl = TRUE
(.+?_.+?(?=_)) everything mentioned above goes into a capture group (note the parentheses () surrounding it)
.* matches any remaining characters after the capture group, up to the end of the string
\\1.csv inserts the text from the capture group and appends ".csv" to it

sub("^(.+?_.+?(?=_)).*", "\\1.csv", names, perl = TRUE)
[1] "CHE1Q_S1001M1.csv" "CHE2Q_S1002M1.csv" "CHE6Q_S1053M2.csv"
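
If you would rather avoid the lookahead, a negated character class should do the same job without perl = TRUE, and the result can then be passed to file.rename() to rename the files on disk. The following is only a minimal sketch: shorten_name() and demo_dir are hypothetical names made up for illustration, the files are created in a temporary directory rather than your real folder, and it assumes every filename contains at least two underscores.

```r
# Example names copied from the question.
names <- c("CHE1Q_S1001M1_20220615_025815_AM_Original.csv",
           "CHE2Q_S1002M1_20220615_030435_AM_Original.csv",
           "CHE6Q_S1053M2_20220615_033828_PM_Original.csv")

# shorten_name() is a hypothetical helper, not a base R function.
# [^_]+ matches a run of characters that are not underscores, so the
# capture group holds the first two underscore-separated fields and
# .* consumes everything after them.
shorten_name <- function(x) sub("^([^_]+_[^_]+).*$", "\\1.csv", x)

short <- shorten_name(names)
print(short)

# Renaming on disk, demonstrated in a throwaway temporary directory
# (assumption: in practice you would point this at your own folder).
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
file.create(file.path(demo_dir, names))

old <- list.files(demo_dir, pattern = "\\.csv$")
new <- shorten_name(old)

# Stop before renaming if two files would end up with the same short name.
stopifnot(!anyDuplicated(new))

renamed <- file.rename(file.path(demo_dir, old), file.path(demo_dir, new))
print(list.files(demo_dir))
```

file.rename() returns one logical value per file, so all(renamed) is a quick way to check that every rename succeeded. The duplicate check matters because two long names that share their first two fields would otherwise collapse to the same short name.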