Hi, I have a data frame like this, with two columns (A and B):
A B
x_1234 rs4566
x_1567 rs3566
z_1444 rs78654
r_1234 rs34567
I would like to move the letter in front of the number in column A
to after the number, still separated by an underscore.
Expected output:
A B
1234_x rs4566
1567_x rs3566
1444_z rs78654
1234_r rs34567
I tried something like this, but it doesn't work:
DF$A <- gsub(".*_", "_*.", DF$A)
CodePudding user response:
We can capture the two parts as groups and swap them. The first group, (.*), captures the characters before the _, and the second group, (\\d+), captures one or more digits. The replacement then switches them with the backreferences: \\2 followed by \\1, separated by a _.
DF$A <- sub("(.*)_(\\d+)", "\\2_\\1", DF$A)
-output
> DF
A B
1 1234_x rs4566
2 1567_x rs3566
3 1444_z rs78654
4 1234_r rs34567
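If the column might also contain values that do not follow the letter_number shape, a stricter pattern can help. The following is only a sketch: the ^ and $ anchors and the POSIX classes are assumptions added here, not part of the answer above, and the vector a simply copies the sample values from the question.

```r
# Sample values copied from column A of the question
a <- c("x_1234", "x_1567", "z_1444", "r_1234")

# Group 1: one or more letters at the start of the string
# Group 2: one or more digits running to the end of the string
# Values that do not match the whole pattern are returned unchanged
swapped <- sub("^([[:alpha:]]+)_([[:digit:]]+)$", "\\2_\\1", a)

print(swapped)
# [1] "1234_x" "1567_x" "1444_z" "1234_r"
```

Because the pattern is anchored at both ends, sub() and gsub() behave the same here: there is at most one match per string.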
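The OP's code matches any characters (.*) followed by the _ and replaces that match with the literal characters _*. because the replacement argument is a plain string, not a regex, so * and . have no special meaning there. Instead, the replacement should be built from the capture group backreferences.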
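To see what the original attempt actually returns, it can be run on the sample values. This is just an illustration; the names a and attempt are made up for this snippet.

```r
# Sample values copied from column A of the question
a <- c("x_1234", "x_1567", "z_1444", "r_1234")

# ".*_" matches everything up to and including the underscore;
# the replacement "_*." is inserted literally in its place
attempt <- gsub(".*_", "_*.", a)

print(attempt)
# [1] "_*.1234" "_*.1567" "_*.1444" "_*.1234"
```

The letter is lost because it is part of the match and nothing in the replacement refers back to it.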
data
DF <- structure(list(A = c("x_1234", "x_1567", "z_1444", "r_1234"),
B = c("rs4566", "rs3566", "rs78654", "rs34567")),
class = "data.frame", row.names = c(NA,
-4L))