Change underscore behind word within column in R

Time:08-05

Hi, I have a data frame like this, with two columns (A and B):

 A       B
x_1234 rs4566
x_1567 rs3566
z_1444 rs78654
r_1234 rs34567

I would like to move the letter in front of each number in column A to after the number, still separated by an underscore.

Expected output:

 A       B
1234_x rs4566
1567_x rs3566
1444_z rs78654
1234_r rs34567

I tried something like this, but it doesn't work:

DF$A <- gsub(".*_", "_*.", DF$A)

Answer:

We need to capture two groups and swap them: (.*) captures the characters before the _, and (\\d+) captures the one or more digits after it. The replacement then refers to those groups in reverse order with backreferences (\\2 followed by \\1, separated by a _).

DF$A <- sub("(.*)_(\\d+)", "\\2_\\1", DF$A)

-output

> DF
       A       B
1 1234_x  rs4566
2 1567_x  rs3566
3 1444_z rs78654
4 1234_r rs34567

The OP's code matches any characters (.*) followed by the _ and replaces that match with the literal string _*. because * and . have no special meaning in a replacement string. Instead, the replacement should use backreferences to the capture groups.
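
To make the difference concrete, here is a small sketch that runs both calls on a standalone vector. The names a, attempt and fixed are only for illustration and are not part of the original question:

```r
# Illustrative vector holding the same values as column A
a <- c("x_1234", "x_1567", "z_1444", "r_1234")

# Original attempt: ".*_" matches the prefix up to the underscore (e.g. "x_"),
# and that match is replaced by the literal text "_*."
attempt <- gsub(".*_", "_*.", a)

# Backreference version: capture both sides of the underscore and swap them
fixed <- sub("(.*)_(\\d+)", "\\2_\\1", a)

print(attempt)  # "_*.1234" "_*.1567" "_*.1444" "_*.1234"
print(fixed)    # "1234_x" "1567_x" "1444_z" "1234_r"
```

The first result shows that the letter is discarded rather than moved, because nothing in the replacement refers back to what was matched.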

data

DF <- structure(list(A = c("x_1234", "x_1567", "z_1444", "r_1234"), 
    B = c("rs4566", "rs3566", "rs78654", "rs34567")),
 class = "data.frame", row.names = c(NA, 
-4L))
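
If column A might also contain values that do not follow the letters_digits layout, a stricter pattern could be used. This is only a sketch, assuming the prefix is purely alphabetic and the suffix purely numeric (an assumption not stated in the question); the anchors ^ and $ make sub() leave any other value unchanged:

```r
# Same example data, built with data.frame() for readability
DF <- data.frame(
  A = c("x_1234", "x_1567", "z_1444", "r_1234"),
  B = c("rs4566", "rs3566", "rs78654", "rs34567")
)

# Anchored pattern: letters, an underscore, then digits, and nothing else.
# Values that do not match the whole pattern are returned as they are.
pattern <- "^([[:alpha:]]+)_([[:digit:]]+)$"
DF$A <- sub(pattern, "\\2_\\1", DF$A)

print(DF)
```

With the sample data this gives the same result as the shorter pattern; the difference only shows up for values such as "abc" or "x_12_y", which would pass through untouched.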