I have a data frame that looks like this:
Twin_Pair zyg CDsumTwin1 CDsumTwin2
<chr> <int> <dbl> <dbl>
1 pair1(2891,2892) 2 0 5
2 pair2(4000,4001) 1 0 0
3 pair3(4006,4007) 2 0 3
4 pair4(4009,4010) 2 1 3
5 pair5(4012,4013) 2 2 0
6 pair6(4015,4016) 2 0 9
7 pair7(4018,4019) 2 0 0
8 pair8(4021,4022) 1 0 0
9 pair9(4024,4025) 1 0 0
10 pair10(4027,4028) 2 2 17
How can I remove "pair1", "pair2", etc. from each row in the first column so that I am left with something like (4027,4028)? I know how to remove the first 5 characters, but the problem is that the prefix goes up to pair100, so its length varies. What would be an efficient way to do this?
CodePudding user response:
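You can use a regular expression to match the prefix. The pattern "^pair[0-9]+" matches "pair" at the start of the string followed by one or more digits, so it covers pair1 through pair100. Please test this code to see if it works on your data.
dat$Twin_Pair <- sub("^pair[0-9]+", "", dat$Twin_Pair)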
dat
# Twin_Pair zyg CDsumTwin1 CDsumTwin2
# 1 (2891,2892) 2 0 5
# 2 (4000,4001) 1 0 0
# 3 (4006,4007) 2 0 3
# 4 (4009,4010) 2 1 3
# 5 (4012,4013) 2 2 0
# 6 (4015,4016) 2 0 9
# 7 (4018,4019) 2 0 0
# 8 (4021,4022) 1 0 0
# 9 (4024,4025) 1 0 0
# 10 (4027,4028) 2 2 17
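To see why the `+` quantifier matters for the multi-digit case, here is a small check on a made-up character vector. The vector `x` and the `pair100` entry are hypothetical, added only for illustration; they are not part of the original data.

```r
# Hypothetical labels, including a three-digit pair number
x <- c("pair1(2891,2892)", "pair10(4027,4028)", "pair100(5000,5001)")

# "[0-9]+" matches one or more digits, so the whole prefix is removed
cleaned <- sub("^pair[0-9]+", "", x)
cleaned
# "(2891,2892)" "(4027,4028)" "(5000,5001)"

# Without "+", only the first digit after "pair" is removed
partial <- sub("^pair[0-9]", "", x)
partial
# "(2891,2892)" "0(4027,4028)" "00(5000,5001)"
```

Strings that do not start with "pair" followed by digits are returned unchanged by `sub`.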
Data
dat <- read.table(text = "Twin_Pair zyg CDsumTwin1 CDsumTwin2
1 'pair1(2891,2892)' 2 0 5
2 'pair2(4000,4001)' 1 0 0
3 'pair3(4006,4007)' 2 0 3
4 'pair4(4009,4010)' 2 1 3
5 'pair5(4012,4013)' 2 2 0
6 'pair6(4015,4016)' 2 0 9
7 'pair7(4018,4019)' 2 0 0
8 'pair8(4021,4022)' 1 0 0
9 'pair9(4024,4025)' 1 0 0
10 'pair10(4027,4028)' 2 2 17",
header = TRUE)
CodePudding user response:
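An option with trimws, using its whitespace argument (a regular expression) to strip everything on the left that is not an opening parenthesis:
dat$Twin_Pair <- trimws(dat$Twin_Pair, whitespace = "[^(]+", which = 'left')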
-output
> dat
Twin_Pair zyg CDsumTwin1 CDsumTwin2
1 (2891,2892) 2 0 5
2 (4000,4001) 1 0 0
3 (4006,4007) 2 0 3
4 (4009,4010) 2 1 3
5 (4012,4013) 2 2 0
6 (4015,4016) 2 0 9
7 (4018,4019) 2 0 0
8 (4021,4022) 1 0 0
9 (4024,4025) 1 0 0
10 (4027,4028) 2 2 17
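As a rough sanity check, the trimws approach can be compared with an anchored `sub` call on a small made-up vector. The vector `x` below is hypothetical, and the sketch assumes R 3.6.0 or later, where `trimws` accepts the `whitespace` argument.

```r
# Hypothetical labels with prefixes of different lengths
x <- c("pair1(2891,2892)", "pair100(5000,5001)")

# Strip leading characters that are not "(" via trimws
left_trimmed <- trimws(x, which = "left", whitespace = "[^(]+")

# The same idea written as an anchored substitution
via_sub <- sub("^[^(]+", "", x)

# Both approaches should agree on this input
identical(left_trimmed, via_sub)
```

Unlike the "^pair[0-9]+" pattern, this version does not care what the prefix is; it removes anything before the first opening parenthesis.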
CodePudding user response:
We could use str_extract with the regex '\(.*?\)', which extracts the first parenthesized group, parentheses included:
library(stringr)
library(dplyr)
dat %>%
  mutate(Twin_Pair = str_extract(Twin_Pair, '\\(.*?\\)'))
Twin_Pair zyg CDsumTwin1 CDsumTwin2
1 (2891,2892) 2 0 5
2 (4000,4001) 1 0 0
3 (4006,4007) 2 0 3
4 (4009,4010) 2 1 3
5 (4012,4013) 2 2 0
6 (4015,4016) 2 0 9
7 (4018,4019) 2 0 0
8 (4021,4022) 1 0 0
9 (4024,4025) 1 0 0
10 (4027,4028) 2 2 17