I am stuck on this step:
- Replace all the "T" values with "0.0"
- Remove the "s" from values like "0.02s"
The column contains some irregularly formatted values, such as:
0.00 **T** 0.06 <NA> 0.03 0.02 0.08 0.01 0.07 0.16 0.09 0.22 0.02**s** 0.24 0.18 0.05 0.04 0.09**s** 0.11 0.14 0.25 0.10 0.01s 0.58 0.12 0.13 0.46 1.07 1.19 0.34 0.20 0.36**s** 0.42 0.17 0.27 0.35 0.31 0.33 0.23 0.26 0.28 0.75 0.19 0.36 0.03s 0.07s 0.54 0.59 0.21
The desired output should be:
0.00 0.00 0.06 0.00 0.03 0.02 0.08 0.01 0.07 0.16 0.09 0.22 0.02 0.24 0.18 0.05 0.04 0.09 0.11 0.14 0.25 0.10 0.01 0.58 0.12 0.13 0.46 1.07 1.19 0.34 0.20 0.36 0.42 0.17 0.27 0.35 0.31 0.33 0.23 0.26 0.28 0.75 0.19 0.36 0.03 0.07 0.54 0.59 0.21
For question number 1, I don't have any idea.
For question number 2, I am using: str_remove(Col_name, pattern = "s$")
which gives me back this error:
Error in stri_replace_first_regex while using str_remove
CodePudding user response:
You can do this in several ways, but for your purpose I'd suggest keeping it simple and using the functions from stringr, as you're suggesting yourself.
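library(stringr)
testdata <- c("T", "T", "0.02s", "0.02s", "0.03", "0.04")
# Replace the first "T" in each element with "0.0"
testdata <- str_replace(testdata, "T", "0.0")
# Remove the first "s" in each element
testdata <- str_remove(testdata, "s")
# Convert the cleaned strings to numbers
testdata <- as.numeric(testdata)
testdata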
Output:
[1] 0.00 0.00 0.02 0.02 0.03 0.04
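The sample values in the question also include asterisk-wrapped variants such as "**T**" and "0.02**s**", which the test vector above does not cover. A minimal sketch of one way to handle them, assuming the asterisks are literal characters in the data and not formatting residue; the vector `x` is a hypothetical subset chosen for illustration:

```r
library(stringr)

# Hypothetical subset of the values shown in the question
x <- c("0.00", "**T**", "0.06", "0.02**s**", "0.01s")

x <- str_remove_all(x, "\\*")      # drop every literal asterisk
x <- str_replace(x, "^T$", "0.0")  # a value that is exactly "T" becomes "0.0"
x <- str_remove(x, "s$")           # drop a trailing "s" only
result <- as.numeric(x)
result
```

Anchoring the patterns with `^` and `$` keeps the replacement from touching a "T" or "s" that might appear elsewhere in a value.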
CodePudding user response:
You can do this, which removes the "s" and "*", replaces the "T" with "0", and then converts to numeric:
library(dplyr)
library(stringr)
df %>%
  mutate(new_col = as.numeric(str_replace(str_remove_all(col, "[*s]"), "T", "0")))
Output:
col new_col
1 0.00 0.00
2 **T** 0.00
3 0.06 0.06
4 <NA> NA
5 0.03 0.03
6 0.02 0.02
7 0.08 0.08
8 0.01 0.01
9 0.07 0.07
10 0.16 0.16
11 0.09 0.09
12 0.22 0.22
13 0.02**s** 0.02
14 0.24 0.24
15 0.18 0.18
16 0.05 0.05
17 0.04 0.04
18 0.09**s** 0.09
19 0.11 0.11
20 0.14 0.14
21 0.25 0.25
22 0.10 0.10
23 0.01s 0.01
24 0.58 0.58
25 0.12 0.12
26 0.13 0.13
27 0.46 0.46
28 1.07 1.07
29 1.19 1.19
30 0.34 0.34
31 0.20 0.20
32 0.36**s** 0.36
33 0.42 0.42
34 0.17 0.17
35 0.27 0.27
36 0.35 0.35
37 0.31 0.31
38 0.33 0.33
39 0.23 0.23
40 0.26 0.26
41 0.28 0.28
42 0.75 0.75
43 0.19 0.19
44 0.36 0.36
45 0.03s 0.03
46 0.07s 0.07
47 0.54 0.54
48 0.59 0.59
49 0.21 0.21
Input:
df = structure(list(col = c("0.00", "**T**", "0.06", "<NA>", "0.03",
"0.02", "0.08", "0.01", "0.07", "0.16", "0.09", "0.22", "0.02**s**",
"0.24", "0.18", "0.05", "0.04", "0.09**s**", "0.11", "0.14",
"0.25", "0.10", "0.01s", "0.58", "0.12", "0.13", "0.46", "1.07",
"1.19", "0.34", "0.20", "0.36**s**", "0.42", "0.17", "0.27",
"0.35", "0.31", "0.33", "0.23", "0.26", "0.28", "0.75", "0.19",
"0.36", "0.03s", "0.07s", "0.54", "0.59", "0.21")), class = "data.frame", row.names = c(NA,
-49L))
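
The desired output in the question shows 0.00 where the input has "<NA>", while the output of this answer keeps NA in that row. If a zero is wanted there instead, one possible sketch is to wrap the conversion in `coalesce()` from dplyr. The data frame `df_small` and the intermediate column `cleaned` are hypothetical names used only for illustration, and whether a missing value should really become 0 is an assumption about the asker's intent:

```r
library(dplyr)
library(stringr)

# Hypothetical subset of the input, for illustration only
df_small <- data.frame(col = c("0.00", "**T**", "<NA>", "0.02**s**", "0.01s"))

df_small <- df_small %>%
  mutate(
    # Same cleaning as in the answer: strip "*" and "s", then turn "T" into "0"
    cleaned = str_replace(str_remove_all(col, "[*s]"), "T", "0"),
    # The string "<NA>" is not a number, so as.numeric() returns NA for it
    # (with a coercion warning); coalesce() then substitutes 0 for that NA
    new_col = coalesce(suppressWarnings(as.numeric(cleaned)), 0)
  )

df_small$new_col
```

`suppressWarnings()` only hides the "NAs introduced by coercion" message; it may be worth leaving it out while checking that no other unexpected strings are being turned into NA.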