Error in stri_replace_first_regex while using str_remove

I am stuck on this step:

1. Replace all the T values with "0.0".
2. Remove the "s" from values like "0.02s".

The column contains some irregular values, like:

0.00 **T** 0.06 <NA> 0.03 0.02 0.08 0.01 0.07 0.16 0.09 0.22 0.02**s** 0.24 0.18 0.05 0.04 0.09**s** 0.11 0.14 0.25 0.10 0.01s 0.58 0.12 0.13 0.46 1.07 1.19 0.34 0.20 0.36**s** 0.42 0.17 0.27 0.35 0.31 0.33 0.23 0.26 0.28 0.75 0.19 0.36 0.03s 0.07s 0.54 0.59 0.21

The desired output should be:

0.00 0.00 0.06 0.00 0.03 0.02 0.08 0.01 0.07 0.16 0.09 0.22 0.02 0.24 0.18 0.05 0.04 0.09 0.11 0.14 0.25 0.10 0.01 0.58 0.12 0.13 0.46 1.07 1.19 0.34 0.20 0.36 0.42 0.17 0.27 0.35 0.31 0.33 0.23 0.26 0.28 0.75 0.19 0.36 0.03 0.07 0.54 0.59 0.21

For the first task, I don't have any idea.

For the second task, I am using str_remove(Col_name, pattern = "s$"), which gives me back this:

Error in stri_replace_first_regex while using str_remove

CodePudding user response：

You can do this in several ways, but for your purpose I'd suggest keeping it simple and using the functions from stringr, as you're suggesting yourself.

library(stringr)

testdata <- c("T", "T", "0.02s", "0.02s", "0.03", "0.04")

# Replace "T" with "0.0", then strip the "s" suffix
testdata <- str_replace(testdata, "T", "0.0")
testdata <- str_remove(testdata, "s")

# Convert the cleaned strings to numbers
testdata <- as.numeric(testdata)

testdata

Output:

[1] 0.00 0.00 0.02 0.02 0.03 0.04
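
The same idea carries over to a data frame column. Below is a minimal sketch, assuming the values live in a data frame called `df` with a character column `col` (both names are placeholders, not taken from the question). The column is referenced through the data frame as `df$col`. A bare column name that does not exist as an object may be what triggers the error in the question, though that is only a guess. The patterns are anchored so that only a value that is exactly "T" or a trailing "s" is touched.

```r
library(stringr)

# Hypothetical example data; `df` and `col` are placeholder names.
df <- data.frame(col = c("0.00", "T", "0.02s", NA, "0.36"),
                 stringsAsFactors = FALSE)

# Reference the column through the data frame (df$col), not by its bare name.
cleaned <- str_replace(df$col, "^T$", "0.0")  # a value that is exactly "T" becomes "0.0"
cleaned <- str_remove(cleaned, "s$")          # drop a trailing "s" only

# Real missing values stay NA through both steps and through the conversion.
df$col_num <- as.numeric(cleaned)

df$col_num
```

Writing the result to a new column (`col_num` here) keeps the original strings around, which makes it easy to check which rows were changed.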

CodePudding user response：

You can do this, which removes the "s" and "*", replaces the "T" with "0", and then converts to numeric:

library(dplyr)
library(stringr)
df %>%
  mutate(new_col = as.numeric(str_replace(str_remove_all(col, "[*s]"), "T", "0")))

Output:

         col new_col
1       0.00    0.00
2      **T**    0.00
3       0.06    0.06
4       <NA>      NA
5       0.03    0.03
6       0.02    0.02
7       0.08    0.08
8       0.01    0.01
9       0.07    0.07
10      0.16    0.16
11      0.09    0.09
12      0.22    0.22
13 0.02**s**    0.02
14      0.24    0.24
15      0.18    0.18
16      0.05    0.05
17      0.04    0.04
18 0.09**s**    0.09
19      0.11    0.11
20      0.14    0.14
21      0.25    0.25
22      0.10    0.10
23     0.01s    0.01
24      0.58    0.58
25      0.12    0.12
26      0.13    0.13
27      0.46    0.46
28      1.07    1.07
29      1.19    1.19
30      0.34    0.34
31      0.20    0.20
32 0.36**s**    0.36
33      0.42    0.42
34      0.17    0.17
35      0.27    0.27
36      0.35    0.35
37      0.31    0.31
38      0.33    0.33
39      0.23    0.23
40      0.26    0.26
41      0.28    0.28
42      0.75    0.75
43      0.19    0.19
44      0.36    0.36
45     0.03s    0.03
46     0.07s    0.07
47      0.54    0.54
48      0.59    0.59
49      0.21    0.21

Input:

df = structure(list(col = c("0.00", "**T**", "0.06", "<NA>", "0.03", 
"0.02", "0.08", "0.01", "0.07", "0.16", "0.09", "0.22", "0.02**s**", 
"0.24", "0.18", "0.05", "0.04", "0.09**s**", "0.11", "0.14", 
"0.25", "0.10", "0.01s", "0.58", "0.12", "0.13", "0.46", "1.07", 
"1.19", "0.34", "0.20", "0.36**s**", "0.42", "0.17", "0.27", 
"0.35", "0.31", "0.33", "0.23", "0.26", "0.28", "0.75", "0.19", 
"0.36", "0.03s", "0.07s", "0.54", "0.59", "0.21")), class = "data.frame", row.names = c(NA, 
-49L))
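
If you would rather not depend on dplyr and stringr, the same cleaning can be sketched in base R. This is only one possible variant: `clean_col` is a hypothetical helper name, and the vector below is a small made-up subset of the input above. It also turns the literal "<NA>" string into a real missing value first, so `as.numeric()` does not have to coerce it and warn.

```r
# Hypothetical subset of the input above, including the literal "<NA>" string.
col <- c("0.00", "**T**", "<NA>", "0.02**s**", "0.01s", "1.07")

# clean_col is a hypothetical helper, not part of any package.
clean_col <- function(x) {
  x[x %in% "<NA>"] <- NA      # turn the "<NA>" string into a real missing value
  x <- gsub("[*s]", "", x)    # strip asterisks and the "s" suffix
  x <- sub("^T$", "0", x)     # a value that is now exactly "T" becomes "0"
  as.numeric(x)               # convert the cleaned strings to numbers
}

result <- clean_col(col)
result
```

With a data frame like the one above, this could be applied as `df$new_col <- clean_col(df$col)`.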