I'm trying to create a new column which just contains certain numeric data from an expression.
Here's my data: https://pastebin.com/hYg3zqYz
I just need the numbers that come after Bipolar in column 12.
Here's what works:
p <- df %>%
select(where(~ any(stringr::str_detect(.x, "Bipolar")))) #returns correct column
When I then try to make a new column that pulls out just the number, it only ever returns the value from the first row. I'm not sure what I'm doing wrong.
p %>%
mutate(group = "sr_bipol",
sr_bipol = as.numeric(stringr::str_extract(., "[0-9].[0-9]+"))) %>%
select(group, sr_bipol)
# A tibble: 20 × 2
group sr_bipol
<chr> <dbl>
1 sr_bipol 7.83
2 sr_bipol 7.83
3 sr_bipol 7.83
4 sr_bipol 7.83
5 sr_bipol 7.83
.....................
I also get this warning:
argument is not an atomic vector; coercing
CodePudding user response:
The . refers to the whole dataset (str_extract needs a vector as input, not a data.frame). According to ?str_extract:
string - Input vector. Either a character vector, or something coercible to one.
We need to apply str_extract to column 12 itself. Because that column's name starts with ..., which is an unusual column name, use backticks to access its values:
library(dplyr)
library(stringr)
df %>%
transmute(group = 'sr_bipol',
sr_bipol = as.numeric(str_extract(`...12`, "(?<=Bipolar\\s)[0-9]\\.[0-9]+")))
-output
# A tibble: 20 × 2
group sr_bipol
<chr> <dbl>
1 sr_bipol 7.83
2 sr_bipol 2.34
3 sr_bipol 1.97
4 sr_bipol 1.94
5 sr_bipol 2.85
6 sr_bipol 2.92
7 sr_bipol 3.05
8 sr_bipol 2.80
9 sr_bipol 3.43
10 sr_bipol 2.11
11 sr_bipol 2.80
12 sr_bipol 1.81
13 sr_bipol 1.84
14 sr_bipol 3.87
15 sr_bipol 1.68
16 sr_bipol 2.21
17 sr_bipol 2.97
18 sr_bipol 3.09
19 sr_bipol 2.84
20 sr_bipol 3.48
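Since the pastebin data is not reproduced here, the following is only a minimal sketch on a toy tibble built from the four strings visible in the str(p) output in this answer. The names toy and reading are hypothetical stand-ins for df and ...12. The third column tries a slightly more tolerant pattern, on the assumption (not confirmed by the question) that values with more than one digit before the decimal point could occur.

```r
library(dplyr)
library(stringr)

# Toy data: the four example strings shown in the str(p) output
toy <- tibble(
  reading = c(
    "Bipolar 7.827 / Unipolar 16.911 / LAT -9.0",
    "Bipolar 2.34 / Unipolar 9.09 / LAT -10.0",
    "Bipolar 1.974 / Unipolar 9.219 / LAT -11.0",
    "Bipolar 1.938 / Unipolar 10.572 / LAT -9.0"
  )
)

res <- toy %>%
  transmute(
    group = "sr_bipol",
    # same lookbehind as in the answer: the number directly after "Bipolar "
    sr_bipol = as.numeric(str_extract(reading, "(?<=Bipolar\\s)[0-9]\\.[0-9]+")),
    # more tolerant variant (an assumption about the data, not from the answer):
    # one or more leading digits, optional decimal part
    sr_bipol_any = as.numeric(str_extract(reading, "(?<=Bipolar\\s)[0-9]+\\.?[0-9]*"))
  )

res
```

On these four rows both patterns give the same numbers; they would only differ for a value such as 10.5, which the stricter pattern cannot match.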
The p data is a single-column tibble/data.frame. When we use ., it selects the data.frame as such, i.e.
> str(p)
tibble [20 × 1] (S3: tbl_df/tbl/data.frame)
$ ...12: chr [1:20] "Bipolar 7.827 / Unipolar 16.911 / LAT -9.0" "Bipolar 2.34 / Unipolar 9.09 / LAT -10.0" "Bipolar 1.974 / Unipolar 9.219 / LAT -11.0" "Bipolar 1.938 / Unipolar 10.572 / LAT -9.0" ...
> str_extract(p, "[0-9].[0-9]+")
[1] "7.827"
Warning message:
In stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)) :
argument is not an atomic vector; coercing
Because the whole data.frame is coerced to a single string, str_extract returns only the first match, and mutate recycles that one value down the whole column, which is why every row shows 7.83.
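A small sketch of the difference, using a hypothetical one-column tibble (toy and reading are made-up names): coercing the data.frame yields a single string, whereas the column pulled out with [[ is a character vector with one element per row, which is what mutate needs. The dot in the pattern is escaped here so that it matches a literal decimal point.

```r
library(dplyr)
library(stringr)

toy <- tibble(
  reading = c(
    "Bipolar 7.827 / Unipolar 16.911 / LAT -9.0",
    "Bipolar 2.34 / Unipolar 9.09 / LAT -10.0"
  )
)

# A data.frame is a list of columns; coerced to character in base R,
# each column collapses into one string, so a one-column tibble has length 1.
n_coerced <- length(as.character(as.list(toy)))

# The column itself is a character vector, so we get one match per row.
per_row <- str_extract(toy[[1]], "[0-9]\\.[0-9]+")

n_coerced
per_row
```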
If more than one column contains 'Bipolar', we can loop over them with across (change transmute to mutate to keep all the other columns from the original data):
df %>%
transmute(across(where(~ any(stringr::str_detect(.x, "Bipolar"))),
~ as.numeric(str_extract(.x, "(?<=Bipolar\\s)[0-9]\\.[0-9] ")),
.names = "sr_bipol{str_remove(.col, '[.]+')}"))
# A tibble: 20 × 1
sr_bipol12
<dbl>
1 7.83
2 2.34
3 1.97
4 1.94
5 2.85
6 2.92
7 3.05
8 2.80
9 3.43
10 2.11
11 2.80
12 1.81
13 1.84
14 3.87
15 1.68
16 2.21
17 2.97
18 3.09
19 2.84
20 3.48
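To illustrate the multi-column case, here is a sketch on a made-up tibble with two matching columns; toy, id, site_a and site_b are hypothetical names, not taken from the question's data. The is.character() guard and na.rm = TRUE are additions on my part, assuming the real data may contain non-character columns or missing values.

```r
library(dplyr)
library(stringr)

toy <- tibble(
  id     = 1:2,
  site_a = c("Bipolar 7.827 / Unipolar 16.911", "Bipolar 2.34 / Unipolar 9.09"),
  site_b = c("Bipolar 1.974 / Unipolar 9.219", "Bipolar 1.938 / Unipolar 10.572")
)

res <- toy %>%
  # mutate keeps the original columns and appends the new ones
  mutate(across(
    # only character columns in which at least one value mentions "Bipolar"
    where(~ is.character(.x) && any(str_detect(.x, "Bipolar"), na.rm = TRUE)),
    ~ as.numeric(str_extract(.x, "(?<=Bipolar\\s)[0-9]\\.[0-9]+")),
    .names = "sr_bipol_{.col}"
  ))

names(res)
```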
CodePudding user response:
Here is an alternative approach:
library(tidyverse)
df %>%
select(...12) %>%
separate(...12, into="group", sep = "\\/") %>%
mutate(sr_bipol = parse_number(group),
group= str_extract(group, '[A-Za-z]+'))
group sr_bipol
<chr> <dbl>
1 Bipolar 7.83
2 Bipolar 2.34
3 Bipolar 1.97
4 Bipolar 1.94
5 Bipolar 2.85
6 Bipolar 2.92
7 Bipolar 3.05
8 Bipolar 2.80
9 Bipolar 3.43
10 Bipolar 2.11
11 Bipolar 2.80
12 Bipolar 1.81
13 Bipolar 1.84
14 Bipolar 3.87
15 Bipolar 1.68
16 Bipolar 2.21
17 Bipolar 2.97
18 Bipolar 3.09
19 Bipolar 2.84
20 Bipolar 3.48
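Note that separate with a single name in into keeps only the first piece and warns about the discarded ones; passing extra = "drop" should silence that warning. If the other measurements are wanted too, a variation along these lines may work. It is only a sketch on hypothetical toy data, and the column names bipolar, unipolar and lat are my own choice.

```r
library(dplyr)
library(tidyr)
library(readr)

toy <- tibble(
  reading = c(
    "Bipolar 7.827 / Unipolar 16.911 / LAT -9.0",
    "Bipolar 2.34 / Unipolar 9.09 / LAT -10.0"
  )
)

res <- toy %>%
  # split on "/" into one column per measurement
  separate(reading, into = c("bipolar", "unipolar", "lat"), sep = "/") %>%
  # parse_number() keeps the first number it finds in each string
  mutate(across(everything(), parse_number))

res
```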