Beginner here: I have a dataframe with several columns that are currently strings containing a $ sign and spaces, and I want to convert them to numeric. My dataframe looks like this:
Name  Col_x_1   Company  Col_x_2  Start_Year  End_Year  Col_x_3
asd   $841 392  Test     $31 000  1902        1933      0
kfj   0         Test_2   0        1933        1954      $10 000
ale   $200 000  Test_3   0        1988        1999      0
...
I am currently using the following code to apply this to all columns whose names start with Col_x_, since they share that prefix and are numbered in ascending order:
library(tidyverse)
df %>%
  mutate(across(starts_with("Col_x_"), ~ gsub("\\$", "", .) %>%
    as.numeric())
  )
However, this only gives me NAs because as.numeric() does not work on the result. Does anyone know how I can fix this code? Thank you in advance!
CodePudding user response:
library(tidyverse)
df %>%
  mutate(across(starts_with("Col_x_"), ~ str_remove_all(.x, "[^0-9]"))) %>%
  type_convert()
# A tibble: 3 × 7
  Name  Col_x_1 Company Col_x_2 Start_Year End_Year Col_x_3
  <chr>   <dbl> <chr>     <dbl>      <dbl>    <dbl>   <dbl>
1 asd    841392 Test      31000       1902     1933       0
2 kfj         0 Test_2        0       1933     1954   10000
3 ale    200000 Test_3        0       1988     1999       0
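For context, the original attempt most likely returns NA because it removes only the $ sign: the spaces used as thousands separators remain, and as.numeric() cannot parse a string such as "841 392". Below is a minimal sketch that keeps the asker's gsub() approach but strips the spaces as well. The df here is rebuilt by hand from the sample rows in the question and is only illustrative; note that, like the regex above, this would not handle negative or decimal values specially.

```r
library(dplyr)

# Sample data reconstructed from the question (illustrative only)
df <- tibble(
  Name       = c("asd", "kfj", "ale"),
  Col_x_1    = c("$841 392", "0", "$200 000"),
  Company    = c("Test", "Test_2", "Test_3"),
  Col_x_2    = c("$31 000", "0", "0"),
  Start_Year = c(1902, 1933, 1988),
  End_Year   = c(1933, 1954, 1999),
  Col_x_3    = c("0", "$10 000", "0")
)

# Remove both the dollar sign and the spaces, then convert to numeric.
# Inside a character class, "$" is a literal dollar sign.
result <- df %>%
  mutate(across(starts_with("Col_x_"), ~ as.numeric(gsub("[$ ]", "", .x))))
```

Only the columns matching the Col_x_ prefix are touched; Name and Company stay as character columns.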
CodePudding user response:
In addition to the solutions in the comments, you could also use the convenience functions of {readr}, e.g.:
library(readr)
my_locale <- locale(grouping_mark = " ")
Effect:
> parse_number("$12 235", locale = my_locale)
[1] 12235
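To apply this to the dataframe from the question, parse_number() can be combined with across(). This is a rough sketch rather than a tested drop-in: the df below is a hand-built subset of the question's sample data, and it assumes every Col_x_ column is still a character column.

```r
library(dplyr)
library(readr)

# Subset of the question's sample data (illustrative only)
df <- tibble(
  Name    = c("asd", "kfj", "ale"),
  Col_x_1 = c("$841 392", "0", "$200 000"),
  Col_x_3 = c("0", "$10 000", "0")
)

# Treat a space as the thousands separator when parsing numbers
my_locale <- locale(grouping_mark = " ")

# parse_number() drops the non-numeric prefix ("$") and the grouping marks
result <- df %>%
  mutate(across(starts_with("Col_x_"), ~ parse_number(.x, locale = my_locale)))
```

Unlike stripping every non-digit character, parse_number() is designed to keep a decimal mark and a leading minus sign, which may matter if the real data contains such values.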