I have the following vector:
vec<-c("70,00 mln €", "20,50 mln €", "400 mila", "400 mila", "400 mila", "100 mila", "50 mila")
In vec, mln means "millions" whereas mila means "thousand". I would like to convert this vector into a numeric vector like the following:
70000000, 20500000, 400000, 400000, 400000, 100000, 50000
e.g. 70000000 stands for 70,00 mln, 20500000 stands for 20,50 mln and so on.
I tried with the following:
unlist(regmatches(vec, gregexpr("[[:digit:]]+", vec)))
to take the numeric part of the strings and then multiply by 1000 or 1000000, but I obtained:
[1] "70" "00" "20" "50" "400" "400" "400" "100" "50"
Here, "70" "00" should be just "70", and "20" "50" should instead be 20.5 (numeric).
EDIT: The one above is just an example. The true (longer) vector is the following:
vec <- c("70,00 mln €", "20,50 mln €", "7,00 mln €", "1,90 mln €",
"1,50 mln €", "16,00 mln €", "15,00 mln €", "3,00 mln €",
"10 mln €", "6,70 mln €", "5,25 mln €", "4,80 mln €",
"3,68 mln €", "1,19 mln €", "1,00 mln €", "21 mln €",
"20 mln €", "3 mln €", "2 mln €", "1,95 mln €", "14.5 mln",
"14.5 mln", "12 mln", "7 mln", "2,32 mln", "21,30 mln", "21 mln",
"20 mln", "5 mln", "3,5 mln", "2 mln", "2 mln", "1,00 mln €",
"19,92 mln", "12,70 mln", "8,00 mln", "1 mln", "4,50 mln", "1,95 mln",
"4,50 mln", "1,95 mln", "1,00 mln €", "10,00 mln €", "2,00 mln €",
"2 mln", "4,50 mln", "8,00 mln €", "4,90 mln €", "1,00 mln €",
"400 mila", "400 mila", "400 mila", "100 mila", "50 mila", "600 mila €",
"500 mila €", "500 mila €", "200 mila €", "600 mila",
"520 mila", "200 mila", "100 mila", "500 mila €", "300 mila €",
"200 mila €", "150 mila €", "20 mila €", "700 mila €",
"500 mila", "500 mila", "600 mila €", "450 mila €", "33 mila €",
"500 mila €", "700 mila €", "250 mila €", "100 mila €"
)
Answer:
An easier option is to replace mln with e6 and mila with e3, after changing the decimal comma to a dot and removing the spaces and the € sign, and then convert to numeric with as.numeric:
library(stringr)
# Decimal comma -> dot, drop whitespace and "€", then map the units to exponents
as.numeric(str_replace_all(str_remove_all(chartr(",", ".", vec),
    "\\s+€|\\s+"), c(mln = "e6", "mila" = "e3")))
-output
[1] 70000000 20500000 400000 400000 400000 100000 50000
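If you would rather avoid the stringr dependency, the same idea can be sketched in base R. This is only a minimal sketch: the function name parse_amount is made up for illustration, and it assumes every element uses either mln or mila as its unit and never contains a thousands separator.

```r
# parse_amount is a hypothetical helper name, not part of any package.
parse_amount <- function(x) {
  x <- chartr(",", ".", x)                 # "70,00 mln €" -> "70.00 mln €"
  x <- gsub("\\s+|€", "", x)               # "70.00 mln €" -> "70.00mln"
  x <- sub("mln", "e6", x, fixed = TRUE)   # millions  -> "70.00e6"
  x <- sub("mila", "e3", x, fixed = TRUE)  # thousands -> "400e3"
  as.numeric(x)                            # parse the scientific notation
}

vec <- c("70,00 mln €", "20,50 mln €", "400 mila", "14.5 mln", "50 mila")
parse_amount(vec)
```

Each step works on the whole vector, so the result has the same length as the input. A string with an unexpected unit would most likely come back as NA with a coercion warning, which makes it easy to spot.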