Take the number contained in a string (also floating point number)

Time:03-13

I have the following vector:

vec<-c("70,00 mln  €", "20,50 mln €", "400 mila", "400 mila", "400 mila", "100 mila", "50 mila")

In vec, mln means "millions" and mila means "thousands". I would like to convert this vector into a numeric vector like the following:

70000000, 20500000, 400000, 400000, 400000, 100000, 50000

e.g. 70000000 stands for 70,00 mln, 20500000 stands for 20,50 mln and so on.

I tried with the following:

unlist(regmatches(vec, gregexpr("[[:digit:]]+", vec)))

to take the numeric part of the strings and then multiply by 1000 or 1000000, but I obtained:

[1] "70"  "00"  "20"  "50"  "400" "400" "400" "100" "50" 

Here, "70" "00" should be just 70, and "20" "50" should instead be 20.5 (numeric).

EDIT: The vector above is just an example. The true (longer) vector is the following:

vec <- c("70,00 mln  €", "20,50 mln €", "7,00 mln €", "1,90 mln €", 
"1,50 mln €", "16,00 mln €", "15,00 mln €", "3,00 mln €", 
"10 mln €", "6,70 mln €", "5,25 mln €", "4,80 mln €", 
"3,68 mln €", "1,19 mln €", "1,00 mln €", "21 mln €", 
"20 mln €", "3 mln €", "2 mln €", "1,95 mln €", "14.5 mln", 
"14.5 mln", "12 mln", "7 mln", "2,32 mln", "21,30 mln", "21 mln", 
"20 mln", "5 mln", "3,5 mln", "2 mln", "2 mln", "1,00 mln €", 
"19,92 mln", "12,70 mln", "8,00 mln", "1 mln", "4,50 mln", "1,95 mln", 
"4,50 mln", "1,95 mln", "1,00 mln €", "10,00 mln €", "2,00 mln €", 
"2 mln", "4,50 mln", "8,00 mln €", "4,90 mln €", "1,00 mln €", 
"400 mila", "400 mila", "400 mila", "100 mila", "50 mila", "600 mila €", 
"500 mila €", "500 mila €", "200 mila €", "600 mila", 
"520 mila", "200 mila", "100 mila", "500 mila €", "300 mila €", 
"200 mila €", "150 mila €", "20 mila €", "700 mila €", 
"500 mila", "500 mila", "600 mila €", "450 mila €", "33 mila €", 
"500 mila €", "700 mila €", "250 mila €", "100 mila €"
)

CodePudding user response:

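An easier option is to replace mln and mila with e6 and e3, after converting the decimal comma to a dot and removing the spaces and the € sign, and then convert the result with as.numeric. A string such as "70.00e6" is valid scientific notation, so as.numeric parses it directly.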

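library(stringr)
# chartr: decimal comma -> dot; str_remove_all: drop whitespace and the € sign;
# str_replace_all: turn the unit words into exponents, e.g. "70.00mln" -> "70.00e6"
as.numeric(str_replace_all(str_remove_all(chartr(",", ".", vec), 
        "\\s+€|\\s+"), c(mln = "e6", mila = "e3")))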

-output

[1] 70000000 20500000   400000   400000   400000   100000    50000
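
If you would rather follow the approach from the question (extract the number, then multiply by 1000 or 1000000), here is a minimal base R sketch without stringr. The function name parse_amount is hypothetical, and the sketch assumes that every element contains exactly one number and that "," or "." is only ever used as a decimal mark, which seems to hold for the vector shown.

```r
# Hypothetical helper (the name is illustrative, not from the answer above).
# Assumes each element contains a number; regmatches() silently drops
# elements without a match, which would misalign the result.
parse_amount <- function(x) {
  # Extract the first number, keeping an optional "," or "." decimal part together
  num <- regmatches(x, regexpr("[0-9]+([.,][0-9]+)?", x))
  # Normalise the decimal comma to a dot before converting
  num <- as.numeric(sub(",", ".", num, fixed = TRUE))
  # Pick the multiplier from the unit word: mln = millions, mila = thousands
  mult <- ifelse(grepl("mln", x, fixed = TRUE), 1e6,
          ifelse(grepl("mila", x, fixed = TRUE), 1e3, 1))
  num * mult
}

vec <- c("70,00 mln  €", "20,50 mln €", "400 mila", "14.5 mln", "50 mila")
parse_amount(vec)
```

The optional group ([.,][0-9]+)? is what keeps "70,00" together as one match instead of splitting it into "70" and "00" as in the original attempt.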