Transform integer values in a char column in R

I was trying to transform some datasets in R when I ran into the following issue: I have a char column that shows the income of some people (a census), and I want to standardize the data for future analysis. This is a sample of the data:

income
2000,3 Thousand Euros
50,14 Thousand Euros
54000 Euros

This is what I am expecting:

income
2000.3 k€
50.14 k€
54 k€

And finally, this is the code I have got so far, but it is still not working. I am new to R and I am still searching for methods. To clarify, in the if statement I was trying to find all the values that have more than 4 digits, but I think it is easier to search for the ones that contain " Euros". However, to do arithmetic I believe I have to convert the char column into an integer one, and then the " Euros" regex would no longer match (I believe).

    census$income <- str_replace_all(census$income, " Thousand Euros", '')
    census$income <- str_replace_all(census$income, " Euros", '')
    census$income <- as.integer(census$income)
    if(floor(log10(census$income)) + 1 > 4){
      census$income/1000
    }
    census$income <- as.character(census$income)

Thank you very much for any help! =)

CodePudding user response：

I think you can accomplish this with a combination of readr::parse_number and str_detect(tolower(income), "thousand").

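library(dplyr)
library(stringr)
library(readr)

census %>% 
  mutate(
    parsed_income = if_else(
        str_detect(tolower(income), "thousand"), 
        # already in thousands; "," is the decimal mark in this data
        parse_number(income, locale = locale(decimal_mark = ",")), 
        # plain euros: divide by 1000 to express them in thousands
        parse_number(income, locale = locale(decimal_mark = ",")) / 1000
    )
  )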
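
The parsed numbers still need the k€ suffix to match the expected output. Here is a minimal sketch of the whole pipeline. It assumes the comma in the data is a decimal mark (hence the `locale(decimal_mark = ",")` argument) and that every value without "Thousand" is in plain euros. The column names `income_k` and `income_label` are made up for illustration:

```r
library(dplyr)
library(stringr)
library(readr)

# Sample data from the question
census <- tibble(
  income = c("2000,3 Thousand Euros", "50,14 Thousand Euros", "54000 Euros")
)

# Assumption: "," is the decimal mark, so "2000,3" means 2000.3
comma_decimal <- locale(decimal_mark = ",")

census <- census %>%
  mutate(
    # Numeric value expressed in thousands of euros
    income_k = if_else(
      str_detect(tolower(income), "thousand"),
      parse_number(income, locale = comma_decimal),
      parse_number(income, locale = comma_decimal) / 1000
    ),
    # Character label in the requested format
    income_label = paste(income_k, "k€")
  )
```

Keeping the numeric `income_k` column next to the label is probably worth it, since the label is only useful for display and any later arithmetic needs the number.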

CodePudding user response：

A solution with nested sub:

dplyr:

library(dplyr)
df %>%
  mutate(income = sub("(000\\s|\\sThousand\\s)?Euros", " k€", 
                      sub(",", ".", income)))
#       income
# 1 2000.3 k€
# 2  50.14 k€
# 3     54 k€

base R:

df$income <- sub("(000\\s|\\sThousand\\s)?Euros", " k€", 
                 sub(",", ".", df$income))

Data:

df <- data.frame(
  income = c("2000,3 Thousand Euros","50,14 Thousand Euros","54000 Euros")
)
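
Note that the regex only shortens amounts that end in exactly 000 before " Euros"; a value such as 54500 Euros would keep all its digits. If that matters, one possible alternative is to convert to numbers first and format afterwards. This is a base R sketch under the same assumption that the comma is a decimal mark; `to_k_euro` is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: turn income strings into "<number> k€" labels
to_k_euro <- function(x) {
  x <- trimws(x)
  # Keep the leading number and switch the decimal comma to a dot
  # (assumption: "," is the decimal mark, as in the sample data)
  value <- as.numeric(sub(",", ".", sub("\\s.*$", "", x), fixed = TRUE))
  # Values without "Thousand" are plain euros, so scale them down
  in_thousands <- grepl("thousand", x, ignore.case = TRUE)
  paste(ifelse(in_thousands, value, value / 1000), "k€")
}

# Sample data plus one non-round amount to exercise the scaling
df <- data.frame(
  income = c("2000,3 Thousand Euros", "50,14 Thousand Euros",
             "54000 Euros", "54500 Euros")
)
df$income <- to_k_euro(df$income)
```

The trade-off is that this version goes through `as.numeric`, so any entry that does not start with a number becomes `NA` instead of being left untouched as the regex approach would do.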