I have a column in my dataframe from a survey that has two different units in it. I need to remove these and convert the info into a consistent unit ie a column of all cm without the unit being present.
Here is some sample data
df <- data.frame(v1 = c('100 cm', '6 foot 10', '200 cm', '5 foot 11')
I attempted to use this readr::parse_number(df$v1) but that would turn '6 foot 10' into 6. I'm not sure it's that helpful anyway because I still need to convert the heights recorded as feet and inches into cm
CodePudding user response:
You can split the strings by foot ('6 foot 10'
>> c(6, 10)
), and then map through the items, parse them as numbers, and -- if they have length 2 (ie, are the result of a string split) -- multiply them by the conversion rates.
library(tidyverse)
df$v1 %>%
strsplit(" foot ") %>%
map_dbl(function(num){
num = parse_number(num)
if(length(num) == 2) {sum(num * c(30.48, 2.54))} else {num}})
[1] 100.00 208.28 200.00 180.34
Another option, but less elegant in my opinion, is to change the '6 foot 10'
to
'6*30.48 2.54*10'
(multiply the #feet and #inches with their conversion rates) using stringr::str_replace_all
, and then evaluate those strings as expressions with eval
parse
(with a mapping function like sapply
or purrr::map
, because these functions are not vectorised).
df$v1 %>%
str_replace_all(c(" cm" = "", " foot " = "*30.48 2.54*")) %>%
map_dbl(~ parse(text = .x) %>% eval())
[1] 100.00 208.28 200.00 180.34
CodePudding user response:
Maybe something like this (with tidyverse):
df %>% mutate(
units = case_when(
str_detect(v1, ".cm") ~ "cm",
str_detect(v1, ".foot") ~ "foot"
)
)
Then you get a unit column and based on that can apply a conversion factor.