Home > Software design >  Why does Gsub not work when converting units?
Why does Gsub not work when converting units?

Time:11-09

I was wondering if anyone could help point me in the right direction in how to fix my function. Basically, I have a list of measurements written in different units (e.g. ft, cm, m) and I've been writting a function to put them all into the same base units of cm i.e.:

Turn this:

5_ft_7
157_cm
5_ft_11
167_cm
1.65_m
187_cm
1.71_m
188_cm
5_ft_2
5_ft_5

Into this:

170.180
157.000
155.194
167.000
165.000
187.000
171.000
188.000
157.480
165.100

Using this function I've created:

  Conv.Question2 <- function(x) {
    (Conv.ft <- function(x) {
      Data_original.ft <- (x)
      Data_original.ft <-lapply(Data_original.ft, gsub, pattern= "_ft_", replacement =".")
      k <- as.numeric(suppressWarnings(x)) 
      Feet.cm <- (floor(k)*30.48)  #Feet in cm
      Inches.cm <- data.frame(((k-floor(k))*10)*2.54) #Inches in cm
      Total.cm <- data.frame(Feet.cm, Inches.cm)
      Final <- data.frame(Total.cm %>% replace(is.na(.), 0) %>% mutate(Total = rowSums(.)))
      print(data.frame(Final$Total))
    })
      (Conv.cm <- function(x) {
      Data_original.cm <- (x)
      Data_original.cm <- lapply(Data_original.cm, gsub, pattern= "_cm", replacement ="")
      l <- as.numeric(suppressWarnings(x)) 
      Total.cm <- data.frame(l)
      Final.cm <- data.frame(Total.cm %>% replace(is.na(.), 0) %>% mutate(Total = rowSums(.)))
      print(data.frame(Final.cm$Total))
    })
      (Conv.m <- function(x) {
      Data_original.m <- (x)
      Data_original.m <- lapply(Data_original.m, gsub, pattern= "_m", replacement ="")
      m <- as.numeric(suppressWarnings(x)) 
      Total.m <- data.frame(m*100)
      Final.m <- data.frame(Total.m %>% replace(is.na(.), 0) %>% mutate(Total = rowSums(.)))
      print(data.frame(Final.m$Total))
    })
    Height_ft <- Conv.ft(x)
    Height_cm <- Conv.cm(x)
    Height_m <- Conv.m(x)
    Test104 <- cbind(Height_ft,Height_cm,Height_m)
    Test105 <- data.frame(rowSums(Test104, na.rm=TRUE))
    colnames(Test105) <- c("Height(cm)")
    Test105[Test105 == 0] <-NA
    cat("\014") #Clears console
    print(Test105)
  }

(and yes, I do realise that my feet and inches conversion still needs work, but I'm fairly sure that is something I can fix myself)

The issue is that whenever I run it, there are no warning messages of concern, but it just produces a list of NAs. Hopefully, I am close to fixing it, but after trying several different methods, I have so far not been able to work out what is going wrong.

Any advice on what part(s) of this function are causing it to fail would be very much appreciated, thank you.

CodePudding user response:

This seems like an overly complex approach. I would probably boil it all down to a single function that uses ifelse to parse each measurement type:

convert_all <- function(text)
{
  suppressWarnings(
    ifelse(grepl("_m", text), 100 * as.numeric(gsub("_m", "", text)),
       ifelse(grepl("_cm", text), as.numeric(gsub("_cm", "", text)),
              sapply(strsplit(text, "_ft_"), function(x) {
                x <- as.numeric(x)
                (x[1] * 12   x[2]) * 2.54
              })))
  )
}

So, in your case, we would have:

measurements <- c("5_ft_7", "157_cm", "5_ft_11", "167_cm", "1.65_m", "187_cm", 
                  "1.71_m", "188_cm", "5_ft_2", "5_ft_5")

convert_all(measurements)
#> [1] 170.18 157.00 180.34 167.00 165.00 187.00 171.00 188.00 157.48 165.10

An alternative approach if you prefer to see your working and avoid suppressWarnings calls, both of which might help in debugging, is to break the problem down into little sub-problems by creating a function to convert each measurement type, and a function to distribute the work between these functions appropriately:

as_metres <- function(text) 100 * as.numeric(gsub("_m", "", text))

as_cm     <- function(text) as.numeric(gsub("_cm", "", text))

as_ft     <- function(text) sapply(strsplit(text, "_ft_"), 
                               function(x) {
                                 x <- as.numeric(x)
                                 (x[1] * 12   x[2]) * 2.54
                                })

convert_all <- function(text)
{
  result <- numeric(length(text))
  result[grepl("_cm", text)] <- as_cm(grep("_cm", text, value = TRUE))
  result[grepl("_m", text)] <- as_metres(grep("_m", text, value = TRUE))
  result[grepl("_ft", text)] <- as_ft(grep("_ft", text, value = TRUE))
  return(result)
}

CodePudding user response:

Alternatively, using tidyverse

library(tidyverse)
df <- as_tibble(
  c("5_ft_7", "157_cm", "5_ft_11", "167_cm", "1.65_m", "187_cm", "1.71_m", "188_cm", "5_ft_2", "5_ft_5")
)
df %>%
  separate(value, into = c("big", "unit", "small"), fill = "right") %>%
  mutate(small = suppressWarnings(as.numeric(small)),
         big = as.numeric(ifelse(str_detect(unit, "\\d"), paste0(big, unit), big))) %>%
  transmute(height = case_when(
    unit == "ft" ~ big * 30.48   small * 2.54,
    unit == "cm" ~ big,
    str_detect(unit, pattern = "\\d") ~ big
  )) %>%
  pull(height)
#> [1] 170.18 157.00 180.34 167.00 165.00 187.00 171.00 188.00 157.48 165.10

CodePudding user response:

Reviewing code in question

Taking Conv.ft as an example it replaces _ft_ with a dot but then tries to convert everything to numeric including the non-_ft_ components causing the introduction of NA's. Instead determine which components contain _ft_ and then convert those (using sub, not gsub since we only want one replacement per component) and return 0 for the others. Note that sub and gsub are vectorized already so they don't need lapply. Now when we convert to numeric no NA's will be produced. Putting all this together use this to set k:

  k <- as.numeric(ifelse(grepl("_ft_", x), sub("_ft_", ".", x), 0))

Make analogous changes in the other Conv.* functions.

A different approach.

Here is a function which uses a slightly different approach. It still uses separate functions for each unit calling that function whose name equals the unit.

With the example input from the question the units and nums variable equal the following:

units <- c("ft", "cm", "ft", "cm", "m", "cm", "m", "cm", "ft", "ft")
nums <- c("5    7", "157   ", "5    11", "167   ", "1.65  ", "187   ", 
  "1.71  ", "188   ", "5    2", "5    5")

The do.call calls the function whose name is its first argument with the arguments, here just one, given by the components of the list passed as the second argument.

Conv2cm <- function(x) {
  ft <- function(x) sum(read.table(text = x) * c(12 * 2.54, 2.54))
  cm <- function(x) as.numeric(x)
  m <- function(x) 100 * as.numeric(x)
  units <- gsub("[0-9._]", "", x)
  nums <- gsub("[^0-9.]", " ", x)
  mapply(function(u, x) do.call(u, list(x)), units, nums, USE.NAMES = FALSE)
}

Conv2cm(measurements)
##  [1] 170.18 157.00 180.34 167.00 165.00 187.00 171.00 188.00 157.48 165.10

CodePudding user response:

A flexible approach than can handle almost any type of measurement you can throw at it..
for distances, it can handle:

angstrom, nm, um, mm, cm, dm, m, km, inch, ft, 
yd, fathom, mi, naut_mi, au, light_yr,
parsec, point

Core of the conversion is the conv_unit() function from the measurements-package

library(data.table)
library(measurements)
DT <- fread("5_ft_7
157_cm
5_ft_11
167_cm
1.65_m
187_cm
1.71_m
188_cm
5_ft_2
5_ft_5", header = FALSE)

#add rowid
DT[, id := .I]
# split on _
DT[, c("val1", "units", "val2") := tstrsplit(V1, "_")][]
# add second unit vor inches
DT[units == "ft", units2 := "inch"]
#         V1 id val1 units val2 units2
# 1:  5_ft_7  1    5    ft    7   inch
# 2:  157_cm  2  157    cm <NA>   <NA>
# 3: 5_ft_11  3    5    ft   11   inch
# 4:  167_cm  4  167    cm <NA>   <NA>
# 5:  1.65_m  5 1.65     m <NA>   <NA>
# 6:  187_cm  6  187    cm <NA>   <NA>
# 7:  1.71_m  7 1.71     m <NA>   <NA>
# 8:  188_cm  8  188    cm <NA>   <NA>
# 9:  5_ft_2  9    5    ft    2   inch
#10:  5_ft_5 10    5    ft    5   inch

# melt to long format
DT.melt <- melt(DT, id.vars = "id", 
                measure.vars = patterns(unit = "^units", 
                                        value = "^val"),
                na.rm = TRUE)
#    id variable unit value
# 1:  1        1   ft     5
# 2:  2        1   cm   157
# 3:  3        1   ft     5
# 4:  4        1   cm   167
# 5:  5        1    m  1.65
# 6:  6        1   cm   187
# 7:  7        1    m  1.71
# 8:  8        1   cm   188
# 9:  9        1   ft     5
#10: 10        1   ft     5
#11:  1        2 inch     7
#12:  3        2 inch    11
#13:  9        2 inch     2
#14: 10        2 inch     5

# split by unit
L <- split(DT.melt, by = "unit")
# calculate in centimeter
L <- lapply(L, function(x){
  x[, value_cm := conv_unit(as.numeric(value), 
                            unique(unit), "cm")]
})
# bind together and summarise, join on DT
DT[rbindlist(L)[, .(value_cm = sum(value_cm)), by = .(id)],
   in_cm := i.value_cm, on = .(id)]
#         V1 id val1 units val2 units2  in_cm
# 1:  5_ft_7  1    5    ft    7   inch 170.18
# 2:  157_cm  2  157    cm <NA>   <NA> 157.00
# 3: 5_ft_11  3    5    ft   11   inch 180.34
# 4:  167_cm  4  167    cm <NA>   <NA> 167.00
# 5:  1.65_m  5 1.65     m <NA>   <NA> 165.00
# 6:  187_cm  6  187    cm <NA>   <NA> 187.00
# 7:  1.71_m  7 1.71     m <NA>   <NA> 171.00
# 8:  188_cm  8  188    cm <NA>   <NA> 188.00
# 9:  5_ft_2  9    5    ft    2   inch 157.48
#10:  5_ft_5 10    5    ft    5   inch 165.10

CodePudding user response:

Here's a dplyr solution with some purrr functionality:

library(dplyr)
library(purrr)
data.frame(x) %>%
  mutate(
    x = sub("_cm", ".000", x),
    x = sub("(\\d )\\.(\\d )_m", "\\1\\2", x),
    temp = ifelse(grepl("ft", x), strsplit(x, "_ft_"), NA),
    x = ifelse(grepl("ft", x), 
               map_dbl(temp, function(x) (as.numeric(x)[1]*12   as.numeric(x)[2])*2.54),
               x),
    x = as.numeric(x)
  ) %>%
  select(-temp)
        x
1  170.18
2  157.00
3  180.34
4  167.00
5  165.00
6  187.00
7  171.00
8  188.00
9  157.48
10 165.10

Here's a stepwise, mostly base R procedure:

# values with "_cm":
cm <- grep("_cm", x, value = TRUE)
cm_new <- as.numeric(sub("_cm", ".000", cm))

# values with "_m":
m <- grep("_m", x, value = TRUE)
m_new <- as.numeric(gsub("(\\d )\\.(\\d )?_m", "\\1\\2", m))

# values with "_ft_":
ft <- grep("_ft", x, value = TRUE)
ft_0 <- gsub("(\\d )_ft_(\\d )", "\\1.\\2", ft)
library(purrr)
ft_new <- map_dbl(strsplit(ft_0, "\\."), function(x) (as.numeric(x)[1]*12   as.numeric(x)[2])*2.54)

# now concatenate:
all_new <- c(cm_new, m_new, ft_new)

Result:

all_new
 [1] 157.00 167.00 187.00 188.00 165.00 171.00 170.18 180.34 157.48 165.10

Data:

x <- c("5_ft_7",
      "157_cm",
      "5_ft_11",
      "167_cm","1.65_m",
      "187_cm","1.71_m",
      "188_cm","5_ft_2",
      "5_ft_5")
  •  Tags:  
  • r
  • Related