I was wondering if anyone could help point me in the right direction in how to fix my function. Basically, I have a list of measurements written in different units (e.g. ft, cm, m) and I've been writting a function to put them all into the same base units of cm i.e.:
Turn this:
5_ft_7
157_cm
5_ft_11
167_cm
1.65_m
187_cm
1.71_m
188_cm
5_ft_2
5_ft_5
Into this:
170.180
157.000
155.194
167.000
165.000
187.000
171.000
188.000
157.480
165.100
Using this function I've created:
Conv.Question2 <- function(x) {
(Conv.ft <- function(x) {
Data_original.ft <- (x)
Data_original.ft <-lapply(Data_original.ft, gsub, pattern= "_ft_", replacement =".")
k <- as.numeric(suppressWarnings(x))
Feet.cm <- (floor(k)*30.48) #Feet in cm
Inches.cm <- data.frame(((k-floor(k))*10)*2.54) #Inches in cm
Total.cm <- data.frame(Feet.cm, Inches.cm)
Final <- data.frame(Total.cm %>% replace(is.na(.), 0) %>% mutate(Total = rowSums(.)))
print(data.frame(Final$Total))
})
(Conv.cm <- function(x) {
Data_original.cm <- (x)
Data_original.cm <- lapply(Data_original.cm, gsub, pattern= "_cm", replacement ="")
l <- as.numeric(suppressWarnings(x))
Total.cm <- data.frame(l)
Final.cm <- data.frame(Total.cm %>% replace(is.na(.), 0) %>% mutate(Total = rowSums(.)))
print(data.frame(Final.cm$Total))
})
(Conv.m <- function(x) {
Data_original.m <- (x)
Data_original.m <- lapply(Data_original.m, gsub, pattern= "_m", replacement ="")
m <- as.numeric(suppressWarnings(x))
Total.m <- data.frame(m*100)
Final.m <- data.frame(Total.m %>% replace(is.na(.), 0) %>% mutate(Total = rowSums(.)))
print(data.frame(Final.m$Total))
})
Height_ft <- Conv.ft(x)
Height_cm <- Conv.cm(x)
Height_m <- Conv.m(x)
Test104 <- cbind(Height_ft,Height_cm,Height_m)
Test105 <- data.frame(rowSums(Test104, na.rm=TRUE))
colnames(Test105) <- c("Height(cm)")
Test105[Test105 == 0] <-NA
cat("\014") #Clears console
print(Test105)
}
(and yes, I do realise that my feet and inches conversion still needs work, but I'm fairly sure that is something I can fix myself)
The issue is that whenever I run it, there are no warning messages of concern, but it just produces a list of NAs. Hopefully, I am close to fixing it, but after trying several different methods, I have so far not been able to work out what is going wrong.
Any advice on what part(s) of this function are causing it to fail would be very much appreciated, thank you.
CodePudding user response:
This seems like an overly complex approach. I would probably boil it all down to a single function that uses ifelse
to parse each measurement type:
convert_all <- function(text)
{
suppressWarnings(
ifelse(grepl("_m", text), 100 * as.numeric(gsub("_m", "", text)),
ifelse(grepl("_cm", text), as.numeric(gsub("_cm", "", text)),
sapply(strsplit(text, "_ft_"), function(x) {
x <- as.numeric(x)
(x[1] * 12 x[2]) * 2.54
})))
)
}
So, in your case, we would have:
measurements <- c("5_ft_7", "157_cm", "5_ft_11", "167_cm", "1.65_m", "187_cm",
"1.71_m", "188_cm", "5_ft_2", "5_ft_5")
convert_all(measurements)
#> [1] 170.18 157.00 180.34 167.00 165.00 187.00 171.00 188.00 157.48 165.10
An alternative approach if you prefer to see your working and avoid suppressWarnings
calls, both of which might help in debugging, is to break the problem down into little sub-problems by creating a function to convert each measurement type, and a function to distribute the work between these functions appropriately:
as_metres <- function(text) 100 * as.numeric(gsub("_m", "", text))
as_cm <- function(text) as.numeric(gsub("_cm", "", text))
as_ft <- function(text) sapply(strsplit(text, "_ft_"),
function(x) {
x <- as.numeric(x)
(x[1] * 12 x[2]) * 2.54
})
convert_all <- function(text)
{
result <- numeric(length(text))
result[grepl("_cm", text)] <- as_cm(grep("_cm", text, value = TRUE))
result[grepl("_m", text)] <- as_metres(grep("_m", text, value = TRUE))
result[grepl("_ft", text)] <- as_ft(grep("_ft", text, value = TRUE))
return(result)
}
CodePudding user response:
Alternatively, using tidyverse
library(tidyverse)
df <- as_tibble(
c("5_ft_7", "157_cm", "5_ft_11", "167_cm", "1.65_m", "187_cm", "1.71_m", "188_cm", "5_ft_2", "5_ft_5")
)
df %>%
separate(value, into = c("big", "unit", "small"), fill = "right") %>%
mutate(small = suppressWarnings(as.numeric(small)),
big = as.numeric(ifelse(str_detect(unit, "\\d"), paste0(big, unit), big))) %>%
transmute(height = case_when(
unit == "ft" ~ big * 30.48 small * 2.54,
unit == "cm" ~ big,
str_detect(unit, pattern = "\\d") ~ big
)) %>%
pull(height)
#> [1] 170.18 157.00 180.34 167.00 165.00 187.00 171.00 188.00 157.48 165.10
CodePudding user response:
Reviewing code in question
Taking Conv.ft as an example it replaces _ft_
with a dot but then tries to convert everything to numeric including the non-_ft_
components causing the introduction of NA's. Instead determine which components contain _ft_
and then convert those (using sub, not gsub since we only want one replacement per component) and return 0 for the others. Note that sub and gsub are vectorized already so they don't need lapply. Now when we convert to numeric no NA's will be produced.
Putting all this together use this to set k:
k <- as.numeric(ifelse(grepl("_ft_", x), sub("_ft_", ".", x), 0))
Make analogous changes in the other Conv.* functions.
A different approach.
Here is a function which uses a slightly different approach. It still uses separate functions for each unit calling that function whose name equals the unit.
With the example input from the question the units and nums variable equal the following:
units <- c("ft", "cm", "ft", "cm", "m", "cm", "m", "cm", "ft", "ft")
nums <- c("5 7", "157 ", "5 11", "167 ", "1.65 ", "187 ",
"1.71 ", "188 ", "5 2", "5 5")
The do.call calls the function whose name is its first argument with the arguments, here just one, given by the components of the list passed as the second argument.
Conv2cm <- function(x) {
ft <- function(x) sum(read.table(text = x) * c(12 * 2.54, 2.54))
cm <- function(x) as.numeric(x)
m <- function(x) 100 * as.numeric(x)
units <- gsub("[0-9._]", "", x)
nums <- gsub("[^0-9.]", " ", x)
mapply(function(u, x) do.call(u, list(x)), units, nums, USE.NAMES = FALSE)
}
Conv2cm(measurements)
## [1] 170.18 157.00 180.34 167.00 165.00 187.00 171.00 188.00 157.48 165.10
CodePudding user response:
A flexible approach than can handle almost any type of measurement you can throw at it..
for distances, it can handle:
angstrom, nm, um, mm, cm, dm, m, km, inch, ft, yd, fathom, mi, naut_mi, au, light_yr, parsec, point
Core of the conversion is the conv_unit()
function from the measurements
-package
library(data.table)
library(measurements)
DT <- fread("5_ft_7
157_cm
5_ft_11
167_cm
1.65_m
187_cm
1.71_m
188_cm
5_ft_2
5_ft_5", header = FALSE)
#add rowid
DT[, id := .I]
# split on _
DT[, c("val1", "units", "val2") := tstrsplit(V1, "_")][]
# add second unit vor inches
DT[units == "ft", units2 := "inch"]
# V1 id val1 units val2 units2
# 1: 5_ft_7 1 5 ft 7 inch
# 2: 157_cm 2 157 cm <NA> <NA>
# 3: 5_ft_11 3 5 ft 11 inch
# 4: 167_cm 4 167 cm <NA> <NA>
# 5: 1.65_m 5 1.65 m <NA> <NA>
# 6: 187_cm 6 187 cm <NA> <NA>
# 7: 1.71_m 7 1.71 m <NA> <NA>
# 8: 188_cm 8 188 cm <NA> <NA>
# 9: 5_ft_2 9 5 ft 2 inch
#10: 5_ft_5 10 5 ft 5 inch
# melt to long format
DT.melt <- melt(DT, id.vars = "id",
measure.vars = patterns(unit = "^units",
value = "^val"),
na.rm = TRUE)
# id variable unit value
# 1: 1 1 ft 5
# 2: 2 1 cm 157
# 3: 3 1 ft 5
# 4: 4 1 cm 167
# 5: 5 1 m 1.65
# 6: 6 1 cm 187
# 7: 7 1 m 1.71
# 8: 8 1 cm 188
# 9: 9 1 ft 5
#10: 10 1 ft 5
#11: 1 2 inch 7
#12: 3 2 inch 11
#13: 9 2 inch 2
#14: 10 2 inch 5
# split by unit
L <- split(DT.melt, by = "unit")
# calculate in centimeter
L <- lapply(L, function(x){
x[, value_cm := conv_unit(as.numeric(value),
unique(unit), "cm")]
})
# bind together and summarise, join on DT
DT[rbindlist(L)[, .(value_cm = sum(value_cm)), by = .(id)],
in_cm := i.value_cm, on = .(id)]
# V1 id val1 units val2 units2 in_cm
# 1: 5_ft_7 1 5 ft 7 inch 170.18
# 2: 157_cm 2 157 cm <NA> <NA> 157.00
# 3: 5_ft_11 3 5 ft 11 inch 180.34
# 4: 167_cm 4 167 cm <NA> <NA> 167.00
# 5: 1.65_m 5 1.65 m <NA> <NA> 165.00
# 6: 187_cm 6 187 cm <NA> <NA> 187.00
# 7: 1.71_m 7 1.71 m <NA> <NA> 171.00
# 8: 188_cm 8 188 cm <NA> <NA> 188.00
# 9: 5_ft_2 9 5 ft 2 inch 157.48
#10: 5_ft_5 10 5 ft 5 inch 165.10
CodePudding user response:
Here's a dplyr
solution with some purrr
functionality:
library(dplyr)
library(purrr)
data.frame(x) %>%
mutate(
x = sub("_cm", ".000", x),
x = sub("(\\d )\\.(\\d )_m", "\\1\\2", x),
temp = ifelse(grepl("ft", x), strsplit(x, "_ft_"), NA),
x = ifelse(grepl("ft", x),
map_dbl(temp, function(x) (as.numeric(x)[1]*12 as.numeric(x)[2])*2.54),
x),
x = as.numeric(x)
) %>%
select(-temp)
x
1 170.18
2 157.00
3 180.34
4 167.00
5 165.00
6 187.00
7 171.00
8 188.00
9 157.48
10 165.10
Here's a stepwise, mostly base R
procedure:
# values with "_cm":
cm <- grep("_cm", x, value = TRUE)
cm_new <- as.numeric(sub("_cm", ".000", cm))
# values with "_m":
m <- grep("_m", x, value = TRUE)
m_new <- as.numeric(gsub("(\\d )\\.(\\d )?_m", "\\1\\2", m))
# values with "_ft_":
ft <- grep("_ft", x, value = TRUE)
ft_0 <- gsub("(\\d )_ft_(\\d )", "\\1.\\2", ft)
library(purrr)
ft_new <- map_dbl(strsplit(ft_0, "\\."), function(x) (as.numeric(x)[1]*12 as.numeric(x)[2])*2.54)
# now concatenate:
all_new <- c(cm_new, m_new, ft_new)
Result:
all_new
[1] 157.00 167.00 187.00 188.00 165.00 171.00 170.18 180.34 157.48 165.10
Data:
x <- c("5_ft_7",
"157_cm",
"5_ft_11",
"167_cm","1.65_m",
"187_cm","1.71_m",
"188_cm","5_ft_2",
"5_ft_5")