I am working on an assignment where one of my columns contains measurements in feet, cm, and m, and I am trying to convert them all to metres. So far I have been able to convert individual cells to simple numeric values (e.g. 5_ft_7 to 5.7), but I cannot find a function that will convert them to metres, especially without modifying the values that are already in centimetres/metres.
My question is, is there a way of targeting JUST the cells that contain 'ft' without specifying each of them individually?
Dataset (hopefully this helps):
> Data_original3$Height
[1] "5.7" "157_cm" "5.11" "167_cm" "1.65_m" "187_cm" "1.71_m" "188_cm" "5.2"
[10] "5.5" "5.7" "155_cm" "5.4" "163_cm" "6.4" "170_cm" "5.7" "5.8"
[19] "186_cm" "5.1" "5.3" "5.3" "5.7" "5.8" "6.2" "175_cm" "5.6"
[28] "5.7" "180_cm" "5.6" "160_cm" "163_cm" "5.6" "163_cm" "5.7" "175_cm"
[37] "165_cm" "5.7" "5.6" "5.11" "188_cm" "5.6" "5.3" "5.5" "5.4"
[46] "5.6" "180_cm" "5.9" "165_cm" "5.6" "180_cm" "165_cm" "175_cm" "5.4"
[55] "167_cm" "175_cm" "5.7" "5.11" "5.11" "5.5" "6.1" "1.68_m" "5.4"
[64] "5.7" "5.3" "5.5" "5.9" "5.9" "5.4" "5.6" "5.8" "5.5"
[73] "5.9" "6.3" "6.1" "5.8" "5.2" "5.2" "6.0" "166_cm" "5.3"
[82] NA "166_cm" "1.88_m" "5.6" "5.10" "171_cm" "5.1" "170_cm" "178_cm"
[91] "5.2" "185_cm" "5.11" "5.9" "5.11" "5.7" "6.0" "6.1" "176_cm"
[100] "5.7" "189_cm" "5.3" "5.7" "164_cm" "5.6" "5.8" NA NA
[109] "175_cm" "157_cm" "5.10" "172_cm" "170_cm" "5.7" "5.8" "5.6" "169_cm"
[118] "6.2" "6.4" "1.71_m" "5.10" "1.67_m" "5.2" "160_cm" "5.8" "6.2"
[127] "5.5" "180_cm" "175_cm" "5.0" "195_cm" "5.5" "6.0" "175_cm"
Thank you
CodePudding user response:
Try this, where my vec
is your $Height
column:
U <- gsub("[0-9._]", "", vec)
head(U)
# [1] "" "cm" "" "cm" "m" "cm"
as.numeric(gsub("[^0-9.]", "", vec)) *
sapply(U, switch, m = 1, cm = 1/100, 1)
# cm cm m cm m cm cm cm cm cm
# 5.70 1.57 5.11 1.67 1.65 1.87 1.71 1.88 5.20 5.50 5.70 1.55 5.40 1.63 6.40 1.70 5.70 5.80 1.86 5.10 5.30 5.30 5.70 5.80
# cm cm cm cm cm cm cm cm cm
# 6.20 1.75 5.60 5.70 1.80 5.60 1.60 1.63 5.60 1.63 5.70 1.75 1.65 5.70 5.60 5.11 1.88 5.60 5.30 5.50 5.40 5.60 1.80 5.90
# cm cm cm cm cm cm m
# 1.65 5.60 1.80 1.65 1.75 5.40 1.67 1.75 5.70 5.11 5.11 5.50 6.10 1.68 5.40 5.70 5.30 5.50 5.90 5.90 5.40 5.60 5.80 5.50
# cm <NA> cm m cm cm cm cm
# 5.90 6.30 6.10 5.80 5.20 5.20 6.00 1.66 5.30 NA 1.66 1.88 5.60 5.10 1.71 5.10 1.70 1.78 5.20 1.85 5.11 5.90 5.11 5.70
# cm cm cm <NA> <NA> cm cm cm cm cm m
# 6.00 6.10 1.76 5.70 1.89 5.30 5.70 1.64 5.60 5.80 NA NA 1.75 1.57 5.10 1.72 1.70 5.70 5.80 5.60 1.69 6.20 6.40 1.71
# m cm cm cm cm cm
# 5.10 1.67 5.20 1.60 5.80 6.20 5.50 1.80 1.75 5.00 1.95 5.50 6.00 1.75
You can ignore the names, it's the values that are important.
unname(as.numeric(gsub("[^0-9.]", "", vec)) *
sapply(U, switch, m = 1, cm = 1/100, 1))
# [1] 5.70 1.57 5.11 1.67 1.65 1.87 1.71 1.88 5.20 5.50 5.70 1.55 5.40 1.63 6.40 1.70 5.70 5.80 1.86 5.10 5.30 5.30 5.70
# [24] 5.80 6.20 1.75 5.60 5.70 1.80 5.60 1.60 1.63 5.60 1.63 5.70 1.75 1.65 5.70 5.60 5.11 1.88 5.60 5.30 5.50 5.40 5.60
# [47] 1.80 5.90 1.65 5.60 1.80 1.65 1.75 5.40 1.67 1.75 5.70 5.11 5.11 5.50 6.10 1.68 5.40 5.70 5.30 5.50 5.90 5.90 5.40
# [70] 5.60 5.80 5.50 5.90 6.30 6.10 5.80 5.20 5.20 6.00 1.66 5.30 NA 1.66 1.88 5.60 5.10 1.71 5.10 1.70 1.78 5.20 1.85
# [93] 5.11 5.90 5.11 5.70 6.00 6.10 1.76 5.70 1.89 5.30 5.70 1.64 5.60 5.80 NA NA 1.75 1.57 5.10 1.72 1.70 5.70 5.80
# [116] 5.60 1.69 6.20 6.40 1.71 5.10 1.67 5.20 1.60 5.80 6.20 5.50 1.80 1.75 5.00 1.95 5.50 6.00 1.75
Walk-through:
as.numeric(gsub("[^0-9.]", "", vec))
extracts just the number componentsU
is the units extracted from each, where empty strings""
means there was no unit applied.switch(U[1], m = 1, cm = 1/100, 1)
would check the firstU
unit and return a conversion into meters; the trailing unnamed1
is the default assigned ifU[1]
is not one of the known strings"cm"
and"m"
, which we'll use as1
(feet).- because
switch
is not vectorized, I usesapply(U, switch, ...)
to vectorize its effect, and it returns a vector of multipliers to apply to the numbers extracted withas.numeric(.)
Data
vec <- c("5.7", "157_cm", "5.11", "167_cm", "1.65_m", "187_cm", "1.71_m", "188_cm", "5.2", "5.5", "5.7", "155_cm", "5.4", "163_cm", "6.4", "170_cm", "5.7", "5.8", "186_cm", "5.1", "5.3", "5.3", "5.7", "5.8", "6.2", "175_cm", "5.6", "5.7", "180_cm", "5.6", "160_cm", "163_cm", "5.6", "163_cm", "5.7", "175_cm", "165_cm", "5.7", "5.6", "5.11", "188_cm", "5.6", "5.3", "5.5", "5.4", "5.6", "180_cm", "5.9", "165_cm", "5.6", "180_cm", "165_cm", "175_cm", "5.4", "167_cm", "175_cm", "5.7", "5.11", "5.11", "5.5", "6.1", "1.68_m", "5.4", "5.7", "5.3", "5.5", "5.9", "5.9", "5.4", "5.6", "5.8", "5.5", "5.9", "6.3", "6.1", "5.8", "5.2", "5.2", "6.0", "166_cm", "5.3", NA, "166_cm", "1.88_m", "5.6", "5.10", "171_cm", "5.1", "170_cm", "178_cm", "5.2", "185_cm", "5.11", "5.9", "5.11", "5.7", "6.0", "6.1", "176_cm", "5.7", "189_cm", "5.3", "5.7", "164_cm", "5.6", "5.8", NA, NA, "175_cm", "157_cm", "5.10", "172_cm", "170_cm", "5.7", "5.8", "5.6", "169_cm", "6.2", "6.4", "1.71_m", "5.10", "1.67_m", "5.2", "160_cm", "5.8", "6.2", "5.5", "180_cm", "175_cm", "5.0", "195_cm", "5.5", "6.0", "175_cm")
CodePudding user response:
You can just select the rows that contain ft or in the Data_Original$Height
data you showed, select the rows where there is no _
and change only these rows.
Data_Original$Height[grepl(pattern = "_", Data_Original$Height) == F] <- round(as.numeric(Data_Original$Height[grepl(pattern = "_", Data_Original$Height) == F])*0.3048, 2)
And if you want to use the data with the 'ft' still labelled
Data_Original$Height[grepl(pattern = "ft", Data_Original$Height)] <- round(as.numeric(gsub("[^0-9.]", "", Data_Original$Height[grepl(pattern = "ft", Data_Original$Height)]))*0.3048, 2)
CodePudding user response:
Here is the tidyverse
way:
Data_original3 %>%
separate(Height, c('feet', 'inches'), "_ft_", convert = TRUE, remove = FALSE) %>%
mutate(Height = if_else(grepl("ft", Height), paste0(round((12*as.numeric(feet) inches)*2.54/100, digits = 2), "_m"), paste0(Height))) %>%
select(Height)
Using separate
, I split feet and inches into two columns, then perform the conversion calculation back into the original "Height" column.
Note: This only works if your still splitting by "_ft_"
instead of "."