I am quite new to R, so excuse my lack of ability. I have tried and failed a fair bit, and would appreciate any input.
I am asked to get rid of the inconsistent use of "." and "," as decimal marks by multiplying every number in certain columns by some multiple of 10. I have tried to simply multiply using the binary operator *, but that obviously doesn't work because some columns are factors, which is required in this case.
I have also tried the code below, but I get the error: subscript "Var" can't be "NA"
data %>% mutate_if(is.numeric, ~ . * 1000)
Below is the code I have for my dataset
datat <- c("Starting_year", "Rank", "Team", "Home_total_Games",
           "Home_Total_Attendance", "Home_Avg_Attendance", "Home_capacity",
           "Away_Total_Attendance", "Away_Avg_Attendance", "Away_Capacity")
names(data) <- datat
The factors are assigned like this:
data$Rank <- as.factor(data$Rank)
data$Starting_year <- as.factor(data$Starting_year)
Thanks in advance
I can't embed it, but there is a picture of the data below. I am asked to use a dplyr function to multiply the columns by 1000 to remove all the "." and "," characters.
CodePudding user response:
What is the format of the numbers?
If the format is 1.000.000,5, where . is the thousands separator and , is the decimal separator, just use gsub:
foo = "1.000.000,5"
bar = gsub("\\.", "", foo) # "1000000,5"
baz = gsub(",", "\\.", bar) # "1000000.5"
as.numeric(baz)
In this case, the factor is not a problem because gsub coerces its input to character, so the result is a plain character vector rather than a factor.
If you need to multiply the numbers after that, it is not a problem. Turn this into a function (such as convert_decimal) and apply it to the columns you want:
data$column = convert_decimal(data$column)
For multiple selected columns (let's call the vector of names selection):
data[selection] = lapply(data[selection], convert_decimal)
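A minimal sketch of what such a convert_decimal function could look like, assuming every value uses . for thousands and , for decimals. The sample data frame and its values are made up for illustration; only the column names come from the question.

```r
# Hypothetical helper; the name convert_decimal is just the one suggested above.
convert_decimal <- function(x) {
  x <- as.character(x)                  # works for factors and character vectors
  x <- gsub(".", "", x, fixed = TRUE)   # drop the thousands separators
  x <- gsub(",", ".", x, fixed = TRUE)  # turn the decimal comma into a point
  as.numeric(x)
}

# Made-up sample data (assumption: numbers were read in as text or factors)
data <- data.frame(
  Team = c("A", "B"),
  Home_Total_Attendance = factor(c("1.000.000,5", "250.300")),
  Home_Avg_Attendance = c("12.345,25", "9.876"),
  stringsAsFactors = FALSE
)

selection <- c("Home_Total_Attendance", "Home_Avg_Attendance")
data[selection] <- lapply(data[selection], convert_decimal)
str(data)
```

Note that this only gives sensible results if the formatting really is consistent; a value such as "1,000.5" would be converted incorrectly.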
CodePudding user response:
Using @Colombo's example, another option is to use readr::parse_number and define a locale.
foo <- "1.000.000,5"
x <- readr::parse_number(
  foo, locale = readr::locale(decimal_mark = ",", grouping_mark = "."))
x
#[1] 1e+06
You could also define a global locale for your particular analysis that ensures that all numbers are parsed consistently. Obviously this assumes that number formatting is consistent.
BTW, you can verify that x indeed includes the fractional .5 if you do sprintf("%.1f", x).
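Since the question asks for a dplyr function, the same locale can be defined once and reused across several columns. This is only a sketch: my_locale, selection and the sample values are made-up names and data, and it assumes the columns hold text or factors with consistent formatting.

```r
library(dplyr)

# One locale object shared by every parse (hypothetical name: my_locale)
my_locale <- readr::locale(decimal_mark = ",", grouping_mark = ".")

# Made-up sample data using column names from the question
data <- data.frame(
  Team = c("A", "B"),
  Home_Total_Attendance = factor(c("1.000.000,5", "250.300")),
  Home_Avg_Attendance = c("12.345,25", "9.876"),
  stringsAsFactors = FALSE
)

selection <- c("Home_Total_Attendance", "Home_Avg_Attendance")

data <- data %>%
  mutate(across(
    all_of(selection),
    # as.character() because parse_number expects a character vector
    ~ readr::parse_number(as.character(.x), locale = my_locale)
  ))

# Show one decimal place so the fractional part is visible
sprintf("%.1f", data$Home_Total_Attendance)
```

The as.character() call is there so that factor columns can be parsed as well; the Team column is left untouched because it is not in selection.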