I would like to solve some problems with the column name, which corroborates with errors when executing a code. Here, I'll show you a simple example. Note that I have a column called TimeofCalculate
and the code below is Timeofcalculate
, which gives an error, because the code is calculate
instead of Calculate
. However, I would like any of them worked in the code. Also, I have a database which is Timeofcalculâte
column. This â
is common where I live. Therefore, I would like to resolve these mentioned issues.
library(dplyr)
Test <- structure(list(date1 = as.Date(c("2021-11-01","2021-11-01","2021-11-01","2021-11-01")),
date2 = as.Date(c("2021-10-22","2021-10-22","2021-10-28","2021-10-30")),
Week = c("Friday", "Friday", "Thursday", "thursday"),
Category = c("FDE", "FDE", "FDE", "FDE"),
TimeofCalculate = c(4, 6, 6, 3)), class = "data.frame",row.names = c(NA, -4L))
Test %>%
group_by(Week = tools::toTitleCase(Week)) %>%
summarise(Time=mean(Timeofcalculate), .groups = 'drop')
CodePudding user response:
Perhaps we can take advantage of tidyselect::matches
.
library(dplyr)
nms <- c('TimeofCalculate|Timeofcalculate|Timeofcalculâte')
#alternative one
Test %>%
group_by(Week = tools::toTitleCase(Week)) %>%
summarise(across(matches(nms), mean), .groups = 'drop')
#> # A tibble: 2 × 2
#> Week TimeofCalculate
#> <chr> <dbl>
#> 1 Friday 5
#> 2 Thursday 4.5
#using a purrr style lambda
Test %>%
group_by(Week = tools::toTitleCase(Week)) %>%
summarise(across(matches(nms), ~mean(., na.rm = TRUE)), .groups = 'drop')
#> # A tibble: 2 × 2
#> Week TimeofCalculate
#> <chr> <dbl>
#> 1 Friday 5
#> 2 Thursday 4.5
#this will also work
Test %>%
group_by(Week = tools::toTitleCase(Week)) %>%
summarise(across(any_of(c("Timeofcalculate", "TimeofCalculate", "Timeofcalculâte")), ~ mean(., na.rm = TRUE)), .groups = "drop")
Created on 2021-12-26 by the reprex package (v2.0.1)
CodePudding user response:
I think weekdays in different spellings are unacceptable in a data base, first fix this. We may use built-in tools::toTitleCase
to make first letters upper-case.
Test <- transform(Test, Week=tools::toTitleCase(Week))
Then, we may easily aggregate
by column numbers, so names needed.
aggregate(list(Time=Test[, 5]), list(Week=Test[, 3]), mean)
# Week Time
# 1 Friday 5.0
# 2 Thursday 4.5
If it's a problem to hard-code column indices by hand, we may use agrep
which identifies via string distance matching the index of the most similar column name.
c_tcalc <- agrep('timeofcalculate', names(Test))
c_week <- agrep('week', names(Test))
aggregate(list(Time=Test[, c_tcalc]), list(Week=Test[, c_week]), mean)
# Week Time
# 1 Friday 5.0
# 2 Thursday 4.5
Data:
Test <- structure(list(date1 = structure(c(18932, 18932, 18932, 18932
), class = "Date"), date2 = structure(c(18922, 18922, 18928,
18930), class = "Date"), Week = c("Friday", "Friday", "Thursday",
"Thursday"), Category = c("FDE", "FDE", "FDE", "FDE"), TimeofCalculate = c(4,
6, 6, 3)), class = "data.frame", row.names = c(NA, -4L))