Problems generating output table due to different column names-CodePudding

I would like to solve some problems with the column name, which corroborates with errors when executing a code. Here, I'll show you a simple example. Note that I have a column called TimeofCalculate and the code below is Timeofcalculate, which gives an error, because the code is calculate instead of Calculate. However, I would like any of them worked in the code. Also, I have a database which is Timeofcalculâte column. This â is common where I live. Therefore, I would like to resolve these mentioned issues.

library(dplyr)


Test <- structure(list(date1 = as.Date(c("2021-11-01","2021-11-01","2021-11-01","2021-11-01")),
                       date2 = as.Date(c("2021-10-22","2021-10-22","2021-10-28","2021-10-30")),
                       Week = c("Friday", "Friday", "Thursday", "thursday"),
                       Category = c("FDE", "FDE", "FDE", "FDE"),
                       TimeofCalculate = c(4, 6, 6, 3)), class = "data.frame",row.names = c(NA, -4L))


Test %>%
  group_by(Week = tools::toTitleCase(Week)) %>% 
  summarise(Time=mean(Timeofcalculate), .groups = 'drop')

CodePudding user response：

Perhaps we can take advantage of tidyselect::matches.

library(dplyr)

nms <- c('TimeofCalculate|Timeofcalculate|Timeofcalculâte')

#alternative one
Test %>%
  group_by(Week = tools::toTitleCase(Week)) %>% 
  summarise(across(matches(nms), mean), .groups = 'drop')
#> # A tibble: 2 × 2
#>   Week     TimeofCalculate
#>   <chr>              <dbl>
#> 1 Friday               5  
#> 2 Thursday             4.5

#using a purrr style lambda
Test %>%
  group_by(Week = tools::toTitleCase(Week)) %>% 
  summarise(across(matches(nms), ~mean(., na.rm = TRUE)), .groups = 'drop')
#> # A tibble: 2 × 2
#>   Week     TimeofCalculate
#>   <chr>              <dbl>
#> 1 Friday               5  
#> 2 Thursday             4.5

#this will also work

Test %>%
  group_by(Week = tools::toTitleCase(Week)) %>%
  summarise(across(any_of(c("Timeofcalculate", "TimeofCalculate", "Timeofcalculâte")), ~ mean(., na.rm = TRUE)), .groups = "drop")

^{Created on 2021-12-26 by the reprex package (v2.0.1)}

CodePudding user response：

I think weekdays in different spellings are unacceptable in a data base, first fix this. We may use built-in tools::toTitleCase to make first letters upper-case.

Test <- transform(Test, Week=tools::toTitleCase(Week))

Then, we may easily aggregate by column numbers, so names needed.

aggregate(list(Time=Test[, 5]), list(Week=Test[, 3]), mean)
#       Week Time
# 1   Friday  5.0
# 2 Thursday  4.5

If it's a problem to hard-code column indices by hand, we may use agrep which identifies via string distance matching the index of the most similar column name.

c_tcalc <- agrep('timeofcalculate', names(Test))
c_week <- agrep('week', names(Test))

aggregate(list(Time=Test[, c_tcalc]), list(Week=Test[, c_week]), mean)
#       Week Time
# 1   Friday  5.0
# 2 Thursday  4.5

Data:

Test <- structure(list(date1 = structure(c(18932, 18932, 18932, 18932
), class = "Date"), date2 = structure(c(18922, 18922, 18928, 
18930), class = "Date"), Week = c("Friday", "Friday", "Thursday", 
"Thursday"), Category = c("FDE", "FDE", "FDE", "FDE"), TimeofCalculate = c(4, 
6, 6, 3)), class = "data.frame", row.names = c(NA, -4L))