Home > OS >  modular filter and mutate based on user input R
modular filter and mutate based on user input R

Time:07-24

I'd like to make a filter-mutate data manipulation based on user input/global variable.

The idea is to set a filter number and by that filter number to filter and run the mutations on a given dataframe.

I have the following filter list:

filter_list <- list(
                    ymsd  ,
                    ymsdc ,
                    ymsdt ,
                    ymsdct,
                    yms   ,
                    ym    ,
                    s     ,
                    ymst  ,
                    ymt   ,
                    st    )

Now I am trying to mutate as following:

assign("tosum",colnames(.GlobalEnv$sheet[9]))
sheet <- sheet %>%
        group_by_at(unlist(.GlobalEnv$filter_list[2])) %>%
        mutate(
          total = sum(.GlobalEnv$tosum)    
          )

(Note: sheet is a dataframe from a selected sheet from an excel file with multiple sheets)

As you can see, I am inserting the filter by which to filter from .GlobalEnv$filter_list[2]. From testing, it seems that this approach works for grouping.

Problem is I can't get the column to sum by the grouping. (Any other way of calling the column name ended with a sum of the entire column and not a sum by group). I know cumsums and this will not help me in this case since I require different calculations to be made on the columns on different filters.

(I have 3 dataframes and a lot of variables to run through)

If it will help, these are the filters:

      # seletct filter <- give option to select filter
ymsd    <- c("year", "month", "site", "depth")
      # 1  date, site, depth -> this is TRANSECT
ymsdc   <- c("year", "month", "site", "depth", "ncore")
      # 2  date, site, depth, core -> core
ymsdt   <- c("year", "month", "site", "depth", "sample")
      # 3  date, site, depth, type -> TRANSECT/SAMPLE_TYPE (Biomass)
ymsdct  <- c("year", "month", "site", "depth", "ncore", "sample")
      # 4  date, site, depth, core, type -> CORE/SAMPLE_TYPE (Biomass)
yms     <- c("year", "month", "site")
      # 5  date, site -> Merge transect keep site - see change between sites along time only
ym      <- c("year", "month")
      # 6  date -> merge sites keep date - see change between seasons only
s       <- c("site")
      # 7  site -> see change between site only
ymst    <- c("year", "month", "site", "sample")
      # 8  date, site, type -> see change between sites along season with sample type (biomass)
ymt     <- c("year", "month",  "sample")
      # 9  date, type -> see change along seasons with type (biomass)
st      <- c("site", "sample")
      # 10 site, type -> see change between sites with sample type (biomass)

Also, if it's any help is a little head of one of the sheets:

date year month site depth ncore type DW BW
<dttm> <dbl> <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl>
2021-10-19 00:00:00 2021 Oct SB 12 1 BG 0.441 0.209
2021-10-19 00:00:00 2021 Oct SB 12 1 AG 0.306 0.142
2021-10-19 00:00:00 2021 Oct SB 12 1 DT 2.19 1.65

CodePudding user response:

It may be a bad practise to assign objects in the global environment especially the column names. We may need to either convert the string element to symbol and evaluate (!!) i.e. sum(!! rlang::sym(.GlobalEnv$tosum)) or make use of .data pronoun to extract ([[) the column value

library(dplyr)
assign("tosum",colnames(.GlobalEnv$sheet[9]))
sheet <- sheet %>%
        group_by(across(all_of(.GlobalEnv$filter_list[[2]]))) %>%
        mutate(
          total = sum(.data[[.GlobalEnv$tosum]], na.rm = TRUE)    
          )
sheet
# A tibble: 3 × 10
  date                 year month site  depth ncore type     DW    BW total
  <dttm>              <int> <chr> <chr> <int> <int> <chr> <dbl> <dbl> <dbl>
1 2021-10-19 00:00:00  2021 Oct   SB       12     1 BG    0.441 0.209  2.00
2 2021-10-19 00:00:00  2021 Oct   SB       12     1 AG    0.306 0.142  2.00
3 2021-10-19 00:00:00  2021 Oct   SB       12     1 DT    2.19  1.65   2.00

data

sheet <- structure(list(date = structure(c(1634616000, 1634616000, 1634616000
), class = c("POSIXct", "POSIXt"), tzone = ""), year = c(2021L, 
2021L, 2021L), month = c("Oct", "Oct", "Oct"), site = c("SB", 
"SB", "SB"), depth = c(12L, 12L, 12L), ncore = c(1L, 1L, 1L), 
    type = c("BG", "AG", "DT"), DW = c(0.441, 0.306, 2.19), BW = c(0.209, 
    0.142, 1.65)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-3L))

-reproducible example with iris data

data(iris)
tosum <- "Sepal.Length"
iris %>%
  group_by(Species) %>% 
  summarise(total = sum(.data[[.GlobalEnv$tosum]], na.rm = TRUE))
# A tibble: 3 × 2
  Species    total
  <fct>      <dbl>
1 setosa      250.
2 versicolor  297.
3 virginica   329.
  • Related