R unused argument error in homemade function using stats package

Time:08-24

Edited to clarify dataset:

I am trying to write a function so I can calculate a weighted mean and median for multiple numeric variables, and show them together.

I have two versions of the dataset that I want to be able to run the function on for comparison.

Ind1 is a series of monetary values ranging from 0 to 2000. The weights are decimal values between 0 and 4, and the weight has to be specified in each call because different variables use different weights.

The vectors Ind1 and firstwt below are part of the larger data frame dataset_a - I have shown them to give an example of the actual data. Each run of the function will use a different indicator variable, and the relevant weight for the variable.

library(tidyverse)
library(spatstat)

#Sample data
Ind1 <- c(0, 0, 290.50, 100, 0, 150.00, 2000.00, 1350.50, 320.00, 30.00)
firstwt <- c(0.974, 2.11, 1.81, 0.817, 3.85, 2.33, 1.41, 1.37, 1.83, 1.57)

summary_stats <- function(indicator, weight){
  summarise(mean = stats::weighted.mean(indicator, weight, na.rm = TRUE),
           median = weighted.median(indicator, weight, na.rm = TRUE))
}
dataset_a %>%
  summary_stats(Ind1, weight = firstwt)

Error in summary_stats(., Ind1, weight = firstwt) : unused argument (Ind1)

I have also tried making the dataset the first argument of the function and piping it to summarise, i.e. summary_stats <- function(data, indicator, weight){ data %>% summarise... , but then I get an unused argument error for the dataset instead.

Answer:

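The first argument of summarise() is the data frame, and in summary_stats() that argument is missing. (The unused argument error itself appears because %>% passes dataset_a as an extra first argument, so summary_stats() receives three arguments while declaring only two.) Supplying a data frame to summarise() makes it work. Using the same setup as in the question, summary_stats2() builds a data frame from the two vectors and pipes it to summarise(), which inserts it as summarise()'s first argument.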
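A minimal base-R sketch of how first-argument insertion by a pipe produces an "unused argument" error. two_args() and df are hypothetical stand-ins for summary_stats() and dataset_a, and the sketch assumes R 4.1 or later for the native pipe |>, which in this situation behaves like %>% by placing the left-hand side in the first argument slot.

```r
# Hypothetical two-parameter function standing in for summary_stats()
two_args <- function(indicator, weight) {
  stats::weighted.mean(indicator, weight)
}

# Hypothetical stand-in for dataset_a
df <- data.frame(x = c(1, 2, 3), w = c(1, 1, 2))

# Called directly with two arguments it works: (1*1 + 2*1 + 3*2) / 4 = 2.25
ok <- two_args(df$x, df$w)

# Through the pipe, df becomes the first argument, so the call is
# two_args(df, df$x, weight = df$w): `weight` is matched by name,
# `indicator` takes df, and df$x has no parameter left to match.
msg <- tryCatch(
  df |> two_args(df$x, weight = df$w),
  error = function(e) conditionMessage(e)
)

print(ok)
print(msg)
```

The failing call never reaches weighted.mean(): R rejects it while matching arguments, which is why the message names the leftover argument rather than anything inside the function body.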

summary_stats2 <- function(indicator, weight) {
  data.frame(indicator, weight) %>%
    summarise(mean = stats::weighted.mean(indicator, weight, na.rm = TRUE),
              median = weighted.median(indicator, weight, na.rm = TRUE))
}

summary_stats2(Ind1, weight = firstwt)
##       mean median
## 1 346.4053     65

An even simpler solution does not use dplyr at all:

summary_stats3 <- function(indicator, weight) {
    data.frame(mean = stats::weighted.mean(indicator, weight, na.rm = TRUE),
               median = weighted.median(indicator, weight, na.rm = TRUE))
}

summary_stats3(Ind1, weight = firstwt)
##       mean median
## 1 346.4053     65
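As a quick check on the 346.4053 reported above, the weighted mean is simply sum(x * w) / sum(w). This sketch recomputes it by hand on the question's sample data and compares it with stats::weighted.mean(); the NA example at the end uses a made-up extra value to show that na.rm = TRUE drops NA values of x together with their weights.

```r
# The question's sample data
Ind1 <- c(0, 0, 290.50, 100, 0, 150.00, 2000.00, 1350.50, 320.00, 30.00)
firstwt <- c(0.974, 2.11, 1.81, 0.817, 3.85, 2.33, 1.41, 1.37, 1.83, 1.57)

# Weighted mean written out: sum(x * w) / sum(w)
by_hand <- sum(Ind1 * firstwt) / sum(firstwt)
builtin <- stats::weighted.mean(Ind1, firstwt)

round(by_hand, 4)
## [1] 346.4053
all.equal(by_hand, builtin)
## [1] TRUE

# Made-up NA appended to x (with an arbitrary weight of 5): with na.rm = TRUE
# the NA and its weight are both dropped before the sum and the division.
x_na <- c(Ind1, NA)
w_na <- c(firstwt, 5)
keep <- !is.na(x_na)
na_by_hand <- sum(x_na[keep] * w_na[keep]) / sum(w_na[keep])
na_builtin <- stats::weighted.mean(x_na, w_na, na.rm = TRUE)
```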

Update

In the comments, the poster stated that the setup they actually want has the variables in a data frame, so let us change the problem and the answer accordingly.

library(dplyr)
library(spatstat)

DF <- data.frame(Ind1 = c(0, 0, 290.50, 100, 0, 150.00, 2000.00, 
                        1350.50, 320.00, 30.00),
                 firstwt = c(0.974, 2.11, 1.81, 0.817, 3.85, 2.33, 
                           1.41, 1.37, 1.83, 1.57))

summary_stats4 <- function(data, indicator, weight){
  data %>% summarise(
    mean = stats::weighted.mean({{indicator}}, {{weight}}, na.rm = TRUE),
    median = weighted.median({{indicator}}, {{weight}}, na.rm = TRUE))
}

DF %>% summary_stats4(Ind1, firstwt)
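The question also mentions running the function for several indicator/weight pairs and on two versions of the dataset. A base-R sketch of that bookkeeping follows. It is not the answer's method: it passes column names as strings and uses [[ ]] instead of the {{ }} embracing in summary_stats4(), and it computes only the weighted mean so it runs without spatstat (a median column could be added the same way with weighted.median()). Ind2, secondwt, weighted_means() and version_b are hypothetical, made-up names and values for illustration only.

```r
# Ind1/firstwt are the question's sample data; Ind2/secondwt are made up.
version_a <- data.frame(
  Ind1 = c(0, 0, 290.50, 100, 0, 150.00, 2000.00, 1350.50, 320.00, 30.00),
  firstwt = c(0.974, 2.11, 1.81, 0.817, 3.85, 2.33, 1.41, 1.37, 1.83, 1.57),
  Ind2 = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
  secondwt = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
)
# Hypothetical second version: same columns, one Ind1 value changed.
version_b <- version_a
version_b$Ind1[7] <- 1000

# Each indicator column name maps to the name of its own weight column.
pairs <- c(Ind1 = "firstwt", Ind2 = "secondwt")

# One row per indicator: its weighted mean using the paired weight column.
weighted_means <- function(data, pairs) {
  means <- vapply(names(pairs), function(ind) {
    stats::weighted.mean(data[[ind]], data[[pairs[[ind]]]], na.rm = TRUE)
  }, numeric(1))
  data.frame(indicator = names(pairs), mean = unname(means))
}

# Stack the results for both versions so they can be compared directly.
versions <- list(a = version_a, b = version_b)
comparison <- do.call(rbind, lapply(names(versions), function(v) {
  cbind(version = v, weighted_means(versions[[v]], pairs))
}))
print(comparison)
```

Keeping the indicator-to-weight mapping in one named vector means each indicator is always summarised with its own weight, which is the pairing the question describes.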