What's wrong with my function? Calculating a percentage with matching IDs

This is the function I am trying to build. It should express a column of one data frame as a percentage of a column in another data frame, matching rows by ID.

# With dummy data 

df1 = data.frame(State = c('Arizona AZ','Georgia GG', 'Newyork NY','Indiana IN','Florida FL'), Score=c(62,47,55,74,31), id=c(1,2,3,4,5))
df1

> df1
       State Score id
1 Arizona AZ    62  1
2 Georgia GG    47  2
3 Newyork NY    55  3
4 Indiana IN    74  4
5 Florida FL    31  5

df2 = data.frame(State = c('Arizona AZ','Georgia GG', 'Newyork NY','Indiana IN'), Score2=c(10,7,5,4), id=c(1,2,3,4))
df2

> df2
       State Score2 id
1 Arizona AZ     10  1
2 Georgia GG      7  2
3 Newyork NY      5  3
4 Indiana IN      4  4

CalcPerc <- function(x, ins) {
  
  # 1) Subset + cbind
  y  <- subset(ins, id %in% x$id)
  y  <- cbind(y, x$Score)
  
  # Percentage
  x1 <- 100*(y$Score2/y$Score)
  
  print(x1)
}

CalcPerc(x= df2, ins = df1)

[1] 4
numeric(0)

Why am I getting numeric(0)?

How can I fix my function?

It works just fine when I run the same steps outside a function.

Thanks for your help!

CodePudding user response：

Try adding a browser() statement right before print(x1) and run CalcPerc(x= df2, ins = df1). You will see that y is

       State Score id x$Score
1 Arizona AZ    62  1      10
2 Georgia GG    47  2       7
3 Newyork NY    55  3       5
4 Indiana IN    74  4       4

This is why referring to y$Score2 gives NULL, and the division then returns numeric(0): there is no such column in y. The column added by cbind is named x$Score, not Score2. I suspect that what you actually want is to merge the two data frames. With base R:

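CalcPerc <- function(x, ins) {

    # 1) Keep only the rows of ins whose id also appears in x
    y  <- subset(ins, id %in% x$id)

    # 2) Join on State and id so Score and Score2 end up in the same row
    z <- merge(x, y, by = c('State', 'id'))

    # 3) Percentage
    x1 <- 100*(z$Score2/z$Score)

    print(x1)
}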
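
As a quick check of the diagnosis, here is a small sketch that reproduces the problem outside the function, without browser(). It reuses the dummy data from the question; the only additions are the inspection lines and the variable res, which is just a name chosen for this illustration.

```r
df1 <- data.frame(State = c('Arizona AZ','Georgia GG', 'Newyork NY','Indiana IN','Florida FL'),
                  Score = c(62,47,55,74,31), id = c(1,2,3,4,5))
df2 <- data.frame(State = c('Arizona AZ','Georgia GG', 'Newyork NY','Indiana IN'),
                  Score2 = c(10,7,5,4), id = c(1,2,3,4))

y <- subset(df1, id %in% df2$id)
# df2 has no column called Score; `$` partially matches the name to Score2
y <- cbind(y, df2$Score)

"Score2" %in% names(y)   # FALSE: the new column is not called Score2
is.null(y$Score2)        # TRUE: so y$Score2 is NULL

# Arithmetic on NULL yields a zero-length numeric vector
res <- 100 * (y$Score2 / y$Score)
length(res)              # 0, which prints as numeric(0)
```

Because of the partial matching, x$Score inside the original function silently picks up Score2 instead of raising an error, which makes the mistake easy to miss.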

CodePudding user response：

Try this:

CalcPerc <- function(x, ins) {
  # 1) Subset ins to the ids present in x
  y <- subset(ins, id %in% x$id)
  # 2) Copy Score2 over by position
  y$Score2 <- x$Score2
  x1 <- 100*(y$Score2/y$Score)
  print(x1)
}
> CalcPerc(x= df2, ins = df1)
[1] 16.129032 14.893617  9.090909  5.405405

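The result stays in the original row order (merge, by default, sorts its output by the key columns). Note that the positional assignment assumes the rows of x line up with the rows of the subset of ins.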
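
If the two data frames might not be sorted the same way, one possible variation is to look up each id explicitly with match() instead of copying Score2 by position. This is only a sketch: CalcPercById is a hypothetical name, and returning a data frame instead of printing is an assumption about what is convenient.

```r
df1 <- data.frame(State = c('Arizona AZ','Georgia GG', 'Newyork NY','Indiana IN','Florida FL'),
                  Score = c(62,47,55,74,31), id = c(1,2,3,4,5))
df2 <- data.frame(State = c('Arizona AZ','Georgia GG', 'Newyork NY','Indiana IN'),
                  Score2 = c(10,7,5,4), id = c(1,2,3,4))

# Hypothetical helper: find the row of ins for each id in x, then divide
CalcPercById <- function(x, ins) {
  idx <- match(x$id, ins$id)   # position in ins of each id in x, NA if absent
  data.frame(id = x$id, Perc = 100 * x$Score2 / ins$Score[idx])
}

# Shuffle df2 to show that the row order of x no longer matters
res <- CalcPercById(df2[c(3, 1, 4, 2), ], df1)
res
```

Each percentage is paired with its id in the result, so a reordered input cannot silently pair a Score2 with the wrong Score.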

CodePudding user response：

@robertdj and @Necklondon fixed your error. If you want a dplyr option, you can join the two data frames on id and State and use mutate to add a column with the percentage. That way you can see immediately which state each percentage belongs to:

library(dplyr)
df1 %>%
  left_join(df2, by = c("id", "State")) %>%
  mutate(Perc = 100*(Score2/Score))

Output:

       State Score id Score2      Perc
1 Arizona AZ    62  1     10 16.129032
2 Georgia GG    47  2      7 14.893617
3 Newyork NY    55  3      5  9.090909
4 Indiana IN    74  4      4  5.405405
5 Florida FL    31  5     NA        NA
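
For anyone without dplyr installed, roughly the same result can be sketched in base R with merge() and all.x = TRUE, which keeps every row of df1 just as left_join does. The variable name res is only for illustration, and the column order differs from the dplyr output because merge() puts the key columns first.

```r
df1 <- data.frame(State = c('Arizona AZ','Georgia GG', 'Newyork NY','Indiana IN','Florida FL'),
                  Score = c(62,47,55,74,31), id = c(1,2,3,4,5))
df2 <- data.frame(State = c('Arizona AZ','Georgia GG', 'Newyork NY','Indiana IN'),
                  Score2 = c(10,7,5,4), id = c(1,2,3,4))

# all.x = TRUE keeps rows of df1 that have no match in df2 (Score2 becomes NA)
res <- merge(df1, df2, by = c("id", "State"), all.x = TRUE)
res$Perc <- 100 * (res$Score2 / res$Score)
res
```

Dropping all.x = TRUE gives an inner join instead, so Florida, which has no row in df2, would be left out.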