This is the function I am trying to build. It should express a column of one data frame as a percentage of a column in another data frame, matching rows by id.
# With dummy data
df1 = data.frame(State = c('Arizona AZ','Georgia GG', 'Newyork NY','Indiana IN','Florida FL'), Score=c(62,47,55,74,31), id=c(1,2,3,4,5))
df1
> df1
       State Score id
1 Arizona AZ    62  1
2 Georgia GG    47  2
3 Newyork NY    55  3
4 Indiana IN    74  4
5 Florida FL    31  5
df2 = data.frame(State = c('Arizona AZ','Georgia GG', 'Newyork NY','Indiana IN'), Score2=c(10,7,5,4), id=c(1,2,3,4))
df2
> df2
       State Score2 id
1 Arizona AZ     10  1
2 Georgia GG      7  2
3 Newyork NY      5  3
4 Indiana IN      4  4
CalcPerc <- function(x, ins) {
  # 1) Subset cbind
  y <- subset(ins, id %in% x$id)
  y <- cbind(y, x$Score)
  # Percentage
  x1 <- 100*(y$Score2/y$Score)
  print(x1)
}
CalcPerc(x= df2, ins = df1)
[1] 4
numeric(0)
Why am I getting numeric(0)?
How can I fix my function?
It works just fine if I do it outside a function.
Thanks for your help!
CodePudding user response:
Try adding a browser() statement right before print(x1) and run CalcPerc(x= df2, ins = df1).
You will see that y is
       State Score id x$Score
1 Arizona AZ    62  1      10
2 Georgia GG    47  2       7
3 Newyork NY    55  3       5
4 Indiana IN    74  4       4
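This is why the calculation returns an empty vector: y has no column named Score2 (the column added by cbind is called x$Score), so y$Score2 is NULL, and dividing NULL by y$Score gives numeric(0).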
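A minimal sketch to reproduce this outside the function, reusing the data from the question. It repeats the two steps from CalcPerc and then checks which column names y ends up with and what the division yields. As far as I know, x$Score only returns something because `$` does partial name matching and picks up Score2; the helper names col_names, missing_col and res are just for illustration.

```r
df1 <- data.frame(State = c('Arizona AZ', 'Georgia GG', 'Newyork NY', 'Indiana IN', 'Florida FL'),
                  Score = c(62, 47, 55, 74, 31), id = c(1, 2, 3, 4, 5))
df2 <- data.frame(State = c('Arizona AZ', 'Georgia GG', 'Newyork NY', 'Indiana IN'),
                  Score2 = c(10, 7, 5, 4), id = c(1, 2, 3, 4))

# Same two steps as inside CalcPerc(x = df2, ins = df1)
y <- df1[df1$id %in% df2$id, ]
y <- cbind(y, df2$Score)             # the new column is named after the expression, not "Score2"

col_names <- names(y)                # "Score2" is not among them
missing_col <- is.null(y$Score2)     # `$` returns NULL for a column that does not exist
res <- 100 * (y$Score2 / y$Score)    # arithmetic on NULL gives a zero-length numeric vector

print(col_names)
print(missing_col)
print(res)
```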
I suspect that what you actually want is to merge the two data frames.
With base R:
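CalcPerc <- function(x, ins) {
  # Keep only the rows of ins whose id also appears in x
  y <- subset(ins, id %in% x$id)
  # Join on State and id so Score and Score2 end up in the same row
  z <- merge(x, y, by = c('State', 'id'))
  x1 <- 100*(z$Score2/z$Score)
  print(x1)
}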
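If it helps, here is a sketch of the same idea that returns the merged data frame instead of printing the bare vector, so each percentage stays next to its State. The function name CalcPercDf and the column name Perc are made up for illustration, and the subset step is dropped on the assumption that merge() alone is enough, since by default it keeps only the rows found in both inputs.

```r
df1 <- data.frame(State = c('Arizona AZ', 'Georgia GG', 'Newyork NY', 'Indiana IN', 'Florida FL'),
                  Score = c(62, 47, 55, 74, 31), id = c(1, 2, 3, 4, 5))
df2 <- data.frame(State = c('Arizona AZ', 'Georgia GG', 'Newyork NY', 'Indiana IN'),
                  Score2 = c(10, 7, 5, 4), id = c(1, 2, 3, 4))

# CalcPercDf is a hypothetical name; Perc is an illustrative column name
CalcPercDf <- function(x, ins) {
  # merge() keeps only the State/id pairs present in both data frames
  z <- merge(x, ins, by = c('State', 'id'))
  z$Perc <- 100 * (z$Score2 / z$Score)
  z
}

out <- CalcPercDf(x = df2, ins = df1)
print(out)
```

Note that merge() sorts the result on the by columns unless you pass sort = FALSE, so the rows come back ordered by State rather than in the order of df1.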
CodePudding user response:
Try this:
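CalcPerc <- function(x, ins) {
  # Keep only the rows of ins whose id also appears in x
  y <- subset(ins, id %in% x$id)
  # Add Score2 under its real name (assumes x and y have their rows in the same order)
  y$Score2 <- x$Score2
  x1 <- 100*(y$Score2/y$Score)
  print(x1)
}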
> CalcPerc(x= df2, ins = df1)
[1] 16.129032 14.893617 9.090909 5.405405
The results come out in the right order here because df2 lists the same ids in the same order as the matching rows of df1.
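The plain assignment of x$Score2 lines the rows up by position. If x might list its ids in a different order than ins, one option is to align by id with match(). This is only a sketch: CalcPercById is a hypothetical name, and df2_shuffled is invented test data to show that the row order of x no longer matters.

```r
df1 <- data.frame(State = c('Arizona AZ', 'Georgia GG', 'Newyork NY', 'Indiana IN', 'Florida FL'),
                  Score = c(62, 47, 55, 74, 31), id = c(1, 2, 3, 4, 5))
df2 <- data.frame(State = c('Arizona AZ', 'Georgia GG', 'Newyork NY', 'Indiana IN'),
                  Score2 = c(10, 7, 5, 4), id = c(1, 2, 3, 4))

# CalcPercById is a hypothetical variant that aligns rows by id instead of by position
CalcPercById <- function(x, ins) {
  y <- ins[ins$id %in% x$id, ]
  # For each id in y, look up the row of x that has the same id
  y$Score2 <- x$Score2[match(y$id, x$id)]
  100 * (y$Score2 / y$Score)
}

df2_shuffled <- df2[c(3, 1, 4, 2), ]   # same rows as df2, different order (made-up test data)
res_ordered  <- CalcPercById(x = df2, ins = df1)
res_shuffled <- CalcPercById(x = df2_shuffled, ins = df1)
print(res_ordered)
print(res_shuffled)
```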
CodePudding user response:
@robertdj and @Necklondon fixed your error. If you want a dplyr option, you can join your data on id and State and mutate a column that calculates the percentage, so you can immediately see which state each percentage belongs to in a data frame:
library(dplyr)
df1 %>%
  left_join(df2, by = c("id", "State")) %>%
  mutate(Perc = 100*(Score2/Score))
Output:
       State Score id Score2      Perc
1 Arizona AZ    62  1     10 16.129032
2 Georgia GG    47  2      7 14.893617
3 Newyork NY    55  3      5  9.090909
4 Indiana IN    74  4      4  5.405405
5 Florida FL    31  5     NA        NA