I am trying to write a function that extracts values from a data frame based on given parameters:
> f <- function(df, factor, level1, level2, response) {
    df1 <- df[df$factor == level1, ]$response
    df2 <- df[df$factor == level2, ]$response
    print(df1)
    print(df2)
  }
> f(ToothGrowth, supp, "VC", "OJ", len)
NULL
NULL
In the above example, the data frame is ToothGrowth, and the function f is trying to print two separate vectors: df1 should contain the len values from the rows whose supp column equals "VC", and df2 those from the rows where supp equals "OJ".
The equivalent Python code that uses pandas is as follows (very straightforward and works well):
import pandas as pd

def f(df: pd.DataFrame, factor: str, level1: str, level2: str, response: str):
    # Boolean-mask the rows for each level, then select the response column
    p1 = df[df[factor] == level1][response]
    p2 = df[df[factor] == level2][response]
    print(p1, p2)
Why are these outputs just NULL in R? What would be the best practice to achieve this goal?
I am new to R but aware of the tidyverse. However, it seems very strange to me, and I don't really know whether it is worth using such a heavy library for this purpose.
CodePudding user response:
I am quite sure that you need something like this:
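library(dplyr)  # provides %>%, filter() and pull()

f <- function(df, factor, level, response) {
  # {{ }} passes the unquoted column names through to dplyr
  df <- df %>%
    dplyr::filter({{factor}} == level) %>%
    dplyr::pull({{response}})
  print(df)
}
f(ToothGrowth, supp, "VC", len)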
#gives:
[1] 4.2 11.5 7.3 5.8 6.4 10.0 11.2 11.2 5.2 7.0 16.5 16.5 15.2 17.3 22.5
[16] 17.3 13.6 14.5 18.8 15.5 23.6 18.5 33.9 25.5 26.4 32.5 26.7 21.5 23.3 29.5
Explanation
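What you want your function to produce, with a call such as
f(ToothGrowth, "supp", "VC", "OJ", "len")
is the following (the function as you wrote it prints NULL even with quoted arguments, because df$factor looks for a column literally named "factor"):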
[1] 4.2 11.5 7.3 5.8 6.4 10.0 11.2 11.2 5.2 7.0 16.5 16.5 15.2 17.3 22.5
[16] 17.3 13.6 14.5 18.8 15.5 23.6 18.5 33.9 25.5 26.4 32.5 26.7 21.5 23.3 29.5
[1] 15.2 21.5 17.6 9.7 14.5 10.0 8.2 9.4 16.5 9.7 19.7 23.3 23.6 26.4 20.0
[16] 25.2 25.8 21.2 14.5 27.3 25.5 26.4 22.4 24.5 24.8 30.9 26.4 27.3 29.4 23.0
If this is your aim (which I doubt), then you could simply do:
ToothGrowth$len
gives:
[1] 4.2 11.5 7.3 5.8 6.4 10.0 11.2 11.2 5.2 7.0 16.5 16.5 15.2 17.3 22.5
[16] 17.3 13.6 14.5 18.8 15.5 23.6 18.5 33.9 25.5 26.4 32.5 26.7 21.5 23.3 29.5
[31] 15.2 21.5 17.6 9.7 14.5 10.0 8.2 9.4 16.5 9.7 19.7 23.3 23.6 26.4 20.0
[46] 25.2 25.8 21.2 14.5 27.3 25.5 26.4 22.4 24.5 24.8 30.9 26.4 27.3 29.4 23.0
What makes more sense to me is that you may need something like this:
f <- function(df, factor, level) {
  df <- df %>%
    dplyr::filter({{factor}} == level)
  print(df)
}
f(ToothGrowth, supp, "VC")
What this function does is:
Take any data frame you want.
Identify a factor column, in our case supp (thanks to tidy evaluation, no quotes are needed).
Filter the data frame by that column, keeping the rows that match the given factor level. The level itself, here "VC", does need quotes. I understand you may want to select two levels, but this is easily done by extending the filter line.
We get:
    len supp dose
1   4.2   VC  0.5
2  11.5   VC  0.5
3   7.3   VC  0.5
4   5.8   VC  0.5
5   6.4   VC  0.5
6  10.0   VC  0.5
7  11.2   VC  0.5
8  11.2   VC  0.5
9   5.2   VC  0.5
10  7.0   VC  0.5
11 16.5   VC  1.0
12 16.5   VC  1.0
13 15.2   VC  1.0
14 17.3   VC  1.0
15 22.5   VC  1.0
16 17.3   VC  1.0
17 13.6   VC  1.0
18 14.5   VC  1.0
19 18.8   VC  1.0
20 15.5   VC  1.0
21 23.6   VC  2.0
22 18.5   VC  2.0
23 33.9   VC  2.0
24 25.5   VC  2.0
25 26.4   VC  2.0
26 32.5   VC  2.0
27 26.7   VC  2.0
28 21.5   VC  2.0
29 23.3   VC  2.0
30 29.5   VC  2.0
This is extensible, as shown at the beginning of my answer. For example, if we want the vector instead of the data frame, just add pull({{response}}) to the pipeline after adding response to the function's arguments.
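For the two-level case mentioned above, one possible way to extend the filter line is sketched here. It assumes dplyr is installed; the function name f_levels and the argument keep are made up for this illustration and are not part of the question or of dplyr.

```r
library(dplyr)

# Hypothetical helper: keep only the rows whose factor column is one of
# the values in `keep`, then return the response values grouped by level.
f_levels <- function(df, factor, keep, response) {
  kept <- df %>%
    filter({{ factor }} %in% keep)
  # pull() extracts a single column as a vector; split() groups the
  # response values by the level they belong to.
  split(
    pull(kept, {{ response }}),
    as.character(pull(kept, {{ factor }}))
  )
}

res <- f_levels(ToothGrowth, supp, c("VC", "OJ"), len)
print(res$VC)
print(res$OJ)
```

Returning a named list rather than printing inside the function keeps both vectors available for further work, for example a t-test between the two groups.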
CodePudding user response:
R and Python have similarities and dissimilarities. Your function in R should be written like this:
f <- function(df, factor, level1, level2, response) {
  df1 <- df[df[, factor] == level1, ][, response]
  df2 <- df[df[, factor] == level2, ][, response]
  print(df1)
  print(df2)
}
When you write df$factor, you are asking for a column literally named "factor", instead of using the value of the object factor as the name of a column.
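The difference can be checked directly on ToothGrowth. This is only a small sketch; the variable name factor_col is made up for the illustration.

```r
# ToothGrowth ships with base R (datasets package).
factor_col <- "supp"  # hypothetical variable holding a column name

# `$` does not evaluate its right-hand side: it looks for a column
# literally called "factor_col", finds none, and returns NULL.
by_dollar <- ToothGrowth$factor_col

# `[[` evaluates the variable first, so it looks up the column "supp".
by_brackets <- ToothGrowth[[factor_col]]

print(is.null(by_dollar))  # TRUE
print(head(by_brackets))
```

Comparing NULL with a string gives an empty logical vector, so the row subset in the original function is empty, and the `$response` lookup on it returns NULL as well.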
And, as @rawr said, you have to quote the arguments in the call:
f(ToothGrowth, "supp", "VC", "OJ", "len")
This is because you are passing the names of the columns (variables), and names are strings, so they must be enclosed in quotes.
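If you would rather get the values back than print them, a base R variant could look like the sketch below. The name extract_levels is hypothetical, and `[[` is used in place of `[, ]` purely as a stylistic choice; both select the column by its string name.

```r
# Hypothetical helper: returns both vectors in a list named by level.
extract_levels <- function(df, factor, level1, level2, response) {
  groups <- df[[factor]]    # column chosen by its name, given as a string
  values <- df[[response]]
  stats::setNames(
    list(values[groups == level1], values[groups == level2]),
    c(level1, level2)
  )
}

res <- extract_levels(ToothGrowth, "supp", "VC", "OJ", "len")
print(res$VC)
print(res$OJ)
```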