I am trying to create a function that will use the query method to filter through a data frame based on multiple conditions and create a new series with the values applicable.
Here is an example of the df that I am working with:
Week | Treatment | Value |
---|---|---|
0 | ABC | 100 |
1 | ABC | 150 |
2 | ABC | 149 |
0 | XYZ | 350 |
1 | XYZ | 500 |
2 | XYZ | 600 |
0 | ABC | 101 |
1 | ABC | 130 |
2 | ABC | 147 |
I have been able to successfully filter my data for a given 'Week' and 'Treatment' using the .query method like this:
test = df.query('Week == 1 & Treatment.str.startswith("A").values')['value']
I am only interested in extracting the 'values' column.
This is great, but I don't want to have to copy and paste this line of code for every 'week' and 'treatment' in my df. Therefore, I want to create a function that will let me identify the desired week and treatment (or first letter of treatment) as arguments and have it return an object with my desired values for further analysis.
Here is what I have so far:
def sorter(df,week):
return df.query('Week == 0 & Treatment.str.startswith("A").values')['value']
I know that as I have it, my function does not return my values as an object. I will work on that. Where I am stuck is how to have one of the arguments for my function be a week (like '0') when in the query method the week is written as part of a string.
I tried this:
def sorter(df,week):
return df.query('Week == week & Treatment.str.startswith("A").values')['value']
But I got an error saying week was undefined.
CodePudding user response:
A possible solution, based on f strings
:
def sorter(df,week):
return df.query(f'Week == {week} & Treatment.str.startswith("A").values')['Value']
CodePudding user response:
As an alternative, you can define variables using @:
def sorter(df,week):
return df.query('Week == @week & Treatment.str.startswith("A").values')['value']