I have below contents in csv file:
key1 key2 Key3 key4 key5
Val1 A 51 'True' 25
Val1 A 50 'False' 25
Val1 A 49 'True' 25
Val1 A 48 'True' 25
Val2 A 47 'False' 25
Val2 A 46 'True' 25
Val2 A 45 'False' 25
Val2 A 44 'True' 25
Val2 A 43 'True' 25
Output should be like:
key1 key2 max_key5 total_key4 total_true_key4 grade
Val1 A 51 4 3 1
Val2 A 47 5 3 2
I have to group by key1 and key2 then find the max of key5 and count of total rows of key4 and count of total true rows of key4 and then percentage.
What I am trying:
grd = "1 if avg > 80 else 2 if avg > 50 else c"
pct = lambda x: (1 if x > 80 else (2 if x > 50 else 3))
json_data
.assign(_key4=lambda df_: df_['key4'] == "'True'")
.groupby(['key1', 'key2'])
.agg(
maxkey5=('key5', 'max'),
total_key4=('key4', 'count'),
total_true_key4=('_key4', 'sum')
)
.eval('avg = (total_true_key4 * 100) / total_key4')
.eval('feg = grd')
#.apply(pct(avg))
eval to calculate percentage is working fine.. not able to do if else if on avg column
I don't want to apply separately like in another separate statement.
CodePudding user response:
how about this?
json_data
.assign(_key4=lambda df_: df_['key4'] == "'True'")
.groupby(['key1', 'key2'])
.agg(
maxkey5=('key5', 'max'),
total_key4=('key4', 'count'),
total_true_key4=('_key4', 'sum')
percentage=('key4', lambda x: int(sum(x)/len(x)*100))
)
CodePudding user response:
As mentioned in the comments, I am not quite sure how you would get your desired output with your logic. Also, you define maxkey5=('key5', 'max')
but actually you are getting the max
of key3
. So, if I am not mistaken, you are looking for something like this:
import pandas as pd
from numpy import mean
df = pd.DataFrame({
"key1": ["Val1"]*4 ["Val2"]*5,
"key2": ["A"]*9,
"key3": [51, 50, 49, 48, 47, 46, 45, 44, 43],
"key4": ["'True'", "'False'", "'True'", "'True'", "'False'", "'True'", "'False'", "'True'", "'True'"],
"key5": [25] * 9
})
(
df
.assign(total_true_key4=df['key4']=="'True'", grade=df['key4']=="'True'")
.groupby(["key1", "key2"])[["key3", "key4", "total_true_key4", "grade"]]
.agg(
{"key3": "max",
"key4": "count",
"total_true_key4": "sum",
"grade": lambda x: 1 if mean(x)>.8 else (2 if mean(x)>.5 else 3)}
)
.rename(columns={"key3":"max_key3", "key4":"total_key4"}).reset_index()
)
which results into the following data frame:
key1 key2 max_key3 total_key4 total_true_key4 grade
0 Val1 A 51 4 3 2
1 Val2 A 47 5 3 2