I want to count values in one dataframe matching the key from another.
What do I have:
df_a
df_a = pd.DataFrame(data = {'keys':['key1', 'key2', 'key3'], 'total':[0, 0, 0], '>5':''})
df_a
output:
keys total >5
0 key1 0
1 key2 0
2 key3 0
df_b
df_b = pd.DataFrame(data = {'keys':['key1', 'key1', 'key2', 'key2', 'key3'], 'value':[3, 7, 8, 4, 10]})
df_b
output:
keys value
0 key1 3
1 key1 7
2 key2 8
3 key2 4
4 key3 10
What do I expect:
I want to count, how many values in df_b
match to each key from df_a
. Also, I want to count, which part of them are more than 5.
As a result, I want to fill my df_a
like this:
keys total >5
0 key1 2 0.5
1 key2 2 0.5
2 key3 1 1.0
What have I done
I've tried to iterate over column keys
in df_b
and keys
in df_a
, check if it matches and use local index than. But it's finished with an error.
for key_a in df_a['keys']:
for key_b in df_b['keys']:
if key_a == key_b:
row_index = df_b.iloc['keys'][key_b]
df_a['total'][row_index] = 1
output:
TypeError: Cannot index by location index with a non-integer key
I know that it's very stupid way of solving my problem. Can you help me, please? What should I do to make it work correctly?
CodePudding user response:
Create new column by compare greater like 5
, then aggregate size
and mean
and join to df_a
:
df = (df_b.assign(tmp = df_b['value'].gt(5))
.groupby('keys')
.agg(**{'total':('tmp','size'),'>5':('tmp','mean')}))
print (df)
total >5
keys
key1 2 0.5
key2 2 0.5
key3 1 1.0
df = df_a[['keys']].join(df, on='keys')
print (df)
keys total >5
0 key1 2 0.5
1 key2 2 0.5
2 key3 1 1.0
CodePudding user response:
Remove '>5' and 'total' from df_a. that's unnecessary.
df_b['>5'] = df_b['value'].apply(lambda x: 1 if x > 5 else None)
aggs = {'keys': 'first', 'value': 'count', '>5': 'count'}
df_b_new = df_b.groupby('keys').agg(aggs).reset_index(drop=True)
df = pd.merge(df_a, df_b_new, on='keys', how='left')
df['>5'] = df['>5'] / df['value']
OUTPUT:
keys value >5
0 key1 2 0.5
1 key2 2 0.5
2 key3 1 1.0
EDIT: to change column name from value to Total
df.rename(columns={'value': 'total'}, inplace=True)