Looking for a way to filter a column "Role" for counts of string and get an average of 2nd col "Rank" for these values. I tried value counts and string.contains but do not know how to bring this together.
import pandas as pd
data = {'Role':['Big, Big, Guard, Guard, Forward', 'Big, Big, Guard, Guard, Forward', 'Big, Guard, Big, Guard, Guard', 'Big, Big, Guard, Forward, Big', 'Guard, Big, Guard, Guard, Big','Big, Big, Guard, Forward, Big' ],
'Rank':[10, 6, 5, 2, 1, 3]}
df = pd.DataFrame(data)
print(df)
df
Role Rank
0 Big, Big, Guard, Guard, Forward 10
1 Big, Big, Guard, Guard, Forward 6
2 Big, Guard, Big, Guard, Guard 5
3 Big, Big, Guard, Forward, Big 2
4 Guard, Big, Guard, Guard, Big 1
5 Big, Big, Guard, Forward, Big 3
Idea of result filtering for "2* Big", .........
Role Value count Rank/avg
Big, Big 4 5.5
Big, Big, Big 2 2.5
Just edited the values for two Bigs, original df is two big to add here. Output of 2Big and 3 Big is result wished for.
CodePudding user response:
Not exactly the same output format, but if what you are looking for is the Rank/avg
, maybe that helps you:
data = {'Role':['Big, Big, Guard, Guard, Forward', 'Big, Big, Guard, Guard, Forward', 'Big, Guard, Big, Guard, Guard', 'Big, Big, Guard, Forward, Big', 'Guard, Big, Guard, Guard, Big','Big, Big, Guard, Forward, Big' ],
'Rank':[10, 6, 5, 2, 1, 3]}
df = pd.DataFrame(data)
df['Role'] = df['Role'].str.strip()
# Create another column to hold `Role` values as a list
df['Role2'] = df['Role'].str.split(', ')
# Count how many times each role appears in each `Role` value
role_counts = df['Role2'].apply(lambda x: pd.Series(x).value_counts())
# Merge with original dataframe
df2 = df.merge(role_counts, left_index=True, right_index=True)
# Calculate average by `Role` count
df2[role_counts.columns] = df2[role_counts.columns].fillna(0)
df2['Rank avg'] = 0
for col in role_counts:
df2['Rank avg'] = df2.groupby(col)['Rank'].transform('mean')
print(df2)
Output:
- | Role | Rank | Role2 | Big | Guard | Forward | Rank avg |
---|---|---|---|---|---|---|---|
0 | Big, Big, Guard, Guard, Forward | 10 | [Big, Big, Guard, Guard, Forward] | 2.0 | 2.0 | 1.0 | 5.25 |
1 | Big, Big, Guard, Guard, Forward | 6 | [Big, Big, Guard, Guard, Forward] | 2.0 | 2.0 | 1.0 | 5.25 |
2 | Big, Guard, Big, Guard, Guard | 5 | [Big, Guard, Big, Guard, Guard] | 2.0 | 3.0 | 0.0 | 3.00 |
3 | Big, Big, Guard, Forward, Big | 2 | [Big, Big, Guard, Forward, Big] | 3.0 | 1.0 | 1.0 | 5.25 |
4 | Guard, Big, Guard, Guard, Big | 1 | [Guard, Big, Guard, Guard, Big] | 2.0 | 3.0 | 0.0 | 3.00 |
5 | Big, Big, Guard, Forward, Big | 3 | [Big, Big, Guard, Forward, Big] | 3.0 | 1.0 | 1.0 | 5.25 |
CodePudding user response:
IIUC, you can first split
and explode
the strings, then perform a double groupby.agg
:
(df
.assign(Role=df['Role'].str.split(', '))
.explode('Role')
.reset_index()
.groupby(['index', 'Role'], as_index=False)
.agg({'Role': ','.join, 'Rank': 'first'})
.groupby('Role')
.agg(**{'Value count': ('Role', 'count'),
'Rank_avg': ('Rank', lambda x: x.sum()/len(x))
})
)
output:
Value count Rank_avg
Role
Big,Big 4 5.50
Big,Big,Big 2 2.50
Forward 4 5.25
Guard 2 2.50
Guard,Guard 2 8.00
Guard,Guard,Guard 2 3.00