Filtering string in column (counts of string) plus average of second column-CodePudding

Looking for a way to filter a column "Role" for counts of string and get an average of 2nd col "Rank" for these values. I tried value counts and string.contains but do not know how to bring this together.

import pandas as pd
data = {'Role':['Big, Big, Guard, Guard, Forward', 'Big, Big, Guard, Guard, Forward', 'Big, Guard, Big, Guard, Guard', 'Big, Big, Guard, Forward, Big', 'Guard, Big, Guard, Guard, Big','Big, Big, Guard, Forward, Big' ],
        'Rank':[10, 6, 5, 2, 1, 3]} 
df = pd.DataFrame(data)
print(df)

   Role                               Rank
0  Big, Big, Guard, Guard, Forward    10
1  Big, Big, Guard, Guard, Forward     6
2    Big, Guard, Big, Guard, Guard     5
3    Big, Big, Guard, Forward, Big     2
4    Guard, Big, Guard, Guard, Big     1
5    Big, Big, Guard, Forward, Big     3

Idea of result filtering for "2* Big", .........

Role                   Value count    Rank/avg 
Big, Big               4               5.5 
Big, Big, Big          2               2.5

Just edited the values for two Bigs, original df is two big to add here. Output of 2Big and 3 Big is result wished for.

CodePudding user response：

Not exactly the same output format, but if what you are looking for is the Rank/avg, maybe that helps you:

data = {'Role':['Big, Big, Guard, Guard, Forward', 'Big, Big, Guard, Guard, Forward', 'Big, Guard, Big, Guard, Guard', 'Big, Big, Guard, Forward, Big', 'Guard, Big, Guard, Guard, Big','Big, Big, Guard, Forward, Big' ],
        'Rank':[10, 6, 5, 2, 1, 3]} 
df = pd.DataFrame(data)
df['Role'] = df['Role'].str.strip()
# Create another column to hold `Role` values as a list
df['Role2'] = df['Role'].str.split(', ')
# Count how many times each role appears in each `Role` value
role_counts = df['Role2'].apply(lambda x: pd.Series(x).value_counts())
# Merge with original dataframe
df2 = df.merge(role_counts, left_index=True, right_index=True)
# Calculate average by `Role` count
df2[role_counts.columns] = df2[role_counts.columns].fillna(0)
df2['Rank avg'] = 0
for col in role_counts:
    df2['Rank avg'] = df2.groupby(col)['Rank'].transform('mean')

print(df2)

Output:

-	Role	Rank	Role2	Big	Guard	Forward	Rank avg
0	Big, Big, Guard, Guard, Forward	10	[Big, Big, Guard, Guard, Forward]	2.0	2.0	1.0	5.25
1	Big, Big, Guard, Guard, Forward	6	[Big, Big, Guard, Guard, Forward]	2.0	2.0	1.0	5.25
2	Big, Guard, Big, Guard, Guard	5	[Big, Guard, Big, Guard, Guard]	2.0	3.0	0.0	3.00
3	Big, Big, Guard, Forward, Big	2	[Big, Big, Guard, Forward, Big]	3.0	1.0	1.0	5.25
4	Guard, Big, Guard, Guard, Big	1	[Guard, Big, Guard, Guard, Big]	2.0	3.0	0.0	3.00
5	Big, Big, Guard, Forward, Big	3	[Big, Big, Guard, Forward, Big]	3.0	1.0	1.0	5.25

CodePudding user response：

IIUC, you can first split and explode the strings, then perform a double groupby.agg:

(df
 .assign(Role=df['Role'].str.split(', '))
 .explode('Role')
 .reset_index()
 .groupby(['index', 'Role'], as_index=False)
 .agg({'Role': ','.join, 'Rank': 'first'})
 .groupby('Role')
 .agg(**{'Value count': ('Role', 'count'),
         'Rank_avg': ('Rank', lambda x: x.sum()/len(x))
        })
)

output:

                   Value count  Rank_avg
Role                                    
Big,Big                      4      5.50
Big,Big,Big                  2      2.50
Forward                      4      5.25
Guard                        2      2.50
Guard,Guard                  2      8.00
Guard,Guard,Guard            2      3.00