Home > Blockchain >  Filtering string in column (counts of string) plus average of second column
Filtering string in column (counts of string) plus average of second column

Time:03-25

Looking for a way to filter a column "Role" for counts of string and get an average of 2nd col "Rank" for these values. I tried value counts and string.contains but do not know how to bring this together.

import pandas as pd
data = {'Role':['Big, Big, Guard, Guard, Forward', 'Big, Big, Guard, Guard, Forward', 'Big, Guard, Big, Guard, Guard', 'Big, Big, Guard, Forward, Big', 'Guard, Big, Guard, Guard, Big','Big, Big, Guard, Forward, Big' ],
        'Rank':[10, 6, 5, 2, 1, 3]} 
df = pd.DataFrame(data)
print(df)

df

   Role                               Rank
0  Big, Big, Guard, Guard, Forward    10
1  Big, Big, Guard, Guard, Forward     6
2    Big, Guard, Big, Guard, Guard     5
3    Big, Big, Guard, Forward, Big     2
4    Guard, Big, Guard, Guard, Big     1
5    Big, Big, Guard, Forward, Big     3   

Idea of result filtering for "2* Big", .........

Role                   Value count    Rank/avg 
Big, Big               4               5.5 
Big, Big, Big          2               2.5

Just edited the values for two Bigs, original df is two big to add here. Output of 2Big and 3 Big is result wished for.

CodePudding user response:

Not exactly the same output format, but if what you are looking for is the Rank/avg, maybe that helps you:

data = {'Role':['Big, Big, Guard, Guard, Forward', 'Big, Big, Guard, Guard, Forward', 'Big, Guard, Big, Guard, Guard', 'Big, Big, Guard, Forward, Big', 'Guard, Big, Guard, Guard, Big','Big, Big, Guard, Forward, Big' ],
        'Rank':[10, 6, 5, 2, 1, 3]} 
df = pd.DataFrame(data)
df['Role'] = df['Role'].str.strip()
# Create another column to hold `Role` values as a list
df['Role2'] = df['Role'].str.split(', ')
# Count how many times each role appears in each `Role` value
role_counts = df['Role2'].apply(lambda x: pd.Series(x).value_counts())
# Merge with original dataframe
df2 = df.merge(role_counts, left_index=True, right_index=True)
# Calculate average by `Role` count
df2[role_counts.columns] = df2[role_counts.columns].fillna(0)
df2['Rank avg'] = 0
for col in role_counts:
    df2['Rank avg'] = df2.groupby(col)['Rank'].transform('mean')

print(df2)

Output:

- Role Rank Role2 Big Guard Forward Rank avg
0 Big, Big, Guard, Guard, Forward 10 [Big, Big, Guard, Guard, Forward] 2.0 2.0 1.0 5.25
1 Big, Big, Guard, Guard, Forward 6 [Big, Big, Guard, Guard, Forward] 2.0 2.0 1.0 5.25
2 Big, Guard, Big, Guard, Guard 5 [Big, Guard, Big, Guard, Guard] 2.0 3.0 0.0 3.00
3 Big, Big, Guard, Forward, Big 2 [Big, Big, Guard, Forward, Big] 3.0 1.0 1.0 5.25
4 Guard, Big, Guard, Guard, Big 1 [Guard, Big, Guard, Guard, Big] 2.0 3.0 0.0 3.00
5 Big, Big, Guard, Forward, Big 3 [Big, Big, Guard, Forward, Big] 3.0 1.0 1.0 5.25

CodePudding user response:

IIUC, you can first split and explode the strings, then perform a double groupby.agg:

(df
 .assign(Role=df['Role'].str.split(', '))
 .explode('Role')
 .reset_index()
 .groupby(['index', 'Role'], as_index=False)
 .agg({'Role': ','.join, 'Rank': 'first'})
 .groupby('Role')
 .agg(**{'Value count': ('Role', 'count'),
         'Rank_avg': ('Rank', lambda x: x.sum()/len(x))
        })
)

output:

                   Value count  Rank_avg
Role                                    
Big,Big                      4      5.50
Big,Big,Big                  2      2.50
Forward                      4      5.25
Guard                        2      2.50
Guard,Guard                  2      8.00
Guard,Guard,Guard            2      3.00
  • Related