Home > database >  pandas displaying one column's value in the order of top score
pandas displaying one column's value in the order of top score

Time:09-14

I have a data like

deviceType category date score
A soccer 07012022 10
A baseball 07012022 8
B basketball 07012022 9
A soccer 07022022 9

and I would like to convert it to

category A B C...(device types so on)
soccer basketball
baseball

which displays category values under each device type in descending order of average of that category scores (soccer is above baseball for device type A as its average score, 9.5, is higher than average score of baseball, 8)

I was trying this with using groupby with two columns (deviceType and category), but I couldn't figure out how to put one column's value under another.

I will appreciate any help.

CodePudding user response:

You can gropby('device') later work with every group separatelly to create Series with devide as name, and concatenate with other columns.

import pandas as pd

data = {
    'device': ['A', 'A','B','A','C'], 
    'category': ['soccer','basketball','basketball','soccer','tenis'],
    'score': [10,8,9,9,0],
}

df = pd.DataFrame(data)
print(df)

new = pd.DataFrame()

for key, val in df.groupby('device'):
    data = val.groupby('category').mean().sort_values(by='score', ascending=False)
    print(data)
    s = pd.Series(data.index)
    s.name = key
    new = pd.concat([new, s], axis=1)
    
print(new)   

Result:

  device    category  score
0      A      soccer     10
1      A  basketball      8
2      B  basketball      9
3      A      soccer      9
4      C       tenis      0

            score
category         
soccer        9.5
basketball    8.0

            score
category         
basketball    9.0

          score
category       
tenis       0.0

            A           B      C
0      soccer  basketball  tenis
1  basketball         NaN    NaN

CodePudding user response:

# Calculate the mean score per category and device type
tmp = df.pivot_table(
    index="category",
    columns="deviceType",
    values="score",
    aggfunc="mean",
)

# Extract the score and category matrix into numpy
score = tmp.to_numpy()
category = np.tile(tmp.index, (len(tmp.columns), 1)).T

# Sort the score descending. numpy always sort ascending so to do descending
# sort, we need to negate the `score` array
index = np.argsort(-score, axis=0)

# Sort score and category
ordered_score = np.take_along_axis(score, index, axis=0)
ordered_category = np.take_along_axis(category, index, axis=0)

# Result
pd.DataFrame(
    np.where(np.isnan(ordered_score), "", ordered_category), columns=tmp.columns
)
  • Related