I have a data like
deviceType | category | date | score |
---|---|---|---|
A | soccer | 07012022 | 10 |
A | baseball | 07012022 | 8 |
B | basketball | 07012022 | 9 |
A | soccer | 07022022 | 9 |
and I would like to convert it to
category | A | B | C...(device types so on) |
---|---|---|---|
soccer | basketball | ||
baseball |
which displays category values under each device type in descending order of average of that category scores (soccer is above baseball for device type A as its average score, 9.5, is higher than average score of baseball, 8)
I was trying this with using groupby with two columns (deviceType and category), but I couldn't figure out how to put one column's value under another.
I will appreciate any help.
CodePudding user response:
You can gropby('device')
later work with every group separatelly to create Series
with devide as name
, and concatenate with other columns.
import pandas as pd
data = {
'device': ['A', 'A','B','A','C'],
'category': ['soccer','basketball','basketball','soccer','tenis'],
'score': [10,8,9,9,0],
}
df = pd.DataFrame(data)
print(df)
new = pd.DataFrame()
for key, val in df.groupby('device'):
data = val.groupby('category').mean().sort_values(by='score', ascending=False)
print(data)
s = pd.Series(data.index)
s.name = key
new = pd.concat([new, s], axis=1)
print(new)
Result:
device category score
0 A soccer 10
1 A basketball 8
2 B basketball 9
3 A soccer 9
4 C tenis 0
score
category
soccer 9.5
basketball 8.0
score
category
basketball 9.0
score
category
tenis 0.0
A B C
0 soccer basketball tenis
1 basketball NaN NaN
CodePudding user response:
# Calculate the mean score per category and device type
tmp = df.pivot_table(
index="category",
columns="deviceType",
values="score",
aggfunc="mean",
)
# Extract the score and category matrix into numpy
score = tmp.to_numpy()
category = np.tile(tmp.index, (len(tmp.columns), 1)).T
# Sort the score descending. numpy always sort ascending so to do descending
# sort, we need to negate the `score` array
index = np.argsort(-score, axis=0)
# Sort score and category
ordered_score = np.take_along_axis(score, index, axis=0)
ordered_category = np.take_along_axis(category, index, axis=0)
# Result
pd.DataFrame(
np.where(np.isnan(ordered_score), "", ordered_category), columns=tmp.columns
)