I have a dataframe that looks like this:
import pandas as pd

data = {'Place A': ['New York', 'Oslo', 'Oslo'], 'Place B': ['Paris', 'London', 'London'], 'Type': ['A', 'B', 'A'], 'Number': [34, 42, 24]}
df = pd.DataFrame(data)
print(df)
Place A | Place B | Type | Number |
---|---|---|---|
New York | Paris | A | 34 |
Oslo | London | B | 42 |
Oslo | London | A | 24 |
I need the percentage of each type for each route.
I don't know which command to use to get a dataframe that looks like this:
xxx | Paris | Oslo | London | New York |
---|---|---|---|---|
Paris | NaN | NaN | NaN | type A : 100% |
Oslo | NaN | NaN | type A : 36,3% type B : 63,6% | NaN |
London | NaN | type A : 36,3% type B : 63,6% | NaN | NaN |
New York | type A : 100% | NaN | NaN | NaN |
CodePudding user response:
IIUC, you could compute each row's share of its route total, build a text label per row, then reshape the labels into a city-by-city table:
cities = set([*df['Place A'], *df['Place B']])
df['total'] = df.groupby(['Place A', 'Place B'])['Number'].transform('sum')
df['percent'] = df['Number']/df['total']
df['value'] = df['Type'] + ': ' + df['percent'].round(2).astype(str)
df2 = (df.groupby(['Place A', 'Place B'])['value']
         .agg(', '.join).unstack(1)
         .rename_axis(index=None, columns=None)
         .reindex(index=cities, columns=cities)
      )
# make output symmetrical
df2.update(df2.T)
output:
                      Oslo  New York   Paris            London
Oslo                   NaN       NaN     NaN  B: 0.64, A: 0.36
New York               NaN       NaN  A: 1.0               NaN
Paris                  NaN    A: 1.0     NaN               NaN
London    B: 0.64, A: 0.36       NaN     NaN               NaN
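For reference, here is a self-contained sketch of the same steps that formats the shares as percentages, closer to the layout asked for in the question. The function name `route_type_shares` and the `type X: NN.N%` label format are my own choices, not part of the original code. It also assumes two small changes: a sorted city list so the row and column order is reproducible, and `where` on the transpose (instead of `update`) to mirror the table.

```python
import pandas as pd


def route_type_shares(df: pd.DataFrame) -> pd.DataFrame:
    """Return a symmetric city-by-city table of per-route type shares."""
    out = df.copy()  # leave the caller's frame untouched

    # Total of 'Number' per (Place A, Place B) route, aligned to each row.
    total = out.groupby(['Place A', 'Place B'])['Number'].transform('sum')

    # Text label per row, e.g. 'type A: 36.4%' (format is an assumption).
    pct = (out['Number'] / total * 100).round(1).astype(str)
    out['value'] = 'type ' + out['Type'] + ': ' + pct + '%'

    # Sorted list of all cities, so the table order is reproducible.
    cities = sorted(set(out['Place A']) | set(out['Place B']))

    table = (out.groupby(['Place A', 'Place B'])['value']
                .agg(', '.join)
                .unstack('Place B')
                .rename_axis(index=None, columns=None)
                .reindex(index=cities, columns=cities))

    # Fill each empty cell from the mirrored cell of the transpose.
    return table.where(table.notna(), table.T)


data = {'Place A': ['New York', 'Oslo', 'Oslo'],
        'Place B': ['Paris', 'London', 'London'],
        'Type': ['A', 'B', 'A'],
        'Number': [34, 42, 24]}
result = route_type_shares(pd.DataFrame(data))
print(result)
```

Within a cell, the labels appear in the order the rows had in the input, so the Oslo/London cell lists type B before type A here.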
If you don't need the symmetrical output, you can keep the Place A / Place B layout, which holds the same information in condensed form:
df2 = df.groupby(['Place A', 'Place B'])['value'].agg(', '.join).unstack(1)
output:
Place B             London   Paris
Place A
New York               NaN  A: 1.0
Oslo      B: 0.64, A: 0.36     NaN
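If the shares are needed as numbers rather than text (for sorting, filtering or plotting), one possible variant is to keep one row per route and one column per type. This is only a sketch of an alternative, not the approach above: it uses `pivot_table` to sum `Number` per route and type, then divides each row by its total. The names `counts` and `shares` are illustrative.

```python
import pandas as pd

data = {'Place A': ['New York', 'Oslo', 'Oslo'],
        'Place B': ['Paris', 'London', 'London'],
        'Type': ['A', 'B', 'A'],
        'Number': [34, 42, 24]}
df = pd.DataFrame(data)

# Sum of 'Number' per route (rows) and type (columns); absent types become 0.
counts = df.pivot_table(index=['Place A', 'Place B'], columns='Type',
                        values='Number', aggfunc='sum', fill_value=0)

# Divide each row by its total so every route's shares sum to 1.
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares.round(3))
```

The result has a `(Place A, Place B)` MultiIndex, so a single route can be looked up with `shares.loc[('Oslo', 'London')]`.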