How can I create a percentage matrix based on a dataframe?

I have a dataframe that looks like this:

import pandas as pd

data = {'Place A':['New York', 'Oslo', 'Oslo'], 'Place B':['Paris','London','London'], 'Type':['A','B','A'], 'Number' : [34,42,24]}
df = pd.DataFrame(data)
print(df)

Place A	Place B	Type	Number
New York	Paris	A	34
Oslo	London	B	42
Oslo	London	A	24

I need the percentage of each type for each route.

I don't know which command to use to get a dataframe that looks like this:

xxx	Paris	Oslo	London	New York
Paris	NaN	NaN	NaN	type A : 100%
Oslo	NaN	NaN	type A : 36,3% type B : 63,6%	NaN
London	NaN	type A : 36,3% type B : 63,6%	NaN	NaN
New York	type A : 100%	NaN	NaN	NaN

CodePudding user response：

IIUC, you could reshape the data, compute the percentage per group, and finally reshape again:

cities = set([*df['Place A'], *df['Place B']])

df['total'] = df.groupby(['Place A', 'Place B'])['Number'].transform('sum')
df['percent'] = df['Number']/df['total']
df['value'] = df['Type'] + ': ' + df['percent'].round(2).astype(str)

df2 = (df.groupby(['Place A', 'Place B'])['value']
         .agg(', '.join).unstack(1)
         .rename_axis(index=None, columns=None)
         .reindex(index=cities, columns=cities)
      )

# make output symmetrical
df2.update(df2.T)

output:

                      Oslo New York   Paris            London
Oslo                   NaN      NaN     NaN  B: 0.64, A: 0.36
New York               NaN      NaN  A: 1.0               NaN
Paris                  NaN   A: 1.0     NaN               NaN
London    B: 0.64, A: 0.36      NaN     NaN               NaN
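
The question asked for percentages such as 36,3%, while the output above shows fractions. Here is a minimal sketch of the same approach that formats the values as percent strings. The helper name `route_percent_matrix` is made up for illustration, a sorted list replaces the set so the row and column order is reproducible, and `combine_first` stands in for `update` so the matrix is mirrored without in-place modification:

```python
import pandas as pd


def route_percent_matrix(df):
    """Return a symmetric city-by-city matrix of 'Type: xx.x%' strings.

    Hypothetical helper: assumes the columns 'Place A', 'Place B',
    'Type' and 'Number' from the question.
    """
    out = df.copy()  # leave the caller's dataframe untouched

    # Share of each row within its (Place A, Place B) route, in percent
    total = out.groupby(['Place A', 'Place B'])['Number'].transform('sum')
    pct = (out['Number'] / total * 100).round(1)
    out['value'] = out['Type'] + ': ' + pct.astype(str) + '%'

    # Sorted list rather than a set, so the axis order is deterministic
    cities = sorted(set(out['Place A']) | set(out['Place B']))

    mat = (out.groupby(['Place A', 'Place B'])['value']
              .agg(', '.join)
              .unstack('Place B')
              .rename_axis(index=None, columns=None)
              .reindex(index=cities, columns=cities))

    # Fill each empty B -> A cell from the matching A -> B cell
    return mat.combine_first(mat.T)


data = {'Place A': ['New York', 'Oslo', 'Oslo'],
        'Place B': ['Paris', 'London', 'London'],
        'Type': ['A', 'B', 'A'],
        'Number': [34, 42, 24]}
df = pd.DataFrame(data)
matrix = route_percent_matrix(df)
print(matrix)
```

Types appear within each cell in the order of the rows in the input, so sort `df` by `Type` first if you want A listed before B.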

If you don't need the symmetrical output, you can keep the result indexed by Place A and Place B, which holds the same information in condensed form:

df2 = df.groupby(['Place A', 'Place B'])['value'].agg(', '.join).unstack(1)

output:

Place B             London   Paris
Place A                           
New York               NaN  A: 1.0
Oslo      B: 0.64, A: 0.36     NaN
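
If the percentages are needed as numbers rather than strings (for further calculations, say), one possible variant is to pivot `Type` into columns and normalize each row. This is a sketch of an alternative to the string-joining approach above, not part of the original answer; the names `counts` and `shares` are illustrative:

```python
import pandas as pd

data = {'Place A': ['New York', 'Oslo', 'Oslo'],
        'Place B': ['Paris', 'London', 'London'],
        'Type': ['A', 'B', 'A'],
        'Number': [34, 42, 24]}
df = pd.DataFrame(data)

# One row per route, one column per Type, with Number summed per cell;
# a Type that never occurs on a route gets 0
counts = df.pivot_table(index=['Place A', 'Place B'], columns='Type',
                        values='Number', aggfunc='sum', fill_value=0)

# Divide every row by its total to get each Type's share of the route
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares.round(3))
```

Each row of `shares` sums to 1, and multiplying by 100 gives percentages.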