GroupBy, Transpose, and flatten rows in Pandas-CodePudding

	as_of_date	industry	sector	deal	year	quarter	stage	amount	yield
0	2022-01-01	Mortgage	RMBS	XYZ	2022	NaN	A	111	0.1
1	2022-01-01	Mortgage	RMBS	XYZ	2022	1	A	222	0.2
2	2022-01-01	Mortgage	RMBS	XYZ	2022	2	A	333	0.3
3	2022-01-01	Mortgage	RMBS	XYZ	2022	3	A	444	0.4
4	2022-01-01	Mortgage	RMBS	XYZ	2022	4	A	555	0.5
5	2022-01-01	Mortgage	RMBS	XYZ	2022	Nan	B	123	0.6
6	2022-01-01	Mortgage	RMBS	XYZ	2022	1	B	234	0.7
7	2022-01-01	Mortgage	RMBS	XYZ	2022	2	B	345	0.8
8	2022-01-01	Mortgage	RMBS	XYZ	2022	3	B	456	0.9
9	2022-01-01	Mortgage	RMBS	XYZ	2022	4	B	567	1.0

For each group (as_of_date, industry, sector, deal, year, stage), I need to display all the amounts and yields in one line

I have tried this -

df.groupby(['as_of_date', 'industry', 'sector', 'deal', 'year', 'stage'])['amount', 'yield' ].apply(lambda df: df.reset_index(drop=True)).unstack().reset_index()

but this is not working correctly.

Basically, I need this as output rows -

2022-01-01   Mortgage   RMBS   XYZ   2022   A   111   222   333   444   555   0.1   0.2   0.3   0.4   0.5

2022-01-01   Mortgage   RMBS   XYZ   2022   B   123   234   345   456   567   0.6   0.7   0.8   0.9   1.0

What would be the correct way to achieve this with Pandas? Thank you

CodePudding user response：

This can be calculated by creating a list for each column first, then combined this (using ), and turning this into a string, removing the [, ], ,:

df1 = df.groupby(['as_of_date', 'industry', 'sector', 'deal', 'year', 'stage']).apply(
    lambda x: str(list(x['amount'])   list(x['yield']))[1:-1].replace(",", ""))

df1
#Out: 
#as_of_date  industry  sector  deal  year  stage
#2022-01-01  Mortgage  RMBS    XYZ   2022  A        111 222 333 444 555 0.1 0.2 0.3 0.4 0.5
#                                          B        123 234 345 456 567 0.6 0.7 0.8 0.9 1.0

CodePudding user response：

Maybe this?

df.groupby(['as_of_date', 'industry', 'sector', 'deal', 'year', 'stage']).agg(' '.join).reset_index()

CodePudding user response：

does this answer your question?

df2 = df.pivot(index=['as_of_date','industry','sector','deal','year', 'stage'], columns=['quarter']).reset_index()

to flatten the columns names

df2.columns = df2.columns.to_series().str.join('_')
df2

   as_of_date_  industry_ sector_ deal_  year_ stage_  amount_1   amount_2   amount_3   amount_4   amount_NaN   amount_Nan   yield_1   yield_2   yield_3   yield_4   yield_NaN   yield_Nan 
0  2022-01-01   Mortgage    RMBS   XYZ    2022     A       222.0      333.0      444.0      555.0        111.0          NaN       0.2       0.3       0.4       0.5         0.1         NaN
1  2022-01-01   Mortgage    RMBS   XYZ    2022     B       234.0      345.0      456.0      567.0          NaN        123.0       0.7       0.8       0.9       1.0         NaN         0.6