Home > Net >  GroupBy, Transpose, and flatten rows in Pandas
GroupBy, Transpose, and flatten rows in Pandas

Time:05-27

as_of_date industry sector deal year quarter stage amount yield
0 2022-01-01 Mortgage RMBS XYZ 2022 NaN A 111 0.1
1 2022-01-01 Mortgage RMBS XYZ 2022 1 A 222 0.2
2 2022-01-01 Mortgage RMBS XYZ 2022 2 A 333 0.3
3 2022-01-01 Mortgage RMBS XYZ 2022 3 A 444 0.4
4 2022-01-01 Mortgage RMBS XYZ 2022 4 A 555 0.5
5 2022-01-01 Mortgage RMBS XYZ 2022 Nan B 123 0.6
6 2022-01-01 Mortgage RMBS XYZ 2022 1 B 234 0.7
7 2022-01-01 Mortgage RMBS XYZ 2022 2 B 345 0.8
8 2022-01-01 Mortgage RMBS XYZ 2022 3 B 456 0.9
9 2022-01-01 Mortgage RMBS XYZ 2022 4 B 567 1.0

For each group (as_of_date, industry, sector, deal, year, stage), I need to display all the amounts and yields in one line

I have tried this -

df.groupby(['as_of_date', 'industry', 'sector', 'deal', 'year', 'stage'])['amount', 'yield' ].apply(lambda df: df.reset_index(drop=True)).unstack().reset_index()

but this is not working correctly.

Basically, I need this as output rows -

2022-01-01   Mortgage   RMBS   XYZ   2022   A   111   222   333   444   555   0.1   0.2   0.3   0.4   0.5

2022-01-01   Mortgage   RMBS   XYZ   2022   B   123   234   345   456   567   0.6   0.7   0.8   0.9   1.0

What would be the correct way to achieve this with Pandas? Thank you

CodePudding user response:

This can be calculated by creating a list for each column first, then combined this (using ), and turning this into a string, removing the [, ], ,:

df1 = df.groupby(['as_of_date', 'industry', 'sector', 'deal', 'year', 'stage']).apply(
    lambda x: str(list(x['amount'])   list(x['yield']))[1:-1].replace(",", ""))

df1
#Out: 
#as_of_date  industry  sector  deal  year  stage
#2022-01-01  Mortgage  RMBS    XYZ   2022  A        111 222 333 444 555 0.1 0.2 0.3 0.4 0.5
#                                          B        123 234 345 456 567 0.6 0.7 0.8 0.9 1.0

CodePudding user response:

Maybe this?

df.groupby(['as_of_date', 'industry', 'sector', 'deal', 'year', 'stage']).agg(' '.join).reset_index()

CodePudding user response:

does this answer your question?

df2 = df.pivot(index=['as_of_date','industry','sector','deal','year', 'stage'], columns=['quarter']).reset_index() 

to flatten the columns names

df2.columns = df2.columns.to_series().str.join('_')
df2
   as_of_date_  industry_ sector_ deal_  year_ stage_  amount_1   amount_2   amount_3   amount_4   amount_NaN   amount_Nan   yield_1   yield_2   yield_3   yield_4   yield_NaN   yield_Nan 
0  2022-01-01   Mortgage    RMBS   XYZ    2022     A       222.0      333.0      444.0      555.0        111.0          NaN       0.2       0.3       0.4       0.5         0.1         NaN
1  2022-01-01   Mortgage    RMBS   XYZ    2022     B       234.0      345.0      456.0      567.0          NaN        123.0       0.7       0.8       0.9       1.0         NaN         0.6
  • Related