as_of_date | industry | sector | deal | year | quarter | stage | amount | yield | |
---|---|---|---|---|---|---|---|---|---|
0 | 2022-01-01 | Mortgage | RMBS | XYZ | 2022 | NaN | A | 111 | 0.1 |
1 | 2022-01-01 | Mortgage | RMBS | XYZ | 2022 | 1 | A | 222 | 0.2 |
2 | 2022-01-01 | Mortgage | RMBS | XYZ | 2022 | 2 | A | 333 | 0.3 |
3 | 2022-01-01 | Mortgage | RMBS | XYZ | 2022 | 3 | A | 444 | 0.4 |
4 | 2022-01-01 | Mortgage | RMBS | XYZ | 2022 | 4 | A | 555 | 0.5 |
5 | 2022-01-01 | Mortgage | RMBS | XYZ | 2022 | Nan | B | 123 | 0.6 |
6 | 2022-01-01 | Mortgage | RMBS | XYZ | 2022 | 1 | B | 234 | 0.7 |
7 | 2022-01-01 | Mortgage | RMBS | XYZ | 2022 | 2 | B | 345 | 0.8 |
8 | 2022-01-01 | Mortgage | RMBS | XYZ | 2022 | 3 | B | 456 | 0.9 |
9 | 2022-01-01 | Mortgage | RMBS | XYZ | 2022 | 4 | B | 567 | 1.0 |
For each group (as_of_date, industry, sector, deal, year, stage), I need to display all the amounts and yields in one line
I have tried this -
df.groupby(['as_of_date', 'industry', 'sector', 'deal', 'year', 'stage'])['amount', 'yield' ].apply(lambda df: df.reset_index(drop=True)).unstack().reset_index()
but this is not working correctly.
Basically, I need this as output rows -
2022-01-01 Mortgage RMBS XYZ 2022 A 111 222 333 444 555 0.1 0.2 0.3 0.4 0.5
2022-01-01 Mortgage RMBS XYZ 2022 B 123 234 345 456 567 0.6 0.7 0.8 0.9 1.0
What would be the correct way to achieve this with Pandas? Thank you
CodePudding user response:
This can be calculated by creating a list for each column first, then combined this (using
), and turning this into a string, removing the [
, ]
, ,
:
df1 = df.groupby(['as_of_date', 'industry', 'sector', 'deal', 'year', 'stage']).apply(
lambda x: str(list(x['amount']) list(x['yield']))[1:-1].replace(",", ""))
df1
#Out:
#as_of_date industry sector deal year stage
#2022-01-01 Mortgage RMBS XYZ 2022 A 111 222 333 444 555 0.1 0.2 0.3 0.4 0.5
# B 123 234 345 456 567 0.6 0.7 0.8 0.9 1.0
CodePudding user response:
Maybe this?
df.groupby(['as_of_date', 'industry', 'sector', 'deal', 'year', 'stage']).agg(' '.join).reset_index()
CodePudding user response:
does this answer your question?
df2 = df.pivot(index=['as_of_date','industry','sector','deal','year', 'stage'], columns=['quarter']).reset_index()
to flatten the columns names
df2.columns = df2.columns.to_series().str.join('_')
df2
as_of_date_ industry_ sector_ deal_ year_ stage_ amount_1 amount_2 amount_3 amount_4 amount_NaN amount_Nan yield_1 yield_2 yield_3 yield_4 yield_NaN yield_Nan
0 2022-01-01 Mortgage RMBS XYZ 2022 A 222.0 333.0 444.0 555.0 111.0 NaN 0.2 0.3 0.4 0.5 0.1 NaN
1 2022-01-01 Mortgage RMBS XYZ 2022 B 234.0 345.0 456.0 567.0 NaN 123.0 0.7 0.8 0.9 1.0 NaN 0.6