First of all, I have following a following dataframe df_A
sector | SALES | EBIT | DPS |
---|---|---|---|
IT | xxxx | yyyy | zzz |
ENERGY | xxxx | yyyy | zzz |
FINANCE | xxxx | yyyy | zzz |
CONSUMER | xxxx | yyyy | zzz |
and another dataframe df_B
NAME | sector | SALES | EBIT | DPS |
---|---|---|---|---|
AAPL | IT | xxxx | yyyy | zzz |
BP | ENERGY | xxxx | yyyy | zzz |
TGT | CONSUMER | xxxx | yyyy | zzz |
MSFT | IT | xxxx | yyyy | zzz |
HSBC | FINANCE | xxxx | yyyy | zzz |
GOOG | IT | xxxx | yyyy | zzz |
WMT | CONSUMER | xxxx | yyyy | zzz |
META | IT | xxxx | yyyy | zzz |
CVX | ENERGY | xxxx | yyyy | zzz |
JPM | FINANCE | xxxx | yyyy | zzz |
MCD | CONSUMER | xxxx | yyyy | zzz |
and so on
this is just an example, and I have a way bigger dataframe than this
what I want to do is to create new dataframes by distinguishing df_B by it's sectors;
where the newly created dataframes follow the order of df_A["sectors"]
and in the end merge them altogether, hopefully in horizontal format
so in the end I want my output to look like
NAME | sector | SALES | EBIT | DPS | NAME | sector | SALES | EBIT | DPS | NAME | sector | SALES | EBIT | DPS | NAME | sector | SALES | EBIT | DPS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AAPL | IT | xxxx | yyyy | zzz | BP | ENERGY | xxxx | yyyy | zzz | HSBC | FINANCE | xxxx | yyyy | zzz | WMT | CONSUMER | xxxx | yyyy | zzz |
MSFT | IT | xxxx | yyyy | zzz | CVX | ENERGY | xxxx | yyyy | zzz | JPM | FINANCE | xxxx | yyyy | zzz | TGT | CONSUMER | xxxx | yyyy | zzz |
GOOG | IT | xxxx | yyyy | zzz | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | MCD | CONSUMER | xxxx | yyyy | zzz |
META | IT | xxxx | yyyy | zzz |
if the horizontal format above doesn't work, vertical table will also be okay
I'm noob in python and I tried using for loops, dictionary, loc/iloc but somehow none of my codes is working properly...
Any help is deeply appreciated
CodePudding user response:
Create N dataframes, one for each sector, then concatenate them into a single one:
out = pd.concat([pd.DataFrame(df_B[df_B['sector'] == sector].to_dict('records'))
for sector in df_A['sector'].unique().tolist()], axis=1)
print(out)
# Output
NAME sector SALES EBIT DPS NAME sector SALES EBIT DPS NAME sector SALES EBIT DPS NAME sector SALES EBIT DPS
0 AAPL IT xxxx yyyy zzz BP ENERGY xxxx yyyy zzz HSBC FINANCE xxxx yyyy zzz TGT CONSUMER xxxx yyyy zzz
1 MSFT IT xxxx yyyy zzz CVX ENERGY xxxx yyyy zzz JPM FINANCE xxxx yyyy zzz WMT CONSUMER xxxx yyyy zzz
2 GOOG IT xxxx yyyy zzz NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN MCD CONSUMER xxxx yyyy zzz
3 META IT xxxx yyyy zzz NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN