I have the data as below it will contain the categories and data with multiple row,
key date cat1 cat2 id1 data1 data2 data3 data4 id2 data5 data5 data6 data7 id3 data8 data9 data10 id4 data11 data12 data13
x 0 7 foo 54 4065 -41 -78 102 0 126 16 119 60522 na na na na na na na na
x 0 7 foo 53 4200 -42 -87 102 0 130 12 119 na na na na na na na na na
x 0 7 foo 60 4203 -46 -114 102 0 130 12 118 na na na na na na na na na
x 0 7 foo 61 na na na na 1 na na na na 1 15000 57 5481 na na na na
x 0 7 foo 54 na na na na 1 na na na na 4 14196 57 5001 0 1 0 8558
x 0 8 foo 61 na na na na 1 na na na na 0 15000 57 5361 na na na na
x 0 8 foo 59 na na na na 0 na na na na 1 15000 57 6041 na na na na
x 0 8 foo 54 na na na na 0 na na na na 3 14196 57 5841 1 1 0 7565
y 0 7 foo 61 3500 -35 -100 na 0 na na na na 2 15000 57 5401 na na na na
y 0 7 foo 59 na na na na 1 na na na na 0 15000 57 5441 na na na na
y 0 7 foo 59 na na na na 1 na na na na 2 15000 57 5601 na na na na
y 0 8 foo 61 na na na na 1 na na na na 4 15000 57 5401 na na na na
y 0 8 foo 54 na na na na 0 na na na na 0 14196 57 5081 2 1 0 9507
y 0 8 foo 54 na na na na 0 na na na na 1 14196 57 5721 3 1 0 9547
y 0 7 foo 59 na na na na 0 na na na na 4 15000 57 5641 na na na na
I want the new data table will be as below
key date cat1 data1_id1-54_id2-0 data2_id1-54-id2-0 … data8_id1-54_id2-1_id3-1_id4-0 data8_id1-61_id2-0_id3-2
x 0 7 4065 -41 14196 na
x 0 8
y 0 7 3000 -35 na 15000
y 0 8
Question : How to convert multiple row to single row of data by column name come from its row value, currently, I'm doing by iterating row by row and create new dataframe by appending the column but It's too slow and look like a dumb code.
Alternatively, This data was collected as an array dimension by row.
example
key date id0 id1 data1 data2
x 0 0 0 10 20
x 0 0 1 20 30
x 0 1 0 40 50
x 0 1 1 60 80
I was query from database and concatenate in dataframe if you have any other suggestion like how to query this data to be single row it would be good also,
Thanks for advice.
CodePudding user response:
You can try pd.pivot()
df_ = df.pivot(index=['key', 'date', 'cat1', 'cat2'], columns=['id1', 'id2', 'id3', 'id4'])
df_.columns = df_.columns.map(lambda x: f'{x[0]}_' '_'.join([f'id{i}-{v}'
for i, v in enumerate(list(x)[1:])
if v != 'na']))
df_ = df_.reset_index()
print(df_)
key date cat1 cat2 ... data13_id0-61_id1-1_id2-4 data13_id0-54_id1-0_id2-0_id3-2 data13_id0-54_id1-0_id2-1_id3-3 data13_id0-59_id1-0_id2-4
0 x 0 7 foo ... NaN NaN NaN NaN
1 x 0 8 foo ... NaN NaN NaN NaN
2 y 0 7 foo ... NaN NaN NaN na
3 y 0 8 foo ... na 9507 9547 NaN
[4 rows x 214 columns]