Home > database >  change dataframe to another structure according id
change dataframe to another structure according id

Time:12-08

My initial pandas dataframe looks as follows:

df = pd.DataFrame(np.array([[999,888,1,0], [988,899,2,0], [981,821,3,0],[721,789,4,1],[723,745,5,1],[444,432,6,2],[423,412,7,2],[423,455,8,2],[478,432,9,2]]),
                   columns=['a', 'b', 'c', 'id'])

>>>df
a    b   c id
999  888 1 0
988  899 2 0
981  821 3 0
721  789 4 1
723  745 5 1
444  432 6 2
423  412 7 2
423  455 8 2
478  432 9 2 

Now I want to create a new dataframe with the following structure:

0    1    2    3   4  5
999  988  888  899 1  2
721  723  789  745 4  5
444  423  432  412 6  7

So in fact I just want to get the first two values of each id and each column (a,b,c). Example: first row -> col a: 999, 988 col b: 888, 899 col c: 1, 2

My current approach looks as follows but it seems quite slow and inconvenient:

import pandas as pd

list_to_append_a = []
list_to_append_b = []


for label_id in range(3):
    
    df_new = df.loc[df['id'] == label_id]    
    
    a = df_new['a'].head(2).values
    b = df_new['b'].head(2).values
    
    list_to_append_a.append(a)
    list_to_append_b.append(b)

    
    df_new_a = pd.DataFrame(list_to_append_a)
    df_new_b = pd.DataFrame(list_to_append_b) 

df_final= pd.concat([df_new_a, df_new_b], axis=1, ignore_index=True)

CodePudding user response:

This could be achived by the following:

pd.DataFrame(df.groupby("id").head(2).set_index('id').values.flatten().reshape((3,4)))

CodePudding user response:

Compute value count of id to get the size (number of unique id, here 3) and the smallest count (here 2 for id=1). Group by id and keep the first count row for each group using head.

Then, use melt to flat your dataframe and sort it by id. Finally, reshape your dataframe according the size variable and create a new dataframe:

size, count = df.value_counts('id').agg(['size', 'min'])
data = df.groupby('id').head(count).melt('id').sort_values('id')['value']
out = pd.DataFrame(data.values.reshape((size, -1)))
print(out)

# Output:
     0    1    2    3
0  999  988  888  899
1  721  723  789  745
2  444  423  432  412
  • Related