Efficiently sort data into a DataFrame

Time:06-30

I have measurement data from different sources which I'd like to convert to a DataFrame. However, the values from the two sources are not of the same kind:

import numpy as np
import pandas as pd

data_in = [
    [1.1, 'A', 1,2,3],
    [1.2, 'B', 10,20,30,40],
    [2.1, 'A', 1.1,2.1,3.1],
    [2.1, 'B', 11,21,31,41],
    [3.1, 'A', 1.2,2.2,3.2],
    [3.2, 'B', 12,22,32,42],
]
pd.DataFrame(data_in)

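This puts the values from both sources into the same columns. Instead, each source should get its own columns, so the resulting DataFrame should look like this: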

data_out = [
    [1.1, 'A', 1,2,3],
    [1.2, 'B', np.nan,np.nan,np.nan,10,20,30,40],
    [2.1, 'A', 1.1,2.1,3.1],
    [2.1, 'B', np.nan,np.nan,np.nan,11,21,31,41],
    [3.1, 'A', 1.2,2.2,3.2],
    [3.2, 'B', np.nan,np.nan,np.nan,12,22,32,42],
]
pd.DataFrame(data_out, columns=['timestamp', 'source', 'val1', 'val2', 'val3', 'par1', 'par2', 'par3', 'par4'])

Of course, I could loop over the data and manually sort each row into a dedicated DataFrame and then merge them, but I wonder if there is a more efficient or at least "nicer" way to do this using pandas.

Thanks.

CodePudding user response:

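You can blank out each source's values with mask and where, then merge the two frames:

df = pd.DataFrame(data_in)
df1 = df.copy()
# In df, keep only the 'A' values: rows from source 'B' become NaN.
df.iloc[:,2:] = df.iloc[:,2:].mask(df[1].eq('B'))
# In df1, keep only the 'B' values: all other rows become NaN.
df1.iloc[:,2:] = df1.iloc[:,2:].where(df[1].eq('B'))

# Merge on timestamp and source, then drop the column that is entirely NaN.
out = df.merge(df1, on = [0,1]).dropna(axis = 1, thresh = 1)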
Out[298]: 
     0  1  2_x  3_x  4_x   2_y   3_y   4_y   5_y
0  1.1  A  1.0  2.0  3.0   NaN   NaN   NaN   NaN
1  1.2  B  NaN  NaN  NaN  10.0  20.0  30.0  40.0
2  2.1  A  1.1  2.1  3.1   NaN   NaN   NaN   NaN
3  2.1  B  NaN  NaN  NaN  11.0  21.0  31.0  41.0
4  3.1  A  1.2  2.2  3.2   NaN   NaN   NaN   NaN
5  3.2  B  NaN  NaN  NaN  12.0  22.0  32.0  42.0
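
If you also want the named columns from the question, one alternative (not the approach above, just a sketch) is to skip the merge and build one dict per row, so that pandas fills the missing keys with NaN. The value_cols mapping is an assumption based on the column names in the desired output; adjust it if your sources carry different fields.

```python
import numpy as np
import pandas as pd

data_in = [
    [1.1, 'A', 1, 2, 3],
    [1.2, 'B', 10, 20, 30, 40],
    [2.1, 'A', 1.1, 2.1, 3.1],
    [2.1, 'B', 11, 21, 31, 41],
    [3.1, 'A', 1.2, 2.2, 3.2],
    [3.2, 'B', 12, 22, 32, 42],
]

# Assumed mapping from source to its value columns (names taken from the question).
value_cols = {
    'A': ['val1', 'val2', 'val3'],
    'B': ['par1', 'par2', 'par3', 'par4'],
}

# One dict per row: timestamp, source, then that source's named values.
records = [
    {'timestamp': ts, 'source': src, **dict(zip(value_cols[src], vals))}
    for ts, src, *vals in data_in
]

# Keys missing from a record become NaN; `columns` fixes the column order.
all_cols = ['timestamp', 'source'] + value_cols['A'] + value_cols['B']
out = pd.DataFrame(records, columns=all_cols)
print(out)
```

This still loops over the rows in Python, so it is more about readability than speed, but it avoids the _x/_y suffixes and extends to more than two sources by adding entries to value_cols.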