Home > Blockchain >  Flatten grid-like Dataframe in Pandas
Flatten grid-like Dataframe in Pandas

Time:11-15

I have a raw dataframe shown like below, it's not a normalized table form due to the Min, Max and Unit row.

Name      FIELD_A    Field_B    FIELD_C    FIELD_D
Min          NONE        100       -1.0          0         
Max          NONE        500       0.75        500
Unit         NONE         ms          %         cm
1      2020/01/21        150        0.8        200
2      2020/01/22        160        0.7        210
3      2020/01/23        170        0.6        220
4      2020/01/24        180        0.5        230
5      2020/01/25        190        0.4        240

The dataframe could be create with following code:

df_raw = pd.DataFrame({
    'Name':['FIELD_A', 'FIELD_B', 'FIELD_C', 'FIELD_D'], 
    'Min':[None, 100, -1.0, 0],
    'Max':[None, 500, 0.75, 500],
    'Unit':[None, 'ms', '%', 'cm'],
    '1':['2020/01/21',150,0.8,200],
    '2':['2020/01/22',160,0.7,210],
    '3':['2020/01/23',170,0.6,220],
    '4':['2020/01/24',180,0.5,230],
    '5':['2020/01/25',190,0.4,240]
    }).transpose()

Then I'm going to flatten this grid-like dataframe into a normalized table form shown like below:

No.    FIELD         VALUE    MIN     MAX    UNIT
1          A    2020/01/21   None    NONE    NONE
2          A    2020/01/22   None    NONE    NONE
3          A    2020/01/23   None    NONE    NONE
4          A    2020/01/24   None    NONE    NONE
5          A    2020/01/25   None    NONE    NONE
1          B           150    100     500      ms
2          B           160    100     500      ms
3          B           170    100     500      ms
4          B           180    100     500      ms
5          B           190    100     500      ms
1          C           0.8   -1.0    0.75       %
2          C           0.7   -1.0    0.75       %
3          C           0.6   -1.0    0.75       %
4          C           0.5   -1.0    0.75       %
5          C           0.4   -1.0    0.75       %
1          D           200      0     500      cm
2          D           210      0     500      cm
3          D           220      0     500      cm
4          D           230      0     500      cm
5          D           240      0     500      cm

Currently, I done this with for idx, value in enumerate(raw_df[FIELD]) inside for FIELD in raw_df.columns[1:]. But it seems the operation is not effective exactly...

CodePudding user response:

You can transpose, set_index on the first lines to create a MultiIndex, then melt to reshape. The rest is just reworking a bit the output,

(df_raw.T
 .rename(columns={'Name': 'Field'})
 .set_index(['Field','Min','Max','Unit'])
 .T
 .melt()
 .assign(Field=lambda d: d['Field'].str.split('_').str[-1])
)

Output:

   Field    Min     Max Unit       value
0      A    NaN     NaN  NaN  2020/01/21
1      A    NaN     NaN  NaN  2020/01/22
2      A    NaN     NaN  NaN  2020/01/23
3      A    NaN     NaN  NaN  2020/01/24
4      A    NaN     NaN  NaN  2020/01/25
5      B  100.0  500.00   ms         150
6      B  100.0  500.00   ms         160
7      B  100.0  500.00   ms         170
8      B  100.0  500.00   ms         180
9      B  100.0  500.00   ms         190
10     C   -1.0    0.75    %         0.8
11     C   -1.0    0.75    %         0.7
12     C   -1.0    0.75    %         0.6
13     C   -1.0    0.75    %         0.5
14     C   -1.0    0.75    %         0.4
15     D    0.0  500.00   cm         200
16     D    0.0  500.00   cm         210
17     D    0.0  500.00   cm         220
18     D    0.0  500.00   cm         230
19     D    0.0  500.00   cm         240
  • Related