Home > Mobile >  Based on some rules, how to expand data in Pandas?
Based on some rules, how to expand data in Pandas?

Time:12-10

Please forgive my English. I hope I can say clearly.

Assume we have this data:

>>> data = {'Span':[3,3.5], 'Low':[6.2,5.16], 'Medium':[4.93,4.1], 'High':[3.68,3.07], 'VeryHigh':[2.94,2.45], 'ExtraHigh':[2.48,2.06], '0.9':[4.9,3.61], '1.5':[3.23,2.38], '2':[2.51,1.85]}
>>> df = pd.DataFrame(data)
>>> df
   Span   Low  Medium  High  VeryHigh  ExtraHigh   0.9   1.5     2
0   3.0  6.20    4.93  3.68      2.94       2.48  4.90  3.23  2.51
1   3.5  5.16    4.10  3.07      2.45       2.06  3.61  2.38  1.85

I want to get this data:

    Span       Wind  Snow  MaxSpacing
0    3.0        Low   0.0        6.20
1    3.0     Medium   0.0        4.93
2    3.0       High   0.0        3.68
3    3.0   VeryHigh   0.0        2.94
4    3.0  ExtraHigh   0.0        2.48
5    3.0          0   0.9        4.90
6    3.0          0   1.5        3.23
7    3.0          0   2.0        2.51
8    3.5        Low   0.0        5.16
9    3.5     Medium   0.0        4.10
10   3.5       High   0.0        3.07
11   3.5   VeryHigh   0.0        2.45
12   3.5  ExtraHigh   0.0        2.06
13   3.5          0   0.9        3.61
14   3.5          0   1.5        2.38
15   3.5          0   2.0        1.85

The principles apply to df:

  • Span expands by the combination of Wind and Snow to get the MaxSpacing
  • Wind and Snow is mutually exclusive. When Wind is one of 'Low', 'Medium', 'High', 'VeryHigh', 'ExtraHigh', Snow is zero; when Snow is one of 0.9, 1.5, 2, Wind is zero.

Please help. Thank you.

CodePudding user response:

Use DataFrame.melt for unpivot and then sorting by indices, create Snow column by to_numeric and Series.fillna in DataFrame.insert and last set 0 for Wind column:

df = (df.melt('Span', ignore_index=False, var_name='Wind', value_name='MaxSpacing')
        .sort_index(ignore_index=True))

s = pd.to_numeric(df['Wind'], errors='coerce')
df.insert(2, 'Snow', s.fillna(0))
df.loc[s.notna(), 'Wind'] = 0
print (df)
    Span       Wind  Snow  MaxSpacing
0    3.0        Low   0.0        6.20
1    3.0     Medium   0.0        4.93
2    3.0       High   0.0        3.68
3    3.0   VeryHigh   0.0        2.94
4    3.0  ExtraHigh   0.0        2.48
5    3.0          0   0.9        4.90
6    3.0          0   1.5        3.23
7    3.0          0   2.0        2.51
8    3.5        Low   0.0        5.16
9    3.5     Medium   0.0        4.10
10   3.5       High   0.0        3.07
11   3.5   VeryHigh   0.0        2.45
12   3.5  ExtraHigh   0.0        2.06
13   3.5          0   0.9        3.61
14   3.5          0   1.5        2.38
15   3.5          0   2.0        1.85

Alternative solution with DataFrame.set_index and DataFrame.stack:

df = df.set_index('Span').rename_axis('Wind', axis=1).stack().reset_index(name='MaxSpacing')

s = pd.to_numeric(df['Wind'], errors='coerce')
df.insert(2, 'Snow', s.fillna(0))
df.loc[s.notna(), 'Wind'] = 0
print (df)
    Span       Wind  Snow  MaxSpacing
0    3.0        Low   0.0        6.20
1    3.0     Medium   0.0        4.93
2    3.0       High   0.0        3.68
3    3.0   VeryHigh   0.0        2.94
4    3.0  ExtraHigh   0.0        2.48
5    3.0          0   0.9        4.90
6    3.0          0   1.5        3.23
7    3.0          0   2.0        2.51
8    3.5        Low   0.0        5.16
9    3.5     Medium   0.0        4.10
10   3.5       High   0.0        3.07
11   3.5   VeryHigh   0.0        2.45
12   3.5  ExtraHigh   0.0        2.06
13   3.5          0   0.9        3.61
14   3.5          0   1.5        2.38
15   3.5          0   2.0        1.85
  • Related