Based on some rules, how to expand data in Pandas?-CodePudding

Please forgive my English. I hope I can say clearly.

Assume we have this data:

>>> data = {'Span':[3,3.5], 'Low':[6.2,5.16], 'Medium':[4.93,4.1], 'High':[3.68,3.07], 'VeryHigh':[2.94,2.45], 'ExtraHigh':[2.48,2.06], '0.9':[4.9,3.61], '1.5':[3.23,2.38], '2':[2.51,1.85]}
>>> df = pd.DataFrame(data)
>>> df
   Span   Low  Medium  High  VeryHigh  ExtraHigh   0.9   1.5     2
0   3.0  6.20    4.93  3.68      2.94       2.48  4.90  3.23  2.51
1   3.5  5.16    4.10  3.07      2.45       2.06  3.61  2.38  1.85

I want to get this data:

    Span       Wind  Snow  MaxSpacing
0    3.0        Low   0.0        6.20
1    3.0     Medium   0.0        4.93
2    3.0       High   0.0        3.68
3    3.0   VeryHigh   0.0        2.94
4    3.0  ExtraHigh   0.0        2.48
5    3.0          0   0.9        4.90
6    3.0          0   1.5        3.23
7    3.0          0   2.0        2.51
8    3.5        Low   0.0        5.16
9    3.5     Medium   0.0        4.10
10   3.5       High   0.0        3.07
11   3.5   VeryHigh   0.0        2.45
12   3.5  ExtraHigh   0.0        2.06
13   3.5          0   0.9        3.61
14   3.5          0   1.5        2.38
15   3.5          0   2.0        1.85

The principles apply to df:

Span expands by the combination of Wind and Snow to get the MaxSpacing
Wind and Snow is mutually exclusive. When Wind is one of 'Low', 'Medium', 'High', 'VeryHigh', 'ExtraHigh', Snow is zero; when Snow is one of 0.9, 1.5, 2, Wind is zero.

Please help. Thank you.

CodePudding user response：

Use DataFrame.melt for unpivot and then sorting by indices, create Snow column by to_numeric and Series.fillna in DataFrame.insert and last set 0 for Wind column:

df = (df.melt('Span', ignore_index=False, var_name='Wind', value_name='MaxSpacing')
        .sort_index(ignore_index=True))

s = pd.to_numeric(df['Wind'], errors='coerce')
df.insert(2, 'Snow', s.fillna(0))
df.loc[s.notna(), 'Wind'] = 0
print (df)
    Span       Wind  Snow  MaxSpacing
0    3.0        Low   0.0        6.20
1    3.0     Medium   0.0        4.93
2    3.0       High   0.0        3.68
3    3.0   VeryHigh   0.0        2.94
4    3.0  ExtraHigh   0.0        2.48
5    3.0          0   0.9        4.90
6    3.0          0   1.5        3.23
7    3.0          0   2.0        2.51
8    3.5        Low   0.0        5.16
9    3.5     Medium   0.0        4.10
10   3.5       High   0.0        3.07
11   3.5   VeryHigh   0.0        2.45
12   3.5  ExtraHigh   0.0        2.06
13   3.5          0   0.9        3.61
14   3.5          0   1.5        2.38
15   3.5          0   2.0        1.85

Alternative solution with DataFrame.set_index and DataFrame.stack:

df = df.set_index('Span').rename_axis('Wind', axis=1).stack().reset_index(name='MaxSpacing')

s = pd.to_numeric(df['Wind'], errors='coerce')
df.insert(2, 'Snow', s.fillna(0))
df.loc[s.notna(), 'Wind'] = 0
print (df)
    Span       Wind  Snow  MaxSpacing
0    3.0        Low   0.0        6.20
1    3.0     Medium   0.0        4.93
2    3.0       High   0.0        3.68
3    3.0   VeryHigh   0.0        2.94
4    3.0  ExtraHigh   0.0        2.48
5    3.0          0   0.9        4.90
6    3.0          0   1.5        3.23
7    3.0          0   2.0        2.51
8    3.5        Low   0.0        5.16
9    3.5     Medium   0.0        4.10
10   3.5       High   0.0        3.07
11   3.5   VeryHigh   0.0        2.45
12   3.5  ExtraHigh   0.0        2.06
13   3.5          0   0.9        3.61
14   3.5          0   1.5        2.38
15   3.5          0   2.0        1.85