I have a data frame that looks like the following.
0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
0: 2 | 57: 9 | None | 436: 77 | 11469: 1018 | 203: 44 |
0: 0 | 57: 15 | None | 436: 47 | None | 203: 89 |
0: 45 | 57: 0 | 11469: 1116 | 436: 7 | None | 203: 0 |
0: 1 | 57: 23 | None | 436: 0 | 11469: 18 | None |
0: 23 | 57: 5 | None | 436: 63 | None | 203: 4 |
Here, each cell holds a distance and a time, in meters and seconds (57: 9 means 57 meters and 9 seconds). I want to rename the columns so that the meter value becomes the column name and the seconds value remains as the cell value. Moreover, any None values should be replaced by zero (0).
Desired output:
0 | 57 | 11469 | 436 | 11469 | 203 |
---|---|---|---|---|---|
2 | 9 | 0 | 77 | 1018 | 44 |
0 | 15 | 0 | 47 | 0 | 89 |
45 | 0 | 1116 | 7 | 0 | 0 |
1 | 23 | 0 | 0 | 18 | 0 |
23 | 5 | 0 | 63 | 0 | 4 |
I am new to Python, so I don't know how to achieve that.
CodePudding user response:
First, split each value on ': ' and keep the last part (the seconds), replacing missing values with 0. For the column names, forward-fill the missing values down each column, select the last row, split it the same way, and keep the first part (the meters):
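# Keep the seconds (text after ': '); None cells become NaN, then 0, then integers
df1 = df.apply(lambda x: x.str.split(': ').str[-1]).fillna(0).astype(int)
# Column names: the meters (text before ': ') taken from the forward-filled last row
df1.columns = df.ffill().iloc[-1].str.split(': ').str[0].tolist()
print(df1)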
    0  57  11469  436  11469  203
0   2   9      0   77   1018   44
1   0  15      0   47      0   89
2  45   0   1116    7      0    0
3   1  23      0    0     18    0
4  23   5      0   63      0    4
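For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It rebuilds the sample frame from the question by hand (the construction of `df` is an assumption, since the question does not show how the frame was created) and uses `pd.to_numeric` to get numeric seconds, which is a small variation on the answer above rather than the exact same code.

```python
import pandas as pd

# Sample data rebuilt from the question; None marks a missing reading.
# (How the original frame was created is not shown, so this is an assumption.)
df = pd.DataFrame([
    ["0: 2", "57: 9", None, "436: 77", "11469: 1018", "203: 44"],
    ["0: 0", "57: 15", None, "436: 47", None, "203: 89"],
    ["0: 45", "57: 0", "11469: 1116", "436: 7", None, "203: 0"],
    ["0: 1", "57: 23", None, "436: 0", "11469: 18", None],
    ["0: 23", "57: 5", None, "436: 63", None, "203: 4"],
])

# Seconds: the text after ": " in each cell, converted to numbers.
# Missing cells become NaN, are filled with 0, and the result is cast to int.
df1 = (
    df.apply(lambda col: pd.to_numeric(col.str.split(": ").str[-1]))
      .fillna(0)
      .astype(int)
)

# Meters: forward-fill each column so the last row is never missing,
# then keep the text before ": " as the new column name.
df1.columns = df.ffill().iloc[-1].str.split(": ").str[0].tolist()

print(df1)
```

Note that the column names are strings and that `11469` appears twice, because two of the original columns carry that distance. Selecting `df1["11469"]` therefore returns both columns as a DataFrame rather than a single Series.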