Home > Blockchain >  How to iterate each row according to values of row above?
How to iterate each row according to values of row above?

Time:05-12

Suppose sensors are attached to 3 climbers scaling a structure and these sensors capture a certain measurement at random times. The data are captured into the data frame below (the data frame is a lot longer than this):

df = pd.DataFrame({
'Name': ['Cody', 'Dustin', 'Dustin', 'Cody', 'Ryan', 'Dustin', 'Ryan', 'Cody'],
'Timestamp': ['08:10:23', '08:12:58', '08:15:02', '08:19:43', '08:21:00', '08:30:17', '08:34:01', '08:34:59'],
'Category': ['Body Temp', 'Altitude', 'Heart Rate', 'Body Temp', 'Heart Rate', 'Heart Rate', 'Altitude', 'Altitude'],
'Body Temp': [35.9, np.nan, np.nan, 36.2, np.nan, np.nan, np.nan, np.nan],
'Altitude': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 12, 6],
'Heart Rate': [np.nan, np.nan, 75, np.nan, 71, 69, np.nan, np.nan]
})

     Name Timestamp    Category  Body Temp  Altitude  Heart Rate
0    Cody  08:10:23   Body Temp       35.9       NaN         NaN
1  Dustin  08:12:58    Altitude        NaN       7.0         NaN
2  Dustin  08:15:02  Heart Rate        NaN       NaN        75.0
3    Cody  08:19:43   Body Temp       36.2       NaN         NaN
4    Ryan  08:21:00  Heart Rate        NaN       NaN        71.0
5  Dustin  08:30:17  Heart Rate        NaN       NaN        69.0
6    Ryan  08:34:01    Altitude        NaN      12.0         NaN
7    Cody  08:34:59    Altitude        NaN       6.0         NaN

The intention is to constantly update the measurements at every row, according to each climber and timestamp, such that every succeeding row for each climber will have their measurements updated.

So a result should look something like this:

     Name Timestamp    Category  Body Temp  Altitude  Heart Rate
0    Cody  08:10:23   Body Temp       35.9       NaN         NaN
1  Dustin  08:12:58    Altitude        NaN       7.0         NaN
2  Dustin  08:15:02  Heart Rate        NaN       7.0        75.0
3    Cody  08:19:43   Body Temp       36.2       NaN         NaN
4    Ryan  08:21:00  Heart Rate        NaN       NaN        71.0
5  Dustin  08:30:17  Heart Rate        NaN       7.0        69.0
6    Ryan  08:34:01    Altitude        NaN      12.0        71.0
7    Cody  08:34:59    Altitude       36.2       6.0         NaN

So far I've thought of using .sort_value() to separate the climbers and work from there. But I have trouble figuring out how to update each row continually. Are functions or iterrows required for this?

CodePudding user response:

The job essentially seems to be filling missing values by a previous value if such value exists at that measurement for each climber, so groupby.ffill should do the job:

out = df[['Name']].join(df.groupby('Name').ffill())

Output:

     Name Timestamp    Category  Body Temp  Altitude  Heart Rate
0    Cody  08:10:23   Body Temp       35.9       NaN         NaN
1  Dustin  08:12:58    Altitude        NaN       7.0         NaN
2  Dustin  08:15:02  Heart Rate        NaN       7.0        75.0
3    Cody  08:19:43   Body Temp       36.2       NaN         NaN
4    Ryan  08:21:00  Heart Rate        NaN       NaN        71.0
5  Dustin  08:30:17  Heart Rate        NaN       7.0        69.0
6    Ryan  08:34:01    Altitude        NaN      12.0        71.0
7    Cody  08:34:59    Altitude       36.2       6.0         NaN
  • Related