The pandas DataFrame below holds time (in seconds) and heart rate values. The 'time' values should be consecutive integers, but some are missing (1, 2, 4, 5, 7, 9, 10, etc.). Should I use pandas.DataFrame.interpolate to get the desired result, or is there a better way to achieve it? Please note that the original data come from an API. I searched the web for an answer, but had no luck.
Original:
time heartrate
0 97
3 105
6 105
8 111
11 111
13 114
16 115
Desired output:
time heartrate
0 97
1 100
2 103
3 105
4 105
5 105
6 105
7 109
8 111
9 111
10 111
11 111
12 113
13 114
14 114
15 114
16 115
CodePudding user response:
Set time as the index, then reindex with pd.RangeIndex to get continuous values and interpolate heartrate:
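import numpy as np
import pandas as pd

# Continuous integer index from the first to the last timestamp (inclusive)
idx = pd.RangeIndex(df.time.min(), df.time.max() + 1, name='time')
out = df.set_index('time').reindex(idx)['heartrate'] \
    .interpolate(method='linear') \
    .pipe(np.ceil) \
    .reset_index()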
Output:
>>> out
time heartrate
0 0 97.0
1 1 100.0
2 2 103.0
3 3 105.0
4 4 105.0
5 5 105.0
6 6 105.0
7 7 108.0
8 8 111.0
9 9 111.0
10 10 111.0
11 11 111.0
12 12 113.0
13 13 114.0
14 14 115.0
15 15 115.0
16 16 115.0
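The result does not exactly match the desired output: the values at times 7, 14 and 15 differ (108 instead of 109, and 115 instead of 114). You have to adjust the interpolation method to get the desired result, but the principle is the same.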
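To see exactly where linear interpolation plus np.ceil departs from the desired output, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand (the real data come from an API, so the construction of df is an assumption), and the names desired and mismatch are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (assumed layout of the API result)
df = pd.DataFrame({
    "time": [0, 3, 6, 8, 11, 13, 16],
    "heartrate": [97, 105, 105, 111, 111, 114, 115],
})

# Fill in the missing seconds, interpolate linearly, then round up
idx = pd.RangeIndex(df["time"].min(), df["time"].max() + 1, name="time")
out = (
    df.set_index("time")["heartrate"]
    .reindex(idx)
    .interpolate(method="linear")
    .pipe(np.ceil)
    .astype(int)
    .reset_index()
)

# Desired values copied from the question, one per second from 0 to 16
desired = [97, 100, 103, 105, 105, 105, 105, 109, 111,
           111, 111, 111, 113, 114, 114, 114, 115]

# Times at which the computed value differs from the desired one
mismatch = out.loc[out["heartrate"] != desired, "time"].tolist()
print(mismatch)
```

The rows that differ are the ones a different interpolation or rounding rule would have to change; the reindexing step itself stays the same.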
CodePudding user response:
You can reindex and interpolate, then use np.ceil to round the numbers up:
# range() starts at 0, so this assumes the first timestamp is 0
(np.ceil(df.set_index('time')
           .reindex(range(df['time'].max() + 1))
           .interpolate())
   .reset_index()
)
Output:
time heartrate
0 0 97.0
1 1 100.0
2 2 103.0
3 3 105.0
4 4 105.0
5 5 105.0
6 6 105.0
7 7 108.0
8 8 111.0
9 9 111.0
10 10 111.0
11 11 111.0
12 12 113.0
13 13 114.0
14 14 115.0
15 15 115.0
16 16 115.0
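As an alternative that is not part of either answer, the same linear interpolation can be sketched with np.interp, which avoids the reindex step. The sample frame is rebuilt by hand from the question, and full_time and filled are illustrative names; the sketch assumes the time column is sorted in increasing order, which np.interp requires.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (assumed layout of the API result)
df = pd.DataFrame({
    "time": [0, 3, 6, 8, 11, 13, 16],
    "heartrate": [97, 105, 105, 111, 111, 114, 115],
})

# Every second between the first and last timestamp, inclusive
full_time = np.arange(df["time"].min(), df["time"].max() + 1)

# Linear interpolation at each second, rounded up to whole beats
filled = pd.DataFrame({
    "time": full_time,
    "heartrate": np.ceil(np.interp(full_time, df["time"], df["heartrate"])),
})
print(filled)
```

This gives the same values as the reindex and interpolate approach for this sample; which one to prefer is mostly a matter of whether the rest of the pipeline already works with a time index.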