I want to interpolate (linear interpolation) data. but not indicated NA.
Here is my data.
timestamp | id | strength |
---|---|---|
1383260400000 | 10 | 0.007802251505435157 |
1383260400000 | 11 | 0.0050101566841440105 |
1383260400000 | 47 | 0.09910993935546883 |
1383260400000 | 48 | 0.16302926693340128 |
and expected data is :
timestamp | id | strength |
---|---|---|
1383260400000 | 10 | 0.007802251505435157 |
1383260400000 | 11 | 0.0050101566841440105 |
1383260400000 | 12 | interpolated strength |
1383260400000 | 13 | interpolated strength |
1383260400000 | 14 | interpolated strength |
1383260400000 | 15 | interpolated strength |
1383260400000 | 16 | interpolated strength |
1383260400000 | 17 | interpolated strength |
1383260400000 | ... | interpolated strength |
1383260400000 | 47 | 0.09910993935546883 |
1383260400000 | 48 | 0.16302926693340128 |
and there are data with another timestamp.
timestamp | id | strength |
---|---|---|
1383261000000 | 73 | 0.00034018953748529387 |
1383261000000 | 80 | 0.015745603609017354 |
1383261000000 | 81 | 0.01332206498346922 |
1383261000000 | 101 | 0.003292329017912283 |
1383261000000 | 102 | 0.0028735259562922954 |
1383261000000 | 103 | 0.0033962211735905955 |
and the last id is 2025 and timestamp is 13833462000000
CodePudding user response:
First convert values of id
to index, so possible use lambda function in GroupBy.apply
with Series.reindex
and Series.interpolate
:
f = lambda x: x.reindex(range(x.index.min(), x.index.max() 1)).interpolate()
df = df.set_index('id').groupby('timestamp')['strength'].apply(f).reset_index()
print (df)
timestamp id strength
0 1383260400000 10 0.007802
1 1383260400000 11 0.005010
2 1383260400000 12 0.007624
3 1383260400000 13 0.010238
4 1383260400000 14 0.012852
5 1383260400000 15 0.015466
6 1383260400000 16 0.018080
7 1383260400000 17 0.020693
8 1383260400000 18 0.023307
9 1383260400000 19 0.025921
10 1383260400000 20 0.028535
11 1383260400000 21 0.031149
12 1383260400000 22 0.033763
13 1383260400000 23 0.036377
14 1383260400000 24 0.038991
15 1383260400000 25 0.041605
16 1383260400000 26 0.044218
17 1383260400000 27 0.046832
18 1383260400000 28 0.049446
19 1383260400000 29 0.052060
20 1383260400000 30 0.054674
21 1383260400000 31 0.057288
22 1383260400000 32 0.059902
23 1383260400000 33 0.062516
24 1383260400000 34 0.065129
25 1383260400000 35 0.067743
26 1383260400000 36 0.070357
27 1383260400000 37 0.072971
28 1383260400000 38 0.075585
29 1383260400000 39 0.078199
30 1383260400000 40 0.080813
31 1383260400000 41 0.083427
32 1383260400000 42 0.086041
33 1383260400000 43 0.088654
34 1383260400000 44 0.091268
35 1383260400000 45 0.093882
36 1383260400000 46 0.096496
37 1383260400000 47 0.099110
38 1383260400000 48 0.163029
CodePudding user response:
You can try groupby timestamp
then reindex to make the id
column continuous and fill the NA
out = (df.groupby('timestamp')
.apply(lambda g: (g.set_index('id')
.reindex(range(g['id'].min(), g['id'].max() 1))
.reset_index()
.pipe(lambda df: df.fillna({'strength': 'interpolated strength', 'timestamp':g.name}))))
.reset_index(drop=True))
out['timestamp'] = out['timestamp'].astype(int)
print(out)
id timestamp strength
0 10 1383260400000 0.007802
1 11 1383260400000 0.00501
2 12 1383260400000 interpolated strength
3 13 1383260400000 interpolated strength
4 14 1383260400000 interpolated strength
5 15 1383260400000 interpolated strength
6 16 1383260400000 interpolated strength
7 17 1383260400000 interpolated strength
8 18 1383260400000 interpolated strength
9 19 1383260400000 interpolated strength
10 20 1383260400000 interpolated strength
11 21 1383260400000 interpolated strength
12 22 1383260400000 interpolated strength
13 23 1383260400000 interpolated strength
14 24 1383260400000 interpolated strength
15 25 1383260400000 interpolated strength
16 26 1383260400000 interpolated strength
17 27 1383260400000 interpolated strength
18 28 1383260400000 interpolated strength
19 29 1383260400000 interpolated strength
20 30 1383260400000 interpolated strength
21 31 1383260400000 interpolated strength
22 32 1383260400000 interpolated strength
23 33 1383260400000 interpolated strength
24 34 1383260400000 interpolated strength
25 35 1383260400000 interpolated strength
26 36 1383260400000 interpolated strength
27 37 1383260400000 interpolated strength
28 38 1383260400000 interpolated strength
29 39 1383260400000 interpolated strength
30 40 1383260400000 interpolated strength
31 41 1383260400000 interpolated strength
32 42 1383260400000 interpolated strength
33 43 1383260400000 interpolated strength
34 44 1383260400000 interpolated strength
35 45 1383260400000 interpolated strength
36 46 1383260400000 interpolated strength
37 47 1383260400000 0.09911
38 48 1383260400000 0.163029
CodePudding user response:
Use:
#preparing data
str1 = """timestamp id strength
1383260400000 10 0.007802251505435157
1383260400000 11 0.0050101566841440105
1383260400000 47 0.09910993935546883
1383260410000 48 0.16302926693340128
1383260410000 60 0.16302926693340128"""
data = [x.split() for x in str1.split('\n')]
df = pd.DataFrame(data[1:], columns = data[0])
df['id'] = df['id'].astype(int)
df['strength'] = df['strength'].astype(float)
df['timestamp'] = df['timestamp'].astype(float)
#solution
res = df.groupby('timestamp').agg({'id': lambda x: range(x.min(), x.max() 1)}).explode('id').reset_index().merge(df, on=['timestamp', 'id'], how='outer')
res['strength'] = res['strength'].interpolate()
res
Output:
timestamp id strength
0 1.383260e 12 10.0 0.007802
1 1.383260e 12 11.0 0.005010
2 1.383260e 12 12.0 0.009400
3 1.383260e 12 13.0 0.013789
4 1.383260e 12 14.0 0.018178
5 1.383260e 12 15.0 0.022568
6 1.383260e 12 16.0 0.026957
7 1.383260e 12 17.0 0.031347
8 1.383260e 12 18.0 0.035736
9 1.383260e 12 19.0 0.040126
10 1.383260e 12 20.0 0.044515
11 1.383260e 12 21.0 0.048904
12 1.383260e 12 22.0 0.053294
13 1.383260e 12 23.0 0.057683
14 1.383260e 12 24.0 0.062073
15 1.383260e 12 25.0 0.066462
16 1.383260e 12 26.0 0.070851
17 1.383260e 12 27.0 0.075241
18 1.383260e 12 28.0 0.079630
19 1.383260e 12 29.0 0.084020
20 1.383260e 12 30.0 0.088409
21 1.383260e 12 31.0 0.092799
22 1.383260e 12 32.0 0.097188
23 1.383260e 12 33.0 0.101577
24 1.383260e 12 34.0 0.105967
25 1.383260e 12 35.0 0.110356
26 1.383260e 12 36.0 0.114746
27 1.383260e 12 37.0 0.119135
28 1.383260e 12 38.0 0.123524
29 1.383260e 12 39.0 0.127914
30 1.383260e 12 40.0 0.132303
31 1.383260e 12 41.0 0.136693
32 1.383260e 12 42.0 0.141082
33 1.383260e 12 43.0 0.145472
34 1.383260e 12 44.0 0.149861
35 1.383260e 12 45.0 0.154250
36 1.383260e 12 46.0 0.158640
37 1.383260e 12 48.0 0.163029
38 1.383260e 12 49.0 0.157703
39 1.383260e 12 50.0 0.152376
40 1.383260e 12 51.0 0.147049
41 1.383260e 12 52.0 0.141723
42 1.383260e 12 53.0 0.136396
43 1.383260e 12 54.0 0.131070
44 1.383260e 12 55.0 0.125743
45 1.383260e 12 56.0 0.120416
46 1.383260e 12 57.0 0.115090
47 1.383260e 12 58.0 0.109763
48 1.383260e 12 59.0 0.104437
49 1.383260e 12 47.0 0.099110
50 1.383260e 12 60.0 0.163029