Interpolating missing values that are not marked as NA

I want to linearly interpolate my data, but the missing values are not marked as NA; the rows for those ids are simply absent.

Here is my data.

timestamp id strength
1383260400000 10 0.007802251505435157
1383260400000 11 0.0050101566841440105
1383260400000 47 0.09910993935546883
1383260400000 48 0.16302926693340128

The expected output is:

timestamp id strength
1383260400000 10 0.007802251505435157
1383260400000 11 0.0050101566841440105
1383260400000 12 interpolated strength
1383260400000 13 interpolated strength
1383260400000 14 interpolated strength
1383260400000 15 interpolated strength
1383260400000 16 interpolated strength
1383260400000 17 interpolated strength
1383260400000 ... interpolated strength
1383260400000 47 0.09910993935546883
1383260400000 48 0.16302926693340128

There is also data for other timestamps:

timestamp id strength
1383261000000 73 0.00034018953748529387
1383261000000 80 0.015745603609017354
1383261000000 81 0.01332206498346922
1383261000000 101 0.003292329017912283
1383261000000 102 0.0028735259562922954
1383261000000 103 0.0033962211735905955

The last id is 2025 and the last timestamp is 13833462000000.
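
For reference, a minimal sketch (not from the original post) that loads the sample rows above into a pandas DataFrame, so the answers below can be reproduced; the real data continues up to id 2025:

import pandas as pd

# sample rows copied from the question; the full dataset has more timestamps
# and ids running up to 2025
df = pd.DataFrame({
    'timestamp': [1383260400000] * 4,
    'id': [10, 11, 47, 48],
    'strength': [0.007802251505435157, 0.0050101566841440105,
                 0.09910993935546883, 0.16302926693340128],
})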

CodePudding user response:

First convert the id values to the index, so a lambda function can be used in GroupBy.apply together with Series.reindex and Series.interpolate:

f = lambda x: x.reindex(range(x.index.min(), x.index.max() + 1)).interpolate()
df = df.set_index('id').groupby('timestamp')['strength'].apply(f).reset_index()

print (df)
        timestamp  id  strength
0   1383260400000  10  0.007802
1   1383260400000  11  0.005010
2   1383260400000  12  0.007624
3   1383260400000  13  0.010238
4   1383260400000  14  0.012852
5   1383260400000  15  0.015466
6   1383260400000  16  0.018080
7   1383260400000  17  0.020693
8   1383260400000  18  0.023307
9   1383260400000  19  0.025921
10  1383260400000  20  0.028535
11  1383260400000  21  0.031149
12  1383260400000  22  0.033763
13  1383260400000  23  0.036377
14  1383260400000  24  0.038991
15  1383260400000  25  0.041605
16  1383260400000  26  0.044218
17  1383260400000  27  0.046832
18  1383260400000  28  0.049446
19  1383260400000  29  0.052060
20  1383260400000  30  0.054674
21  1383260400000  31  0.057288
22  1383260400000  32  0.059902
23  1383260400000  33  0.062516
24  1383260400000  34  0.065129
25  1383260400000  35  0.067743
26  1383260400000  36  0.070357
27  1383260400000  37  0.072971
28  1383260400000  38  0.075585
29  1383260400000  39  0.078199
30  1383260400000  40  0.080813
31  1383260400000  41  0.083427
32  1383260400000  42  0.086041
33  1383260400000  43  0.088654
34  1383260400000  44  0.091268
35  1383260400000  45  0.093882
36  1383260400000  46  0.096496
37  1383260400000  47  0.099110
38  1383260400000  48  0.163029
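
The question mentions that ids run up to 2025. If every timestamp should cover one common id range instead of each group's own min..max, a fixed range can be passed to reindex. A sketch under that assumption, applied to the original df (before the reassignment above); the range 1..2025 is assumed, not stated in the answer:

# reindex every group onto one fixed id range; 2025 is taken from the question
full_ids = range(1, 2026)
f_full = lambda x: x.reindex(full_ids).interpolate()
df_full = (df.set_index('id')
             .groupby('timestamp')['strength']
             .apply(f_full)
             .reset_index())
# note: linear interpolation only fills gaps between observed ids; ids before a
# group's first observation stay NaN, while ids after its last observation are
# filled with the last value under pandas' default limit_direction='forward'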

CodePudding user response:

You can try grouping by timestamp, then reindex to make the id column continuous and fill the NA values:

out = (df.groupby('timestamp')
       .apply(lambda g: (g.set_index('id')
                         .reindex(range(g['id'].min(), g['id'].max() + 1))
                         .reset_index()
                         .pipe(lambda df: df.fillna({'strength': 'interpolated strength', 'timestamp': g.name}))))
       .reset_index(drop=True))
out['timestamp'] = out['timestamp'].astype(int)
print(out)

    id      timestamp               strength
0   10  1383260400000               0.007802
1   11  1383260400000                0.00501
2   12  1383260400000  interpolated strength
3   13  1383260400000  interpolated strength
4   14  1383260400000  interpolated strength
5   15  1383260400000  interpolated strength
6   16  1383260400000  interpolated strength
7   17  1383260400000  interpolated strength
8   18  1383260400000  interpolated strength
9   19  1383260400000  interpolated strength
10  20  1383260400000  interpolated strength
11  21  1383260400000  interpolated strength
12  22  1383260400000  interpolated strength
13  23  1383260400000  interpolated strength
14  24  1383260400000  interpolated strength
15  25  1383260400000  interpolated strength
16  26  1383260400000  interpolated strength
17  27  1383260400000  interpolated strength
18  28  1383260400000  interpolated strength
19  29  1383260400000  interpolated strength
20  30  1383260400000  interpolated strength
21  31  1383260400000  interpolated strength
22  32  1383260400000  interpolated strength
23  33  1383260400000  interpolated strength
24  34  1383260400000  interpolated strength
25  35  1383260400000  interpolated strength
26  36  1383260400000  interpolated strength
27  37  1383260400000  interpolated strength
28  38  1383260400000  interpolated strength
29  39  1383260400000  interpolated strength
30  40  1383260400000  interpolated strength
31  41  1383260400000  interpolated strength
32  42  1383260400000  interpolated strength
33  43  1383260400000  interpolated strength
34  44  1383260400000  interpolated strength
35  45  1383260400000  interpolated strength
36  46  1383260400000  interpolated strength
37  47  1383260400000                0.09911
38  48  1383260400000               0.163029
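
If actual numbers are wanted instead of the literal placeholder string shown above, the same reindex step can interpolate the strength column. A variant sketch (not from the original answer), again starting from the original df:

# per group: make id continuous, restore the timestamp, then interpolate
out_num = (df.groupby('timestamp')
             .apply(lambda g: (g.set_index('id')
                               .reindex(range(g['id'].min(), g['id'].max() + 1))
                               .assign(timestamp=g.name,
                                       strength=lambda d: d['strength'].interpolate())
                               .reset_index()))
             .reset_index(drop=True))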

CodePudding user response:

Use:

import pandas as pd

#preparing data
str1 = """timestamp id  strength
1383260400000   10  0.007802251505435157
1383260400000   11  0.0050101566841440105
1383260400000   47  0.09910993935546883
1383260410000   48  0.16302926693340128
1383260410000   60  0.16302926693340128"""
data = [x.split() for x in str1.split('\n')]
df = pd.DataFrame(data[1:], columns = data[0])

df['id'] = df['id'].astype(int)
df['strength'] = df['strength'].astype(float)
df['timestamp'] = df['timestamp'].astype(float)


#solution

res = (df.groupby('timestamp')
         .agg({'id': lambda x: range(x.min(), x.max() + 1)})
         .explode('id')
         .reset_index()
         .merge(df, on=['timestamp', 'id'], how='outer'))
res['strength'] = res['strength'].interpolate()
res

Output:

timestamp   id  strength
0   1.383260e+12    10.0    0.007802
1   1.383260e+12    11.0    0.005010
2   1.383260e+12    12.0    0.009400
3   1.383260e+12    13.0    0.013789
4   1.383260e+12    14.0    0.018178
5   1.383260e+12    15.0    0.022568
6   1.383260e+12    16.0    0.026957
7   1.383260e+12    17.0    0.031347
8   1.383260e+12    18.0    0.035736
9   1.383260e+12    19.0    0.040126
10  1.383260e+12    20.0    0.044515
11  1.383260e+12    21.0    0.048904
12  1.383260e+12    22.0    0.053294
13  1.383260e+12    23.0    0.057683
14  1.383260e+12    24.0    0.062073
15  1.383260e+12    25.0    0.066462
16  1.383260e+12    26.0    0.070851
17  1.383260e+12    27.0    0.075241
18  1.383260e+12    28.0    0.079630
19  1.383260e+12    29.0    0.084020
20  1.383260e+12    30.0    0.088409
21  1.383260e+12    31.0    0.092799
22  1.383260e+12    32.0    0.097188
23  1.383260e+12    33.0    0.101577
24  1.383260e+12    34.0    0.105967
25  1.383260e+12    35.0    0.110356
26  1.383260e+12    36.0    0.114746
27  1.383260e+12    37.0    0.119135
28  1.383260e+12    38.0    0.123524
29  1.383260e+12    39.0    0.127914
30  1.383260e+12    40.0    0.132303
31  1.383260e+12    41.0    0.136693
32  1.383260e+12    42.0    0.141082
33  1.383260e+12    43.0    0.145472
34  1.383260e+12    44.0    0.149861
35  1.383260e+12    45.0    0.154250
36  1.383260e+12    46.0    0.158640
37  1.383260e+12    48.0    0.163029
38  1.383260e+12    49.0    0.157703
39  1.383260e+12    50.0    0.152376
40  1.383260e+12    51.0    0.147049
41  1.383260e+12    52.0    0.141723
42  1.383260e+12    53.0    0.136396
43  1.383260e+12    54.0    0.131070
44  1.383260e+12    55.0    0.125743
45  1.383260e+12    56.0    0.120416
46  1.383260e+12    57.0    0.115090
47  1.383260e+12    58.0    0.109763
48  1.383260e+12    59.0    0.104437
49  1.383260e+12    47.0    0.099110
50  1.383260e+12    60.0    0.163029
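
One caveat visible in this output: strength is interpolated over the whole column at once, so values from different timestamps get mixed, and id 47 ends up at the bottom, out of order. A cautious follow-up sketch (not from the answer) that sorts first and interpolates within each timestamp group:

# sort, then interpolate inside each timestamp group so groups do not bleed
# into each other
res = res.sort_values(['timestamp', 'id']).reset_index(drop=True)
res['strength'] = (res.groupby('timestamp')['strength']
                      .transform(lambda s: s.interpolate()))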