I have a dataset sorted by UNIX timestamps in seconds, but I need to convert it to millisecond accuracy.
Example
dataset = [
    # UNIX timestamps with reading data
    (0, 0.48499),
    (2, 0.48475),
    (3, 0.48475),
    (3, 0.48473),
    (3, 0.48433),
    (3, 0.48403),
    (3, 0.48403),
    (3, 0.48403),
    (3, 0.48403),
    (3, 0.48403),
    (5, 0.48396),
    (12, 0.48353),
]
Expected output (roughly)
interpolated = [
    # Timestamps with millisecond accuracy
    (0.0, 0.48499),
    (2.0, 0.48475),
    (3.0, 0.48475),
    (3.14, 0.48473),
    (3.28, 0.48433),
    (3.42, 0.48403),
    (3.57, 0.48403),
    (3.71, 0.48403),
    (3.85, 0.48403),
    (3.99, 0.48403),
    (5.0, 0.48396),
    (12.0, 0.48353),
]
I don't have much experience with Pandas. I've looked at interpolate and drop_duplicates but couldn't figure out how to apply them here. I would think this is a common problem, so any help is appreciated. Ideally I want to spread the readings evenly within each second.
CodePudding user response:
You can use the groupby and apply methods. I didn't find a built-in method like interpolate for this case, but there might be a more pythonic way.
Code:
import numpy as np
import pandas as pd
# Create a sample dataframe
dataset = [(0, 0.48499), (2, 0.48475), (3, 0.48475), (3, 0.48473), (3, 0.48433), (3, 0.48403), (3, 0.48403), (3, 0.48403), (3, 0.48403), (3, 0.48403), (5, 0.48396), (12, 0.48353)]
df = pd.DataFrame(dataset, columns=['t', 'value'])
# Spread duplicate seconds evenly: within each group of identical
# timestamps, add evenly spaced offsets between 0 and 1 second
df.t = df.groupby('t', group_keys=False).apply(lambda g: g.t + np.linspace(0, 1, len(g)))
Output:
| t | value |
|---|---|
| 0 | 0.48499 |
| 2 | 0.48475 |
| 3 | 0.48475 |
| 3.14286 | 0.48473 |
| 3.28571 | 0.48433 |
| 3.42857 | 0.48403 |
| 3.57143 | 0.48403 |
| 3.71429 | 0.48403 |
| 3.85714 | 0.48403 |
| 4 | 0.48403 |
| 5 | 0.48396 |
| 12 | 0.48353 |
(For reference, the input dataframe:)
| t | value |
|---|---|
| 0 | 0.48499 |
| 2 | 0.48475 |
| 3 | 0.48475 |
| 3 | 0.48473 |
| 3 | 0.48433 |
| 3 | 0.48403 |
| 3 | 0.48403 |
| 3 | 0.48403 |
| 3 | 0.48403 |
| 3 | 0.48403 |
| 5 | 0.48396 |
| 12 | 0.48353 |
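One caveat: np.linspace(0, 1, len(g)) includes the endpoint, so the last reading of each group lands exactly on the next whole second (the 4 in the output above), whereas your expected output stops just short of it (3.99). A minimal variation, sketched below and based on the same approach rather than anything your data was tested against, passes endpoint=False so every offset stays strictly below one second, and rounds the result to three decimals for millisecond precision:
import numpy as np
import pandas as pd

dataset = [(0, 0.48499), (2, 0.48475), (3, 0.48475), (3, 0.48473), (3, 0.48433), (3, 0.48403), (3, 0.48403), (3, 0.48403), (3, 0.48403), (3, 0.48403), (5, 0.48396), (12, 0.48353)]
df = pd.DataFrame(dataset, columns=['t', 'value'])

# Spread duplicate seconds over [t, t + 1) instead of [t, t + 1]:
# endpoint=False keeps every offset strictly below one second
df.t = df.groupby('t', group_keys=False).apply(
    lambda g: g.t + np.linspace(0, 1, len(g), endpoint=False)
)

# Optional: round to three decimals, i.e. millisecond resolution
df.t = df.t.round(3)
With endpoint=False the step becomes 1/n instead of 1/(n-1), so the eight readings at second 3 become 3.0, 3.125, ..., 3.875 rather than 3.0, 3.143, ..., 4.0.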