I have a pandas dataframe structured as follows:
TimeStamp
2022-12-30 10:31:58.483700+00:00  1  FixType        4            4.000000e+00
2022-12-30 10:31:58.483700+00:00  1  Satellites     11           1.100000e+01
2022-12-30 10:31:58.484150+00:00  2  TimeConfirmed  0            0.000000e+00
2022-12-30 10:31:58.484150+00:00  2  Epoch          63797521999  1.641638e+09
2022-12-30 10:31:58.484150+00:00  2  TimeValid      1            1.000000e+00
...                              ..  ...            ...          ...
2022-12-30 10:54:32.714050+00:00  9  AngularRateZ   1020         -1.000000e+00
2022-12-30 10:54:32.714050+00:00  9  AccelerationY  513          1.250000e-01
2022-12-30 10:54:32.714050+00:00  9  AccelerationZ  594          1.025000e+01
2022-12-30 10:54:32.714050+00:00  9  AngularRateX   1025         2.500000e-01
2022-12-30 10:54:32.714050+00:00  9  ImuValid       1            1.000000e+00
[973528 rows x 4 columns]
I need to get it into the following structure, while also resampling it to a specific frequency (e.g. 1S):
FixType Satellites ... AngularRateZ ImuValid
TimeStamp ...
2022-12-30 10:31:59+00:00      4.0        11.0  ...           NaN      NaN
2022-12-30 10:32:00+00:00      4.0        11.0  ...         -1.00      1.0
2022-12-30 10:32:01+00:00      4.0        12.0  ...         -1.00      1.0
2022-12-30 10:32:02+00:00      4.0        12.0  ...         -1.00      1.0
2022-12-30 10:32:03+00:00      4.0        12.0  ...         -1.00      1.0
...                            ...         ...  ...           ...      ...
2022-12-30 10:54:28+00:00      4.0        13.0  ...         -1.00      1.0
2022-12-30 10:54:29+00:00      4.0        14.0  ...         -1.00      1.0
2022-12-30 10:54:30+00:00      4.0        14.0  ...         -0.75      1.0
2022-12-30 10:54:31+00:00      4.0        14.0  ...         -1.00      1.0
2022-12-30 10:54:32+00:00      4.0        14.0  ...         -1.00      1.0
[1354 rows x 39 columns]
Currently I achieve this with the code below:
def restructure_data(df_phys, res):
    import pandas as pd

    df_phys_join = pd.DataFrame({"TimeStamp": []})
    if not df_phys.empty:
        # For every CAN ID / signal combination, resample the physical values
        # to the target frequency and merge them into one wide dataframe,
        # joined on the TimeStamp index
        for message, df_phys_message in df_phys.groupby("CAN ID"):
            for signal, data in df_phys_message.groupby("Signal"):
                col_name = signal
                df_phys_join = pd.merge_ordered(
                    df_phys_join,
                    data["Physical Value"].rename(col_name).resample(res).ffill().dropna(),
                    on="TimeStamp",
                    fill_method="none",
                ).set_index("TimeStamp")
    return df_phys_join
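For context, I call it roughly like this, with res being a pandas frequency string such as "1S":

df_wide = restructure_data(df_phys, "1S")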
This works, but it seems inefficient. I wonder if there is a smarter and perhaps more pythonic way to achieve a similar result?
CodePudding user response:
It's a little bit hard to tell whether this would work without more data/info (are there any duplicate timestamp/signal combinations that need to be handled?), but you might be able to get away with a one-liner:
df.pivot_table(values="col4", index=pd.Grouper(freq="S", key="TimeStamp"), columns="col3")
filling in the correct column names from your original df for values= and columns=.
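For example, going by the column names referenced in your restructure_data code ("Physical Value" and "Signal" — adjust if yours differ), a sketch could look like the following. It assumes TimeStamp is currently the index, so it is reset to a column first; also note that pivot_table averages duplicates within each 1-second bucket by default, and a trailing ffill() roughly mimics the forward-fill of your original approach:

import pandas as pd

# Sketch only: column names are taken from the question's code
wide = (
    df.reset_index()  # turn the TimeStamp index into a column so key="TimeStamp" works
      .pivot_table(
          values="Physical Value",                        # values to fill the table with
          index=pd.Grouper(freq="1S", key="TimeStamp"),   # 1-second buckets
          columns="Signal",                               # one output column per signal
      )
      .ffill()  # forward-fill gaps between slow signals, like the original ffill()
)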