I have a large dataframe. The first column is a "time" column and is filled in for every row; the rows are 0.1 s apart. There are also hours, minutes, and seconds columns that are mostly NaN but contain sparse values for the relevant hour, minute, or second. The rows that contain these values do not line up with each other (some rows have only a "seconds" value, others have only an "hour" value, and most are all NaN).
How can I create a column with a "time of day" based on the time in the first column plus an initial time calculated from the other columns?
See the example below (but with a lot more nans).
Time | Hours | Min | Sec | data1 | data2 | desired result |
---|---|---|---|---|---|---|
0.0 | nan | nan | nan | value1 | value2 | 10:06:05.0 |
0.1 | 10 | nan | nan | value1 | value2 | 10:06:05.1 |
0.2 | nan | nan | 5 | value1 | value2 | 10:06:05.2 |
0.3 | nan | nan | nan | value1 | value2 | 10:06:05.3 |
0.4 | nan | nan | nan | value1 | value2 | 10:06:05.4 |
0.5 | nan | 6 | nan | value1 | value2 | 10:06:05.5 |
CodePudding user response:
If I understand you correctly, you can `ffill`/`bfill` the relevant columns, convert them to `timedelta` objects, and format them afterwards:
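import pandas as pd

def format_timedelta(x):
    # x is a pandas Timedelta: split it into whole seconds and milliseconds
    s = x.seconds
    ms = int(x.microseconds / 1000)
    return "{:02}:{:02}:{:02}.{:03}".format(
        s // 3600, s % 3600 // 60, s % 60, ms
    )

# Spread the sparse values over every row (forward fill, then backward fill)
df[["Hours", "Min", "Sec"]] = df[["Hours", "Min", "Sec"]].ffill().bfill()

# Build the start time as an "H:M:S" string and add the elapsed time in seconds
df["desired result"] = pd.to_timedelta(
    df.Hours.astype(int).astype(str)
    + ":"
    + df.Min.astype(int).astype(str)
    + ":"
    + df.Sec.astype(int).astype(str)
) + pd.to_timedelta(df["Time"], unit="s")
df["desired result"] = df["desired result"].apply(format_timedelta)
print(df)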
Prints:
Time Hours Min Sec data1 data2 desired result
0 0.0 10.0 6.0 5.0 value1 value2 10:06:05.000
1 0.1 10.0 6.0 5.0 value1 value2 10:06:05.100
2 0.2 10.0 6.0 5.0 value1 value2 10:06:05.200
3 0.3 10.0 6.0 5.0 value1 value2 10:06:05.300
4 0.4 10.0 6.0 5.0 value1 value2 10:06:05.400
5 0.5 10.0 6.0 5.0 value1 value2 10:06:05.500
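For anyone who wants to try this end to end, here is a self-contained sketch of the same ffill/bfill idea on the sample rows from the question. It is a variation, not the exact code above: it builds the start time from numeric units instead of string concatenation, and the helper name `add_time_of_day` and the output column `time_of_day` are made up for illustration. It assumes each of the Hours/Min/Sec columns describes a single fixed start time, as in the example.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the question's table (data1/data2 omitted for brevity)
df = pd.DataFrame(
    {
        "Time": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
        "Hours": [np.nan, 10, np.nan, np.nan, np.nan, np.nan],
        "Min": [np.nan, np.nan, np.nan, np.nan, np.nan, 6],
        "Sec": [np.nan, np.nan, 5, np.nan, np.nan, np.nan],
    }
)


def add_time_of_day(frame):
    """Return a copy of frame with a 'time_of_day' string column.

    Hypothetical helper; assumes Hours/Min/Sec encode one fixed start time.
    """
    out = frame.copy()

    # Spread the sparse values over every row (forward fill, then backward fill)
    filled = out[["Hours", "Min", "Sec"]].ffill().bfill()

    # Start time as a timedelta, built from numeric units
    start = (
        pd.to_timedelta(filled["Hours"], unit="h")
        + pd.to_timedelta(filled["Min"], unit="min")
        + pd.to_timedelta(filled["Sec"], unit="s")
    )
    total = start + pd.to_timedelta(out["Time"], unit="s")

    # Round to whole milliseconds to avoid float noise such as 0.30000000000000004
    ms = (total.dt.total_seconds() * 1000).round().astype("int64")

    # Format as HH:MM:SS.mmm; hours wrap at 24 if the recording passes midnight
    out["time_of_day"] = [
        "{:02d}:{:02d}:{:02d}.{:03d}".format(
            m // 3_600_000 % 24, m // 60_000 % 60, m // 1000 % 60, m % 1000
        )
        for m in ms.tolist()
    ]
    return out


result = add_time_of_day(df)
print(result[["Time", "time_of_day"]])
```

On the sample rows this gives 10:06:05.000 through 10:06:05.500, matching the output above. If the sparse values change partway through the data, ffill/bfill would produce a different start time per row and the elapsed time would be counted twice, so that case would need a different approach.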