Creating a time column in dataframe with sparse data

I have a large dataframe. The first column is a "Time" column that is filled in for every row, with rows 0.1 s apart. There are also Hours, Min and Sec columns, which are mostly NaN but contain sparse hour, minute or second values. The rows holding those values do not line up with each other: some rows have only a seconds value, others only an hour value, and most are all NaN.

How can I create a column with a "time of day" based on the time in the first column plus an initial time calculated from the other columns?

See the example below (but with a lot more nans).

Time	Hours	Min	Sec	data1	data2	desired result
0.0	nan	nan	nan	value1	value2	10:06:05.0
0.1	10	nan	nan	value1	value2	10:06:05.1
0.2	nan	nan	5	value1	value2	10:06:05.2
0.3	nan	nan	nan	value1	value2	10:06:05.3
0.4	nan	nan	nan	value1	value2	10:06:05.4
0.5	nan	6	nan	value1	value2	10:06:05.5

CodePudding user response:

If I understand you correctly, you can ffill/bfill the relevant columns, convert them to timedelta objects and format them afterwards:

import pandas as pd


def format_timedelta(x):
    # Render a Timedelta as HH:MM:SS.mmm
    s = x.seconds
    ms = x.microseconds // 1000
    return "{:02}:{:02}:{:02}.{:03}".format(
        s // 3600, s % 3600 // 60, s % 60, ms
    )


# Propagate the sparse hour/minute/second values to every row
df[["Hours", "Min", "Sec"]] = df[["Hours", "Min", "Sec"]].ffill().bfill()

# Initial time of day plus the elapsed time from the "Time" column
df["desired result"] = pd.to_timedelta(
    df.Hours.astype(int).astype(str)
    + ":"
    + df.Min.astype(int).astype(str)
    + ":"
    + df.Sec.astype(int).astype(str)
) + pd.to_timedelta(df["Time"], unit="s")

df["desired result"] = df["desired result"].apply(format_timedelta)

print(df)

Prints:

   Time  Hours  Min  Sec   data1   data2 desired result
0   0.0   10.0  6.0  5.0  value1  value2   10:06:05.000
1   0.1   10.0  6.0  5.0  value1  value2   10:06:05.100
2   0.2   10.0  6.0  5.0  value1  value2   10:06:05.200
3   0.3   10.0  6.0  5.0  value1  value2   10:06:05.300
4   0.4   10.0  6.0  5.0  value1  value2   10:06:05.400
5   0.5   10.0  6.0  5.0  value1  value2   10:06:05.500
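
One caveat with the ffill/bfill approach: in a longer recording the Sec and Min columns will presumably report later values as well, and forward-filling those and then adding Time would count the elapsed seconds twice. A possible variation, offered only as a sketch, builds a single start time from the first non-NaN entry of each column and adds the elapsed Time to it. It assumes those first entries describe the start of the recording (as in the example above) and that none of the three columns is entirely NaN; the column name "time_of_day" and the dummy date are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the question (data1/data2 omitted for brevity)
df = pd.DataFrame(
    {
        "Time": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
        "Hours": [np.nan, 10, np.nan, np.nan, np.nan, np.nan],
        "Min": [np.nan, np.nan, np.nan, np.nan, np.nan, 6],
        "Sec": [np.nan, np.nan, 5, np.nan, np.nan, np.nan],
    }
)

# Assumption: the first non-NaN entry of each column describes the start time.
# bfill() pulls that entry up to row 0; int() raises if a column is all NaN.
first = df[["Hours", "Min", "Sec"]].bfill().iloc[0]
start = pd.Timedelta(
    hours=int(first["Hours"]),
    minutes=int(first["Min"]),
    seconds=int(first["Sec"]),
)

# Elapsed time in whole milliseconds; rounding guards against float noise
elapsed = pd.to_timedelta((df["Time"] * 1000).round().astype("int64"), unit="ms")

# Anchor on an arbitrary dummy date so that strftime can be used for formatting
origin = pd.Timestamp("1970-01-01") + start
stamp = elapsed + origin

# %f yields microseconds (6 digits); drop the last 3 to keep milliseconds
df["time_of_day"] = stamp.dt.strftime("%H:%M:%S.%f").str[:-3]

print(df)
```

Because only the time part is formatted, the result wraps around at midnight rather than growing past 24 hours. Whether the first reported second really belongs to row 0 depends on how the logger writes its values, so the offset may need adjusting for real data.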