Creating a time column in dataframe with sparse data

Time:08-12

I have a large dataframe. The first column is a "time" column and is properly filled in all rows. The rows are 0.1 s apart. There are also hours, minutes, and seconds columns, which are mostly NaN but contain sparse values for the relevant hour, minute, or second. The rows that contain these values do not line up with each other: some rows have only a "seconds" value, others have only an "hour" value, and most are all NaN.

How can I create a "time of day" column based on the time in the first column plus an initial time calculated from the other columns?

See the example below (but with a lot more nans).

Time  Hours  Min  Sec  data1   data2   desired result
0.0   nan    nan  nan  value1  value2  10:05:05.0
0.1   10     nan  nan  value1  value2  10:06:05.1
0.2   nan    nan  5    value1  value2  10:06:05.2
0.3   nan    nan  nan  value1  value2  10:06:05.3
0.4   nan    nan  nan  value1  value2  10:06:05.4
0.5   nan    6    nan  value1  value2  10:06:05.5
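
In case it helps anyone reproduce this, the example table can be built roughly as follows. This is only a sketch: the column names are taken from the table, and "value1"/"value2" are literal placeholder strings rather than real data.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the example table; NaN marks the missing entries.
df = pd.DataFrame(
    {
        "Time": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
        "Hours": [np.nan, 10, np.nan, np.nan, np.nan, np.nan],
        "Min": [np.nan, np.nan, np.nan, np.nan, np.nan, 6],
        "Sec": [np.nan, np.nan, 5, np.nan, np.nan, np.nan],
        "data1": ["value1"] * 6,  # placeholder strings
        "data2": ["value2"] * 6,  # placeholder strings
    }
)

print(df)
```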

CodePudding user response:

If I understand you correctly, you can ffill/bfill the relevant columns, convert them to timedelta objects and format them afterwards:

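import pandas as pd


def format_timedelta(x):
    # Render a Timedelta as HH:MM:SS.mmm. Only the time-of-day part is
    # used, so totals of 24 hours or more wrap around.
    s = x.seconds
    ms = x.microseconds // 1000
    return "{:02}:{:02}:{:02}.{:03}".format(
        s // 3600, s % 3600 // 60, s % 60, ms
    )


# Spread each sparse value forward, then backward to cover the leading NaNs
df[["Hours", "Min", "Sec"]] = df[["Hours", "Min", "Sec"]].ffill().bfill()

# Build the start time as an "H:M:S" string and add the elapsed seconds
df["desired result"] = pd.to_timedelta(
    df.Hours.astype(int).astype(str)
    + ":"
    + df.Min.astype(int).astype(str)
    + ":"
    + df.Sec.astype(int).astype(str)
) + pd.to_timedelta(df["Time"], unit="s")

df["desired result"] = df["desired result"].apply(format_timedelta)

print(df)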

Prints:

   Time  Hours  Min  Sec   data1   data2 desired result
0   0.0   10.0  6.0  5.0  value1  value2   10:06:05.000
1   0.1   10.0  6.0  5.0  value1  value2   10:06:05.100
2   0.2   10.0  6.0  5.0  value1  value2   10:06:05.200
3   0.3   10.0  6.0  5.0  value1  value2   10:06:05.300
4   0.4   10.0  6.0  5.0  value1  value2   10:06:05.400
5   0.5   10.0  6.0  5.0  value1  value2   10:06:05.500
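
A possible variation on the same idea, sketched here as an alternative rather than a correction: the start time can be built numerically by converting each filled column with its own unit, which avoids the round trip through strings, and the formatting can be done with `dt.strftime` instead of `apply`. The column names `time_of_day` and `time_of_day_str` are made up for this example, and the sketch assumes (as above) that the filled hour, minute, and second values describe the clock at Time 0 and that the total stays under 24 hours.

```python
import numpy as np
import pandas as pd

# Sample frame with the same sparse layout as the question's example.
df = pd.DataFrame(
    {
        "Time": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
        "Hours": [np.nan, 10, np.nan, np.nan, np.nan, np.nan],
        "Min": [np.nan, np.nan, np.nan, np.nan, np.nan, 6],
        "Sec": [np.nan, np.nan, 5, np.nan, np.nan, np.nan],
    }
)

# Fill every row with the single known value of each column.
filled = df[["Hours", "Min", "Sec"]].ffill().bfill()

# Convert each column with its own unit and sum them into a start offset.
start = (
    pd.to_timedelta(filled["Hours"], unit="h")
    + pd.to_timedelta(filled["Min"], unit="min")
    + pd.to_timedelta(filled["Sec"], unit="s")
)

# Time of day as a Timedelta: start offset plus elapsed seconds.
df["time_of_day"] = start + pd.to_timedelta(df["Time"], unit="s")

# Format as HH:MM:SS.mmm by anchoring to an arbitrary date and trimming
# the six-digit microsecond field down to milliseconds.
anchored = pd.Timestamp("1970-01-01") + df["time_of_day"]
df["time_of_day_str"] = anchored.dt.strftime("%H:%M:%S.%f").str[:-3]

print(df[["Time", "time_of_day_str"]])
```

Keeping `time_of_day` as a Timedelta column alongside the string version leaves it available for further arithmetic or resampling; the string column is only needed for display.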