Pandas: Subtracting dates in columns, appending the difference

Time:11-15

Is there a way to select timestamps and subtract them from each other to get a difference? The difference I am after is in hours and minutes, not days.

The difference will also vary from day to day, because for each day I want the last timestamp minus the first.

Below is the dataframe I am working with:

                     OfficeTemp  OutdoorTemp  SolarDiffuseRate  
DateTime                                                         
2006-01-01 07:15:00   19.915275       0.8125             0.000   
2006-01-01 07:30:00   20.463506       0.8125             0.000   
2006-01-01 07:45:00   20.885112       0.8125             0.000   
2006-01-01 08:00:00   21.499246       0.8125             0.000   
2006-01-02 07:15:00   20.463326      11.5125             0.000   
2006-01-02 07:30:00   21.122635      11.5125             0.000   
2006-01-03 07:15:00   20.224612       6.9625             0.000   
2006-01-03 07:30:00   20.820027       6.9625             0.000   
2006-01-03 07:45:00   21.272505       6.9625             0.000   
2006-01-04 07:15:00   20.007434       3.0625             0.000   
2006-01-04 07:30:00   20.564662       3.0625             0.000   
2006-01-04 07:45:00   20.991727       3.0625             0.000   
2006-01-05 07:15:00   20.046861       8.0000             0.000   
2006-01-05 07:30:00   20.592663       8.0000             0.000   
2006-01-05 07:45:00   21.023338       8.0000             0.000   
2006-01-06 09:00:00   17.527457       3.8875            31.875   
2006-01-06 09:15:00   17.588175       4.7500            73.875   
2006-01-06 09:30:00   17.638827       4.7500            73.875   

The index column is the date time column, and as you can see, the number of samples per day differs despite them all starting at the same time, so the time difference can vary. Some are 45 minutes, whilst others are more or less.

How would I calculate the difference per day, and append it to a Difference column?

CodePudding user response:

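This works, assuming DateTime is a regular datetime column (call df.reset_index() first if it is the index):

# Group by calendar date; .dt.day alone would merge the same day number from different months.
df['diff'] = (
    df.groupby(df['DateTime'].dt.normalize())['DateTime']
      .transform(lambda x: (x.max() - x.min()).total_seconds() / 60)  # span of each day in minutes
)
print(df)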

output:

              DateTime  OfficeTemp  OutdoorTemp  SolarDiffuseRate  diff
0  2006-01-01 07:15:00   19.915275       0.8125             0.000  45.0
1  2006-01-01 07:30:00   20.463506       0.8125             0.000  45.0
2  2006-01-01 07:45:00   20.885112       0.8125             0.000  45.0
3  2006-01-01 08:00:00   21.499246       0.8125             0.000  45.0
4  2006-01-02 07:15:00   20.463326      11.5125             0.000  15.0
5  2006-01-02 07:30:00   21.122635      11.5125             0.000  15.0
6  2006-01-03 07:15:00   20.224612       6.9625             0.000  30.0
7  2006-01-03 07:30:00   20.820027       6.9625             0.000  30.0
8  2006-01-03 07:45:00   21.272505       6.9625             0.000  30.0
9  2006-01-04 07:15:00   20.007434       3.0625             0.000  30.0
10 2006-01-04 07:30:00   20.564662       3.0625             0.000  30.0
11 2006-01-04 07:45:00   20.991727       3.0625             0.000  30.0
12 2006-01-05 07:15:00   20.046861       8.0000             0.000  30.0
13 2006-01-05 07:30:00   20.592663       8.0000             0.000  30.0
14 2006-01-05 07:45:00   21.023338       8.0000             0.000  30.0
15 2006-01-06 09:00:00   17.527457       3.8875            31.875  30.0
16 2006-01-06 09:15:00   17.588175       4.7500            73.875  30.0
17 2006-01-06 09:30:00   17.638827       4.7500            73.875  30.0
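
In the question the timestamps are the index rather than a column, and the asker wants the result in hours and minutes. A minimal sketch of that variant follows, assuming a unique DatetimeIndex. The sample frame is a hypothetical subset of the question's rows, and the column names Difference and DiffMinutes are illustrative choices, not anything from the original post. It keeps the span as a Timedelta, which prints as hours, minutes and seconds, and also derives a plain minutes column.

```python
import pandas as pd

# Hypothetical subset of the question's rows, with DateTime as the index.
idx = pd.to_datetime([
    "2006-01-01 07:15:00", "2006-01-01 07:30:00",
    "2006-01-01 07:45:00", "2006-01-01 08:00:00",
    "2006-01-02 07:15:00", "2006-01-02 07:30:00",
    "2006-01-06 09:00:00", "2006-01-06 09:15:00", "2006-01-06 09:30:00",
])
df = pd.DataFrame(
    {"OfficeTemp": [19.915275, 20.463506, 20.885112, 21.499246,
                    20.463326, 21.122635,
                    17.527457, 17.588175, 17.638827]},
    index=idx,
)
df.index.name = "DateTime"

# Expose the index as a Series so it can be grouped and transformed.
stamps = df.index.to_series()

# normalize() truncates each timestamp to midnight: one key per calendar date.
grouped = stamps.groupby(df.index.normalize())

# Last minus first timestamp of each day, broadcast to every row of that day.
df["Difference"] = grouped.transform("max") - grouped.transform("min")

# The same span as a float number of minutes.
df["DiffMinutes"] = df["Difference"].dt.total_seconds() / 60

print(df)
```

Keeping Difference as a Timedelta leaves later arithmetic and formatting open. DiffMinutes is only there for cases where a plain number is easier to work with.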