Subtract values from two columns where same time (pandas, python)

Time:07-09

I have a pandas DataFrame with 3 columns of weather data: temperature, time and the name of the weather station.

It looks like this:

Time                       Station_name  Temperature
2022-05-12 22:09:35+00:00  station_a     18.3
2022-05-12 22:09:42+00:00  station_b     18.0
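To reproduce the example, here is a minimal sketch that builds the sample above. It assumes the scraped timestamps originally carried a `+00:00` (UTC) offset; the layout of the construction is illustrative.

```python
import pandas as pd

# Sample rows from the question; the "+00:00" UTC offset is an assumption
# about what the scraped timestamps originally looked like.
df = pd.DataFrame({
    "Time": ["2022-05-12 22:09:35+00:00", "2022-05-12 22:09:42+00:00"],
    "Station_name": ["station_a", "station_b"],
    "Temperature": [18.3, 18.0],
})

# Parse the strings into timezone-aware datetimes
df["Time"] = pd.to_datetime(df["Time"])
print(df.dtypes)
```

The two readings are 7 seconds apart, so they belong to the same minute even though their timestamps differ.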

I would like to calculate, in a new column, the temperature difference between station_a and station_b for each minute. The timestamps of the two stations are not exactly equal, but they match at minute precision (and there is only one measurement every 10 minutes).

Is there a way to do this?

Thanks!

Answer:

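You can use pd.merge_asof on the two per-station sub-DataFrames, matching each station_a reading with the nearest station_b reading that is at most 1 minute away: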

import pandas as pd

df['Time'] = pd.to_datetime(df['Time'])
# merge_asof requires both sides to be sorted on the key column
df = df.sort_values('Time')

out = (pd
   .merge_asof(df[df['Station_name'].eq('station_a')],
               df[df['Station_name'].eq('station_b')],
               on='Time', direction='nearest',
               tolerance=pd.Timedelta('1min'),
               suffixes=('_a', '_b')
              )
   .set_index('Time')
   .eval('diff = Temperature_b - Temperature_a')
   ['diff']
)

output:

Time
2022-05-12 22:09:35+00:00   -0.3
Name: diff, dtype: float64
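To see what `tolerance` does, here is a small sketch with hypothetical readings (timezone-naive for brevity) in which the second station_a reading has no station_b reading within 1 minute. merge_asof then leaves `Temperature_b`, and therefore the difference, as NaN instead of pairing readings that are too far apart.

```python
import pandas as pd

# Hypothetical readings: station_b has no reading within 1 minute of 22:19:30
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2022-05-12 22:09:35", "2022-05-12 22:09:42",
        "2022-05-12 22:19:30", "2022-05-12 22:22:10",
    ]),
    "Station_name": ["station_a", "station_b", "station_a", "station_b"],
    "Temperature": [18.3, 18.0, 17.9, 17.5],
}).sort_values("Time")  # merge_asof needs both sides sorted on the key

a = df[df["Station_name"].eq("station_a")]
b = df[df["Station_name"].eq("station_b")]

# Pair each station_a reading with the nearest station_b reading,
# but only if it is at most 1 minute away
merged = pd.merge_asof(a, b, on="Time", direction="nearest",
                       tolerance=pd.Timedelta("1min"), suffixes=("_a", "_b"))
merged["diff"] = merged["Temperature_b"] - merged["Temperature_a"]
print(merged[["Time", "Temperature_a", "Temperature_b", "diff"]])
```

The 22:22:10 station_b reading is 2 min 40 s away from 22:19:30, so it is rejected and that row gets NaN.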

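Alternatively, you can round the timestamps to the nearest 10 minutes and pivot the stations into columns. This is riskier: if one station's timestamp is rounded up and the other's is rounded down, the two readings end up in different rows: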

df['Time'] = pd.to_datetime(df['Time'])

(df
 .assign(Time=df['Time'].dt.round('10min'))
 # pandas 2.0+ requires keyword arguments for pivot
 .pivot(index='Time', columns='Station_name', values='Temperature')
 .eval('diff = station_b - station_a')
)

output:

Station_name               station_a  station_b  diff
Time                                                 
2022-05-12 22:10:00+00:00       18.3       18.0  -0.3
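The rounding risk can be shown with two hypothetical readings taken 2 seconds apart, at 22:04:59 and 22:05:01. `round('10min')` sends the first to 22:00 and the second to 22:10, so after pivoting each row has a NaN and no difference can be computed.

```python
import pandas as pd

# Hypothetical pair of readings, 2 seconds apart, straddling 22:05:00
df = pd.DataFrame({
    "Time": pd.to_datetime(["2022-05-12 22:04:59", "2022-05-12 22:05:01"]),
    "Station_name": ["station_a", "station_b"],
    "Temperature": [18.3, 18.0],
})

# Round to 10 minutes and pivot: the readings land in different rows
rounded = (df
           .assign(Time=df["Time"].dt.round("10min"))
           .pivot(index="Time", columns="Station_name", values="Temperature"))
print(rounded)
```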

Answer:

Suppose you have this pandas DataFrame:

from datetime import datetime
import pandas as pd

data = [{"Time":datetime(2022,5,12,22,9,35), "Station_name":"station_a", "Temperature": 18.3},
        {"Time":datetime(2022,5,12,22,9,42), "Station_name":"station_b", "Temperature": 18.0 },
        {"Time":datetime(2022,5,12,22,10,35), "Station_name":"station_a", "Temperature": 17.3},
        {"Time":datetime(2022,5,12,22,10,42), "Station_name":"station_b", "Temperature": 18.0 }]

df = pd.DataFrame(data)

For truncating timestamps to minutes, see also: Truncate `TimeStamp` column to hour precision in pandas `DataFrame`

For pivot tables and reshaping, see: https://pandas.pydata.org/docs/user_guide/reshaping.html

# Truncate timestamps to whole minutes
df["Time_trunc"] = df["Time"].dt.floor("min")

# Set index (in order to pivot) and pivot (unstack) the stations into columns
df = df.set_index(["Time_trunc", "Station_name"])
df_pivoted = df.unstack()

# Flatten multi-columns; unstack keeps the original column order (Time, then Temperature)
df_new = pd.DataFrame(df_pivoted.to_records())
df_new.columns = ["Time_trunc", "Time_station_a", "Time_station_b", "Temp_station_a", "Temp_station_b"]

# Add absolute difference of temperatures
df_new["DiffAbs"] = (df_new["Temp_station_a"] - df_new["Temp_station_b"]).abs()
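A more compact sketch of the same truncate-and-pivot idea, reusing the sample `data` above: `dt.floor('min')` truncates to the minute and `DataFrame.pivot` creates one column per station, so no manual column renaming is needed. This is a variant rather than the answer's original code; the `Diff` column name is illustrative, and the difference is signed (station_b minus station_a) rather than absolute.

```python
from datetime import datetime
import pandas as pd

data = [{"Time": datetime(2022, 5, 12, 22, 9, 35), "Station_name": "station_a", "Temperature": 18.3},
        {"Time": datetime(2022, 5, 12, 22, 9, 42), "Station_name": "station_b", "Temperature": 18.0},
        {"Time": datetime(2022, 5, 12, 22, 10, 35), "Station_name": "station_a", "Temperature": 17.3},
        {"Time": datetime(2022, 5, 12, 22, 10, 42), "Station_name": "station_b", "Temperature": 18.0}]
df = pd.DataFrame(data)

# Truncate to the minute, then put one column per station
wide = (df
        .assign(Time_trunc=df["Time"].dt.floor("min"))
        .pivot(index="Time_trunc", columns="Station_name", values="Temperature"))

# Signed difference per minute
wide["Diff"] = wide["station_b"] - wide["station_a"]
print(wide)
```

`pivot` raises a ValueError if a station has two readings in the same minute; `pivot_table` with an aggregation such as `aggfunc='mean'` is one way around that.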


Answer:

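You can use pandas.Series.diff. Note that it subtracts each row from the previous row, so it only gives the station_b - station_a difference when the rows are sorted by time and strictly alternate station_a, station_b; on the other rows it compares readings from different minutes.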

For example:

df['Temperature_diff'] = df['Temperature'].diff()
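To stay with `diff` without comparing readings across minutes, one option (a sketch, reusing the sample `data` from the previous answer) is to group the rows by minute before taking the difference. It assumes each minute has at most one reading per station; sorting by `Station_name` inside each minute puts station_a first, so the value on each station_b row is station_b minus station_a. `Minute` and `Minute_diff` are illustrative column names.

```python
from datetime import datetime
import pandas as pd

data = [{"Time": datetime(2022, 5, 12, 22, 9, 35), "Station_name": "station_a", "Temperature": 18.3},
        {"Time": datetime(2022, 5, 12, 22, 9, 42), "Station_name": "station_b", "Temperature": 18.0},
        {"Time": datetime(2022, 5, 12, 22, 10, 35), "Station_name": "station_a", "Temperature": 17.3},
        {"Time": datetime(2022, 5, 12, 22, 10, 42), "Station_name": "station_b", "Temperature": 18.0}]
df = pd.DataFrame(data)

# Plain row-to-row diff: row 2 compares 22:10 station_a with 22:09 station_b
df["Temperature_diff"] = df["Temperature"].diff()

# Per-minute diff: station_a before station_b within each minute,
# then diff inside each minute group only
df["Minute"] = df["Time"].dt.floor("min")
df = df.sort_values(["Minute", "Station_name"])
df["Minute_diff"] = df.groupby("Minute")["Temperature"].diff()
print(df)
```

The station_a rows get NaN in `Minute_diff`, and the cross-minute value in row 2 of `Temperature_diff` disappears.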