I have a pandas DataFrame with 3 columns of weather data: temperature, time and the name of the weather station.
It looks like this:
Time | Station_name | Temperature |
---|---|---|
2022-05-12 22:09:35+00:00 | station_a | 18.3 |
2022-05-12 22:09:42+00:00 | station_b | 18.0 |
I would like to calculate, in a new column, the temperature difference of station_a from station_b for every matching minute. The timestamps are not exactly equal, but they agree at minute level, and there is only one measurement every 10 minutes.
Is there a way to do this?
Thanks!
CodePudding user response:
You can use a merge_asof on the two sub-dataframes:
import pandas as pd

df['Time'] = pd.to_datetime(df['Time'])
# merge_asof requires both inputs to be sorted on the key
df = df.sort_values('Time')
out = (pd
   .merge_asof(df[df['Station_name'].eq('station_a')],
               df[df['Station_name'].eq('station_b')],
               on='Time', direction='nearest',
               tolerance=pd.Timedelta('1min'),
               suffixes=('_a', '_b')
              )
   .set_index('Time')
   .eval('diff = Temperature_b - Temperature_a')
   ['diff']
)
output:
Time
2022-05-12 22:09:35+00:00   -0.3
Name: diff, dtype: float64
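To see what the tolerance does, here is a minimal self-contained sketch. The sample readings are made up for illustration and are not from the question: the second station_a reading has no station_b reading within one minute, so its difference should come out as NaN rather than being matched to a distant measurement.

```python
import pandas as pd

# Hypothetical sample data: the 22:19:35 reading of station_a has no
# station_b partner within one minute.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2022-05-12 22:09:35+00:00",
        "2022-05-12 22:09:42+00:00",
        "2022-05-12 22:19:35+00:00",
        "2022-05-12 22:25:10+00:00",
    ]),
    "Station_name": ["station_a", "station_b", "station_a", "station_b"],
    "Temperature": [18.3, 18.0, 17.3, 17.9],
}).sort_values("Time")  # merge_asof needs both inputs sorted on the key

a = df[df["Station_name"].eq("station_a")]
b = df[df["Station_name"].eq("station_b")]

# Match each station_a row to the nearest station_b row, at most 1 minute away.
merged = pd.merge_asof(
    a, b, on="Time", direction="nearest",
    tolerance=pd.Timedelta("1min"), suffixes=("_a", "_b"),
)
merged["diff"] = merged["Temperature_b"] - merged["Temperature_a"]
print(merged[["Time", "Temperature_a", "Temperature_b", "diff"]])
```

Every station_a row is kept in the result, so unmatched readings stay visible as NaN and can be dropped with dropna() if they are not wanted.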
You can also try to round the times, but it is more risky if one time gets rounded up and the other down:
df['Time'] = pd.to_datetime(df['Time'])
(df
 .assign(Time=df['Time'].dt.round('10min'))
 # pivot takes keyword arguments only in recent pandas versions
 .pivot(index='Time', columns='Station_name', values='Temperature')
 .eval('diff = station_b - station_a')
)
output:
Station_name               station_a  station_b  diff
Time
2022-05-12 22:10:00+00:00       18.3       18.0  -0.3
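The rounding risk can be shown with a small sketch. The two timestamps below are hypothetical, chosen to sit five seconds apart on either side of a rounding boundary; they are not from the question.

```python
import pandas as pd

# Hypothetical readings 5 seconds apart, straddling the 22:05:00 boundary.
times = pd.Series(pd.to_datetime(["2022-05-12 22:04:58",
                                  "2022-05-12 22:05:03"]))

# Rounding to the nearest 10 minutes sends them to different bins.
rounded = times.dt.round("10min")
print(rounded.tolist())

# Flooring keeps this particular pair together, but it only moves the
# boundary: readings at 22:09:58 and 22:10:03 would be split instead.
floored = times.dt.floor("10min")
print(floored.tolist())
```

Any fixed binning has such a boundary somewhere, which is why matching on the nearest timestamp with a tolerance tends to be the safer choice when the two stations do not report at exactly the same second.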
CodePudding user response:
If you have this pandas DataFrame:
from datetime import datetime
import pandas as pd
data = [{"Time":datetime(2022,5,12,22,9,35), "Station_name":"station_a", "Temperature": 18.3},
{"Time":datetime(2022,5,12,22,9,42), "Station_name":"station_b", "Temperature": 18.0 },
{"Time":datetime(2022,5,12,22,10,35), "Station_name":"station_a", "Temperature": 17.3},
{"Time":datetime(2022,5,12,22,10,42), "Station_name":"station_b", "Temperature": 18.0 }]
df = pd.DataFrame(data)
For truncating to minutes, see the question "Truncate `TimeStamp` column to hour precision in pandas `DataFrame`".
For pivot tables / reshaping, see https://pandas.pydata.org/docs/user_guide/reshaping.html
# Truncate to minutes
df["Time_trunc"] = df["Time"].dt.floor("min")
# Set index (in order to pivot) and pivot (unstack)
df = df.set_index(["Time_trunc", "Station_name"])
df_pivoted = df.unstack()
# Flatten the multi-level columns by name, e.g. ("Temperature", "station_a")
# becomes "Temperature_station_a", so the labels cannot get out of order
df_pivoted.columns = [f"{col}_{station}" for col, station in df_pivoted.columns]
df_new = df_pivoted.reset_index()
# Add the absolute difference of the temperatures
df_new["DiffAbs"] = abs(df_new["Temperature_station_a"] - df_new["Temperature_station_b"])
CodePudding user response:
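You can use pandas.Series.diff, which subtracts the previous row from each row. This only gives the difference between stations if the rows are sorted by time and alternate strictly between the two stations.
For example: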
df['Temperature_diff'] = df['Temperature'].diff()
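A minimal sketch of how this could be applied, reusing the four sample rows from the previous answer and assuming that, once sorted by time, each station_b row directly follows its station_a partner. The filtering step and the name b_minus_a are illustrative additions, not part of the original suggestion.

```python
import pandas as pd

# Sample rows from the previous answer: stations alternate a, b, a, b.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2022-05-12 22:09:35", "2022-05-12 22:09:42",
        "2022-05-12 22:10:35", "2022-05-12 22:10:42",
    ]),
    "Station_name": ["station_a", "station_b", "station_a", "station_b"],
    "Temperature": [18.3, 18.0, 17.3, 18.0],
}).sort_values("Time")

# diff() subtracts the previous row from each row, regardless of station.
df["Temperature_diff"] = df["Temperature"].diff()

# Only the station_b rows hold "station_b minus the preceding station_a".
b_minus_a = df.loc[df["Station_name"].eq("station_b"),
                   ["Time", "Temperature_diff"]]
print(b_minus_a)
```

If a reading is missing for one station, the alternation breaks and the differences silently mix rows of the same station, so this approach is only reasonable when the data is known to be complete.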