I want to add column to the dataframe with values(comments) based on the Timestamp, grouped per each day. I made it as per example below, but... is there any other more "pandonic" way? maybe one-liner or at least close to it?
Example data frame (the actual has much more dates and more different values):
import pandas as pd
data = {"Values": ["absd","abse", "dara", "absd","abse", "dara"],
"Date": ["2022-05-25","2022-05-25","2022-05-25", "2022-05-26","2022-05-26","2022-05-26"],
"Timestamp": ["2022-05-25 08:00:00", "2022-05-25 11:30:00", "2022-05-25 20:25:00",
"2022-05-26 09:00:00", "2022-05-26 13:40:00", "2022-05-26 19:15:00"]}
df = pd.DataFrame(data)
df.Timestamp = pd.to_datetime(df.Timestamp, format='%Y-%m-%d %H:%M:%S')
df.Date = pd.to_datetime(df.Date, format='%Y-%m-%d')
df out:
Values Date Timestamp
0 absd 2022-05-25 2022-05-25 08:00:00
1 abse 2022-05-25 2022-05-25 11:30:00
2 dara 2022-05-25 2022-05-25 20:25:00
3 absd 2022-05-26 2022-05-26 09:00:00
4 abse 2022-05-26 2022-05-26 13:40:00
5 dara 2022-05-26 2022-05-26 19:15:00
the end result I want is:
Values Date Period Datetime
0 absd 2022-05-25 Start 2022-05-25 08:00:00
1 abse 2022-05-25 Start 2022-05-25 08:00:00
2 dara 2022-05-25 Start 2022-05-25 08:00:00
3 dara 2022-05-25 Mid 2022-05-25 11:30:00
4 abse 2022-05-25 Mid 2022-05-25 11:30:00
5 absd 2022-05-25 Mid 2022-05-25 11:30:00
6 dara 2022-05-25 End 2022-05-25 20:25:00
7 abse 2022-05-25 End 2022-05-25 20:25:00
8 absd 2022-05-25 End 2022-05-25 20:25:00
9 dara 2022-05-26 Start 2022-05-26 09:00:00
10 abse 2022-05-26 Start 2022-05-26 09:00:00
11 absd 2022-05-26 Start 2022-05-26 09:00:00
12 absd 2022-05-26 Mid 2022-05-26 13:40:00
13 abse 2022-05-26 Mid 2022-05-26 13:40:00
14 dara 2022-05-26 Mid 2022-05-26 13:40:00
15 absd 2022-05-26 End 2022-05-26 19:15:00
16 abse 2022-05-26 End 2022-05-26 19:15:00
17 dara 2022-05-26 End 2022-05-26 19:15:00
my working approach is below:
df["Start"] = df["Timestamp"].groupby(df["Date"]).transform("min")
df["End"] = df["Timestamp"].groupby(df["Date"]).transform("max")
df["Mid"] = df["Timestamp"].groupby(df["Date"]).transform("median")
df1 = df.melt(id_vars = ["Values","Date"],
var_name="Period",value_name="Datetime").sort_values("Datetime")
df1 = df1[df1.Period != "Timestamp"].reset_index(drop=True)
CodePudding user response:
Using only pandas:
(
df['Timestamp'].groupby(df['Date']).agg(['min','median','max']).merge(df, on='Date')
.melt(id_vars=['Values','Date'], var_name='Period', value_name='Datetime')
.query('Period!="Timestamp"')
.sort_values('Datetime')
)
Output:
Values Date Period Datetime
0 absd 2022-05-25 min 2022-05-25 08:00:00
1 abse 2022-05-25 min 2022-05-25 08:00:00
2 dara 2022-05-25 min 2022-05-25 08:00:00
7 abse 2022-05-25 median 2022-05-25 11:30:00
6 absd 2022-05-25 median 2022-05-25 11:30:00
8 dara 2022-05-25 median 2022-05-25 11:30:00
12 absd 2022-05-25 max 2022-05-25 20:25:00
13 abse 2022-05-25 max 2022-05-25 20:25:00
14 dara 2022-05-25 max 2022-05-25 20:25:00
4 abse 2022-05-26 min 2022-05-26 09:00:00
3 absd 2022-05-26 min 2022-05-26 09:00:00
5 dara 2022-05-26 min 2022-05-26 09:00:00
9 absd 2022-05-26 median 2022-05-26 13:40:00
10 abse 2022-05-26 median 2022-05-26 13:40:00
11 dara 2022-05-26 median 2022-05-26 13:40:00
16 abse 2022-05-26 max 2022-05-26 19:15:00
15 absd 2022-05-26 max 2022-05-26 19:15:00
17 dara 2022-05-26 max 2022-05-26 19:15:00
CodePudding user response:
From the end result dataframe, it looks like you need a combination of all the columns (well, a combination of the Values
column and the ('Date', Timestamp') columns).
One option is with complete from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
(df
.assign(Period = ['Start', 'Mid', 'End'] * 2)
.complete(('Date', 'Timestamp', 'Period'), 'Values')
)
Values Date Timestamp Period
0 absd 2022-05-25 2022-05-25 08:00:00 Start
1 abse 2022-05-25 2022-05-25 08:00:00 Start
2 dara 2022-05-25 2022-05-25 08:00:00 Start
3 absd 2022-05-25 2022-05-25 11:30:00 Mid
4 abse 2022-05-25 2022-05-25 11:30:00 Mid
5 dara 2022-05-25 2022-05-25 11:30:00 Mid
6 absd 2022-05-25 2022-05-25 20:25:00 End
7 abse 2022-05-25 2022-05-25 20:25:00 End
8 dara 2022-05-25 2022-05-25 20:25:00 End
9 absd 2022-05-26 2022-05-26 09:00:00 Start
10 abse 2022-05-26 2022-05-26 09:00:00 Start
11 dara 2022-05-26 2022-05-26 09:00:00 Start
12 absd 2022-05-26 2022-05-26 13:40:00 Mid
13 abse 2022-05-26 2022-05-26 13:40:00 Mid
14 dara 2022-05-26 2022-05-26 13:40:00 Mid
15 absd 2022-05-26 2022-05-26 19:15:00 End
16 abse 2022-05-26 2022-05-26 19:15:00 End
17 dara 2022-05-26 2022-05-26 19:15:00 End
CodePudding user response:
Another pandas only method:
out = (df.groupby('Date')
.agg({'Timestamp':['min', 'median', 'max'], 'Values':list})
.explode(('Values', 'list'))
.droplevel(0, axis=1)
.rename(columns={'list':'Values'})
.reset_index()
.melt(['Values', 'Date'], var_name='Period', value_name='Datetime')
.sort_values('Datetime', ignore_index=True))
print(out)
Output:
Values Date Period Datetime
0 absd 2022-05-25 min 2022-05-25 08:00:00
1 abse 2022-05-25 min 2022-05-25 08:00:00
2 dara 2022-05-25 min 2022-05-25 08:00:00
3 abse 2022-05-25 median 2022-05-25 11:30:00
4 absd 2022-05-25 median 2022-05-25 11:30:00
5 dara 2022-05-25 median 2022-05-25 11:30:00
6 absd 2022-05-25 max 2022-05-25 20:25:00
7 abse 2022-05-25 max 2022-05-25 20:25:00
8 dara 2022-05-25 max 2022-05-25 20:25:00
9 abse 2022-05-26 min 2022-05-26 09:00:00
10 absd 2022-05-26 min 2022-05-26 09:00:00
11 dara 2022-05-26 min 2022-05-26 09:00:00
12 absd 2022-05-26 median 2022-05-26 13:40:00
13 abse 2022-05-26 median 2022-05-26 13:40:00
14 dara 2022-05-26 median 2022-05-26 13:40:00
15 abse 2022-05-26 max 2022-05-26 19:15:00
16 absd 2022-05-26 max 2022-05-26 19:15:00
17 dara 2022-05-26 max 2022-05-26 19:15:00