Given the following DataFrame of pandas in Python:
Displays the switching on and off of 3 light bulbs at different times using date and time objects.
date ID_bulb switch using_time
1 2022-03-27 15:30:21 00:00 1 ON NaT
2 2022-03-29 17:05:21 00:00 1 OFF 2 days 01:35:00
3 2022-04-07 17:05:21 00:00 1 OFF NaT
4 2022-04-06 16:10:21 00:00 2 ON NaT
5 2022-04-07 15:30:21 00:00 2 OFF 0 days 23:20:00
6 2022-02-15 23:10:21 00:00 3 ON NaT
7 2022-02-16 02:10:21 00:00 3 OFF 0 days 04:00:00
8 2022-02-16 02:50:01 00:00 3 ON NaT
9 2022-02-18 10:50:01 00:00 3 OFF 2 days 07:00:00
10 2022-02-04 19:40:21 00:00 4 ON NaT
11 2022-02-06 15:35:21 00:00 4 OFF 1 days 19:55:00
12 2022-02-23 20:10:21 00:00 4 ON NaT
13 2022-02-24 02:10:21 00:00 4 OFF 0 days 10:00:00
14 2022-03-14 12:10:21 00:00 5 ON NaT
15 2022-03-15 00:10:21 00:00 5 ON NaT
16 2022-03-16 05:10:21 00:00 5 OFF 0 days 05:00:00
I want to add a new column, called cost_days
. This column will include only for rows where the variable using_time
is different from NaT
. Information on how many times the light bulb has been on during at least n
hours in a row between a night period defined by start_time
to end_time
.
Example of the resulting DataFrame:
Given for example:
add_costdays_column(df, 5, "22:00:00", "07:00:00")
date ID_bulb switch using_time cost_days
1 2022-03-27 15:30:21 00:00 1 ON NaT 0
2 2022-03-29 17:05:21 00:00 1 OFF 2 days 01:35:00 2
3 2022-04-07 17:05:21 00:00 1 OFF NaT 0
4 2022-04-06 16:10:21 00:00 2 ON NaT 0
5 2022-04-07 15:30:21 00:00 2 OFF 0 days 23:20:00 1
6 2022-02-15 23:10:21 00:00 3 ON NaT 0
7 2022-02-16 02:10:21 00:00 3 OFF 0 days 04:00:00 0
8 2022-02-16 02:50:01 00:00 3 ON NaT 0
9 2022-02-18 10:50:01 00:00 3 OFF 2 days 08:00:00 2
10 2022-02-04 19:40:21 00:00 4 ON NaT 0
11 2022-02-06 15:35:21 00:00 4 OFF 1 days 19:55:00 2
12 2022-02-23 20:10:21 00:00 4 ON NaT 0
13 2022-02-24 02:10:21 00:00 4 OFF 0 days 06:00:00 0
14 2022-03-14 12:10:21 00:00 5 ON NaT 0
15 2022-03-15 00:10:21 00:00 5 ON NaT 0
16 2022-03-16 05:10:21 00:00 5 OFF 0 days 05:00:00 1
Only the row ON before the row OFF with a value other than NaT
in the using_time
column is taken into account for switching on the bulb. This is to simplify the problem, I will adapt it later to the specific case.
CodePudding user response:
Use:
import pandas as pd
df = pd.read_csv('TEST_STACK.csv')
df = df.iloc[:df[df['using_time'].notna()].index[-1] 1]
#solution
g = (df['using_time'].notna()).sort_index(ascending=False).cumsum()
g = (g-max(g)).abs()
import numpy as np
def rounder(x):
v = pd.date_range(list(x)[-2], list(x)[-1], freq='1h')
temp = pd.Series(v, index = v).between_time('22:00', '07:00')
temp = len(temp)/9
return np.floor(temp) if np.mod(temp, 1.0) < 6/9 else np.ceil(temp)/9
temp = df.groupby(g)['date'].apply(lambda x: rounder(x))
df.loc[df[df['using_time'].notna()].index, 'new col']=temp.values
df['new col'] = df['new col'].fillna(0)