How to calculate the number of charging sessions in my data?


I have a data set that looks like this:

Timestamp            Cumulative Energy (kWh)  Charging?
2022-08-19 05:45:00  24.9                     1
2022-08-19 06:00:00  44.7                     1
2022-08-19 06:15:00  53.1                     1
2022-08-19 06:30:00  0                        0

And so on. The data set represents the usage of an EV charger over a couple of weeks. I want to calculate the total number of sessions and the average energy withdrawn per charging session. Each charging session varies: some are an hour long, some less, some more. Since the data set provides the cumulative energy, I thought a way to go about this would be to group consecutive charging rows (Charging? = 1), identify the largest value of Cumulative Energy (kWh) in each group, and commit these values to a dictionary, which I could then use to calculate the total number of sessions and the average cumulative energy per session. I'm unsure of how to go about writing this in Python though. Any help would be greatly appreciated!
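For reference, a minimal sketch of that idea (the column names are taken from the table above; `evdata` is a hypothetical stand-in built from the sample rows, not the real data set):

```python
import pandas as pd

# Hypothetical stand-in for the real data set, using the columns above
evdata = pd.DataFrame({
    "Timestamp": ["2022-08-19 05:45:00", "2022-08-19 06:00:00",
                  "2022-08-19 06:15:00", "2022-08-19 06:30:00"],
    "Cumulative Energy (kWh)": [24.9, 44.7, 53.1, 0],
    "Charging?": [1, 1, 1, 0],
})

# Label each run of consecutive identical Charging? values
group = (evdata["Charging?"] != evdata["Charging?"].shift()).cumsum()

# Peak cumulative energy per charging run, committed to a dictionary
mask = evdata["Charging?"].eq(1)
sessions = (
    evdata.loc[mask, "Cumulative Energy (kWh)"]
    .groupby(group[mask])
    .max()
    .to_dict()
)

n_sessions = len(sessions)
avg_energy = sum(sessions.values()) / n_sessions
```

With the four sample rows this yields one session with a peak of 53.1 kWh; on the full data set `n_sessions` and `avg_energy` would be the two numbers asked for.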

Update: I did the following:

result = (
    evdata.groupby(["Charging?", (evdata['Charging?'] != evdata['Charging?'].shift()).cumsum()], sort=False)
    .size()
    .reset_index(level=1, drop=True)
)
Charging?
0    1707
1       1
0      43
1       3
0      38
1       4
dtype: int64

And so on. So we've managed to get the number of charging and non-charging groups. But the right-hand column shows the number of 15-minute intervals in each group, when I would ideally like to see the maximum Cumulative Energy (kWh) for that group instead.
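One way to get that is to keep the same groupby key but aggregate the energy column with `.max()` instead of counting rows with `.size()`. A sketch, on a hypothetical `evdata` built from the sample rows above:

```python
import pandas as pd

# Hypothetical evdata built from the sample rows in the question
evdata = pd.DataFrame({
    "Cumulative Energy (kWh)": [24.9, 44.7, 53.1, 0],
    "Charging?": [1, 1, 1, 0],
})

# Same run-labelling key as in the attempt above
block = (evdata['Charging?'] != evdata['Charging?'].shift()).cumsum()

# Aggregate the energy column with .max() rather than counting with .size()
result = (
    evdata.groupby(["Charging?", block], sort=False)["Cumulative Energy (kWh)"]
    .max()
    .reset_index(level=1, drop=True)
)
```

`result` is indexed by `Charging?`, so selecting `result.loc[1]` leaves only the charging groups; their count and mean give the session statistics.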

CodePudding user response:

I copied the first three rows to the bottom of the data to check the solution, hence the two rows in the result.

Please note I'm still not clear on what you'd like the dictionary to look like, i.e. what the key will be; I understand the value.

# identify the consecutive charging sessions:
# take the diff of two consecutive rows; the first row will be NaN, so fill it with -1,
# then take the absolute value and do a cumsum (see intermediate result below)

# drop duplicates based on seq while keeping the last row of each group

df2 = df.assign(seq=df['Charging?'].diff().fillna(-1).abs().cumsum()).drop_duplicates(subset=['seq'], keep='last')


# keep only the rows where charging is 1 and select the energy column
out = df2.loc[df2['Charging?'].eq(1), 'Cumulative Energy (kWh)']

out

# RESULT

2    53.1
6    53.1
Name: Cumulative Energy (kWh), dtype: float64

Intermediate result

df['Charging?'].diff().fillna(-1).abs().cumsum()
0    1.0
1    1.0
2    1.0
3    2.0
4    3.0
5    3.0
6    3.0
Name: Charging?, dtype: float64
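From the `out` series above, the two numbers the question asks for follow directly. A self-contained sketch reproducing this answer's setup (the first three rows copied to the bottom, as described; timestamps omitted):

```python
import pandas as pd

# Sample data with the first three rows copied to the bottom
df = pd.DataFrame({
    "Charging?": [1, 1, 1, 0, 1, 1, 1],
    "Cumulative Energy (kWh)": [24.9, 44.7, 53.1, 0, 24.9, 44.7, 53.1],
})

# Same pipeline as above: label runs, keep the last row of each,
# then keep only the charging rows
df2 = (
    df.assign(seq=df["Charging?"].diff().fillna(-1).abs().cumsum())
    .drop_duplicates(subset=["seq"], keep="last")
)
out = df2.loc[df2["Charging?"].eq(1), "Cumulative Energy (kWh)"]

n_sessions = len(out)    # total number of charging sessions
avg_energy = out.mean()  # average peak cumulative energy per session
```

Here that gives two sessions averaging 53.1 kWh, matching the result shown above.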

CodePudding user response:

Not my favorite solution, since it relies on looping, but I believe this works for you:


import numpy as np
import pandas as pd

df = pd.DataFrame(  # sample df
    [
        ['2022-08-19 05:45:00', 24.9, 1],
        ['2022-08-19 06:00:00', 44.7, 1],
        ['2022-08-19 06:15:00', 53.1, 1],
        ['2022-08-19 06:30:00', 0, 0],
        ['2022-08-19 05:45:00', 10, 1],
        ['2022-08-19 06:00:00', 20, 1],
        ['2022-08-19 06:15:00', 10, 1],
        ['2022-08-19 06:30:00', 0, 0],
        ['2022-08-19 05:45:00', 30, 1],
        ['2022-08-19 06:00:00', 30, 1],
        ['2022-08-19 06:15:00', 30, 1],
        ['2022-08-19 06:30:00', 0, 0],
    ]
)
sessionid = 1  # init session id
df[3] = 0  # set default
for i in np.arange(0, df.shape[0] - 1):

    if i == 0:  # first session id
        df.iloc[i, 3] = sessionid

    if df.iloc[i, 2] == 0:  # if we are at the end of a session
        sessionid += 1

    df.iloc[i + 1, 3] = sessionid  # set the session id of the next record to current

print(df.loc[df[1]!=0].groupby([3])[1].mean()) # exclude all 0 values
print(df.loc[df[1]!=0].groupby([3])[1].max())
print(df.loc[df[1]!=0].groupby([3])[1].min())
print(df.loc[df[1]!=0].groupby([3])[1].std())

Here is your output

3
1    40.900000
2    13.333333
3    30.000000
Name: 1, dtype: float64
3
1    53.1
2    20.0
3    30.0
Name: 1, dtype: float64
3
1    24.9
2    10.0
3    30.0
Name: 1, dtype: float64
3
1    14.478950
2     5.773503
3     0.000000
Name: 1, dtype: float64
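From that labelling, the total number of sessions and the average peak energy per session follow directly. A sketch, with column 3 pre-filled with the session ids the loop above produces and timestamps omitted (column 1 is the cumulative energy, as in the answer):

```python
import pandas as pd

# Column 1 = cumulative energy; column 3 = session id as assigned
# by the loop above (trailing zero rows inherit the session id)
df = pd.DataFrame({
    1: [24.9, 44.7, 53.1, 0, 10, 20, 10, 0, 30, 30, 30, 0],
    3: [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
})

# Peak cumulative energy per session (excluding the 0 reset rows),
# then count and average across sessions
peaks = df.loc[df[1] != 0].groupby(3)[1].max()
n_sessions = peaks.size
avg_session_energy = peaks.mean()
```

With the sample data this counts three sessions whose peaks are the 53.1, 20.0 and 30.0 shown in the `.max()` output above.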