Home > Mobile >  adding to a dataframe column of overlapping times
adding to a dataframe column of overlapping times

Time:05-17

suppose there are many workers for a business, and all of them work different amount of hours that start and end at different hours of the day.

each day, and you are given a list of each workers' start and end times.

what is the fastest and most efficient way to create a dataframe column that contains the number of workers that will be working at each hour for the day?

df=pd.Dataframe([9,10,11,12,13,14,15,16,17],columns=['business_hrs'])
df['ppl_working']=0

start_times=[8,9,13,12,10]
end_times=[12,13,17,16,13]

this is the first thing i thought of

for s,e in zip(start_times,end_times):
    df.loc[(df['business_hrs']>=s) & (df['business_hrs']<=e),'ppl_working'] =1

intuition tells me there is a much more efficient way to do this, without having to do as much iteration, and would make a difference if there are for example millions of workers

CodePudding user response:

Use list comprehension through the nine rows of business_hrs AND NOT vice versa through potentially millions of rows of ppl working data... This should be performant.

import pandas as pd
df = pd.DataFrame([9,10,11,12,13,14,15,16,17], columns=['business_hrs'])
df1 = pd.DataFrame({'start_times' : [8,9,13,12,10], 'end_times' : [12,13,17,16,13]})
df['ppl working'] = [((df1['start_times'] <= hr) 
                      & (df1['end_times'] > hr)).sum() 
                     for hr in df['business_hrs']]
df
Out[1]: 
   business_hrs  ppl working
0             9            2
1            10            3
2            11            3
3            12            3
4            13            2
5            14            2
6            15            2
7            16            1
8            17            0

CodePudding user response:

A package called enter image description here

To sample the step function you call it (as if it was a method). Wrap this up in a dataframe like this

import pandas as pd

# sampling hours 0 to 19
sample_times = range(20)
pd.DataFrame({"hour":sample_times, "num_workers":sf(sample_times)})

the result

    hour  num_workers
            ...
5      5            0
6      6            0
7      7            0
8      8            1
9      9            2
10    10            3
11    11            3
12    12            3
13    13            2
14    14            2
            ...
  • Related