suppose there are many workers for a business, and all of them work different amount of hours that start and end at different hours of the day.
each day, and you are given a list of each workers' start and end times.
what is the fastest and most efficient way to create a dataframe column that contains the number of workers that will be working at each hour for the day?
df=pd.Dataframe([9,10,11,12,13,14,15,16,17],columns=['business_hrs'])
df['ppl_working']=0
start_times=[8,9,13,12,10]
end_times=[12,13,17,16,13]
this is the first thing i thought of
for s,e in zip(start_times,end_times):
df.loc[(df['business_hrs']>=s) & (df['business_hrs']<=e),'ppl_working'] =1
intuition tells me there is a much more efficient way to do this, without having to do as much iteration, and would make a difference if there are for example millions of workers
CodePudding user response:
Use list comprehension through the nine rows of business_hrs
AND NOT vice versa through potentially millions of rows of ppl working data... This should be performant.
import pandas as pd
df = pd.DataFrame([9,10,11,12,13,14,15,16,17], columns=['business_hrs'])
df1 = pd.DataFrame({'start_times' : [8,9,13,12,10], 'end_times' : [12,13,17,16,13]})
df['ppl working'] = [((df1['start_times'] <= hr)
& (df1['end_times'] > hr)).sum()
for hr in df['business_hrs']]
df
Out[1]:
business_hrs ppl working
0 9 2
1 10 3
2 11 3
3 12 3
4 13 2
5 14 2
6 15 2
7 16 1
8 17 0
CodePudding user response:
To sample the step function you call it (as if it was a method). Wrap this up in a dataframe like this
import pandas as pd
# sampling hours 0 to 19
sample_times = range(20)
pd.DataFrame({"hour":sample_times, "num_workers":sf(sample_times)})
the result
hour num_workers
...
5 5 0
6 6 0
7 7 0
8 8 1
9 9 2
10 10 3
11 11 3
12 12 3
13 13 2
14 14 2
...