creating order queue based on starttime and endtime in pandas python-CodePudding

I have a pandas dataframe that contains four date columns, a starttime and an end time and date column that defines a range. I'd like to be able to collectively create a queue count for all time and date across all rows in the data frame, as defined by these columns.

date start

 1. date   starttime   endtime  storeid
 2. 2/3/20  6:20 pm    7:20 pm   12231
 3. 2/3/20   6:25pm    7:25 pm  12231
 4. 2/3/20   6:29pm    7:40 pm   12231
 5. 2/3/20   7:21pm    7:59pm    12231
6.  2/3/20   6:21pm    7:21 pm   12232

I wants to generate queue based column that will give me how many queue are there before the current one in below way:

date start

 1. date   starttime   endtime  storeid queue 
 2. 2/3/20  6:20 pm    7:20 pm  12231     1
 3. 2/3/20   6:25pm    7:25 pm  12231     2
 4. 2/3/20   6:29pm    7:40 pm  12231     3
 5. 2/3/20   7:21pm    7:59 pm  12231     2
 6. 2/3/20   6:21pm    7:21 pm  12232     1

I am new to this and any help would be really appreciated.

CodePudding user response：

The size of a queue at any point is a step function. There is a package called

You can also add them together to create the combined queue

stepfunctions[12231]   stepfunctions[12232]
# or
stepfunctions.sum()

You can sample the queue size at any point(s) you like. It seems like you want the queue size sampled at each start time. For store 12231 you can do this like so

stepfunctions[12231](data["starts"])

which gives you

array([1, 2, 3, 3], dtype=int64)

In order to get add this column to your dataframe it will be easier to do this in the same step as creating the step function in the groupby-apply.

So the entire solution (after imports) is

def queue_count_sample(df):
    sf = sc.Stairs(df, "start", "end")
    df["queue"] = sf(df["start"])
    return df

data.groupby("storeid").apply(queue_count_sample)