I am passing a function to a groupby
operation which returns a list
which transforms to series in apply
and finally assign the series back to the main dataframe.
df_SOT = pd.DataFrame({'Lane': {26055: 'L2', 26056: 'L2', 26057: 'L2', 26058: 'L2', 26059: 'L2', 25972: 'L1', 25973: 'L1', 25974: 'L1', 25975: 'L1', 25976: 'L1'}, 'Carrier SCAC': {26055: 'JNJR', 26056: 'WOSQ', 26057: 'BGME', 26058: 'ITSB', 26059: 'UCSB', 25972: 'BGME', 25973: 'SCNN', 25974: 'XPOL', 25975: 'SJRG', 25976: 'MTRK'}, 'Annual Volume': {26055: 5604.0, 26056: 5604.0, 26057: 5604.0, 26058: 5604.0, 26059: 5604.0, 25972: 4917.0, 25973: 4917.0, 25974: 4917.0, 25975: 4917.0, 25976: 4917.0}, 'Annual Capacity': {26055: 260.0, 26056: 1300.0, 26057: 2704.0, 26058: 2080.0, 26059: 4368.0, 25972: 5408.0, 25973: 3380.0, 25974: 4940.0, 25975: 156.0, 25976: 4940.0}})
Function
def allocation(df_alloc):
Annual_Volume = df_alloc['Annual Volume']
Annual_Capacity = df_alloc['Annual Capacity']
Allocation = []
Cum_Capacity = 0
for idx in df_alloc.index:
Allocate = (min(0.5*Annual_Volume[idx],Annual_Capacity[idx],Annual_Volume[idx]-Cum_Capacity))
Cum_Capacity =Allocate
Allocation.append(Allocate)
return Allocation
Groupby operation:
df_SOT.groupby('Lane').apply(allocation)
>>>Ouput
Lane
L1 [2458.5, 2458.5, 0.0, 0.0, 0.0]
L2 [260.0, 1300.0, 2704.0, 1340.0, 0.0]
dtype: object
df_SOT['Allocation] = df_SOT.groupby('Lane').apply(allocation)
But it returns as NaN. Not sure what I am doing wrong here.
Lane | Carrier SCAC | Annual Volume | Annual Capacity | Allocation | |
---|---|---|---|---|---|
26055 | L2 | JNJR | 5604.0 | 260.0 | NaN |
26056 | L2 | WOSQ | 5604.0 | 1300.0 | NaN |
26057 | L2 | BGME | 5604.0 | 2704.0 | NaN |
26058 | L2 | ITSB | 5604.0 | 2080.0 | NaN |
26059 | L2 | UCSB | 5604.0 | 4368.0 | NaN |
25972 | L1 | BGME | 4917.0 | 5408.0 | NaN |
25973 | L1 | SCNN | 4917.0 | 3380.0 | NaN |
25974 | L1 | XPOL | 4917.0 | 4940.0 | NaN |
25975 | L1 | SJRG | 4917.0 | 156.0 | NaN |
25976 | L1 | MTRK | 4917.0 | 4940.0 | NaN |
I wanted the NaN
column to reflect the series values in column as came in the groupby
operation:
L1 [2458.5, 2458.5, 0.0, 0.0, 0.0]
L2 [260.0, 1300.0, 2704.0, 1340.0, 0.0]
Or, if there is a better way to go about that, please let me know.
CodePudding user response:
One alternative is to make allocation
return a pandas Series with the correct index instead of a simple list, and setting group_keys
to False
in the groupby
:
def allocation(df_alloc):
Annual_Volume = df_alloc["Annual Volume"]
Annual_Capacity = df_alloc["Annual Capacity"]
Allocation = []
Cum_Capacity = 0
for idx in df_alloc.index:
Allocate = min(
0.5 * Annual_Volume[idx],
Annual_Capacity[idx],
Annual_Volume[idx] - Cum_Capacity,
)
Cum_Capacity = Allocate
Allocation.append(Allocate)
return pd.Series(Allocation, index=df_alloc.index)
df_SOT["Allocation"] = df_SOT.groupby("Lane", group_keys=False).apply(allocation)