Assign a Groupby operated series to dataframe-CodePudding

I am passing a function to a groupby operation which returns a list which transforms to series in apply and finally assign the series back to the main dataframe.

df_SOT = pd.DataFrame({'Lane': {26055: 'L2', 26056: 'L2', 26057: 'L2', 26058: 'L2', 26059: 'L2', 25972: 'L1', 25973: 'L1', 25974: 'L1', 25975: 'L1', 25976: 'L1'}, 'Carrier SCAC': {26055: 'JNJR', 26056: 'WOSQ', 26057: 'BGME', 26058: 'ITSB', 26059: 'UCSB', 25972: 'BGME', 25973: 'SCNN', 25974: 'XPOL', 25975: 'SJRG', 25976: 'MTRK'}, 'Annual Volume': {26055: 5604.0, 26056: 5604.0, 26057: 5604.0, 26058: 5604.0, 26059: 5604.0, 25972: 4917.0, 25973: 4917.0, 25974: 4917.0, 25975: 4917.0, 25976: 4917.0}, 'Annual Capacity': {26055: 260.0, 26056: 1300.0, 26057: 2704.0, 26058: 2080.0, 26059: 4368.0, 25972: 5408.0, 25973: 3380.0, 25974: 4940.0, 25975: 156.0, 25976: 4940.0}})

Function

def allocation(df_alloc):
  Annual_Volume = df_alloc['Annual Volume']
  Annual_Capacity = df_alloc['Annual Capacity']
  Allocation = []
  Cum_Capacity = 0
  for idx in df_alloc.index:
    Allocate = (min(0.5*Annual_Volume[idx],Annual_Capacity[idx],Annual_Volume[idx]-Cum_Capacity))
    Cum_Capacity =Allocate
    Allocation.append(Allocate)
  return Allocation

Groupby operation:

df_SOT.groupby('Lane').apply(allocation)

>>>Ouput
Lane
L1         [2458.5, 2458.5, 0.0, 0.0, 0.0]
L2    [260.0, 1300.0, 2704.0, 1340.0, 0.0]
dtype: object

df_SOT['Allocation] = df_SOT.groupby('Lane').apply(allocation)

But it returns as NaN. Not sure what I am doing wrong here.

	Lane	Carrier SCAC	Annual Volume	Annual Capacity	Allocation
26055	L2	JNJR	5604.0	260.0	NaN
26056	L2	WOSQ	5604.0	1300.0	NaN
26057	L2	BGME	5604.0	2704.0	NaN
26058	L2	ITSB	5604.0	2080.0	NaN
26059	L2	UCSB	5604.0	4368.0	NaN
25972	L1	BGME	4917.0	5408.0	NaN
25973	L1	SCNN	4917.0	3380.0	NaN
25974	L1	XPOL	4917.0	4940.0	NaN
25975	L1	SJRG	4917.0	156.0	NaN
25976	L1	MTRK	4917.0	4940.0	NaN

I wanted the NaN column to reflect the series values in column as came in the groupby operation:

L1         [2458.5, 2458.5, 0.0, 0.0, 0.0]
L2    [260.0, 1300.0, 2704.0, 1340.0, 0.0]

Or, if there is a better way to go about that, please let me know.

CodePudding user response：

One alternative is to make allocation return a pandas Series with the correct index instead of a simple list, and setting group_keys to False in the groupby:

def allocation(df_alloc):
    Annual_Volume = df_alloc["Annual Volume"]
    Annual_Capacity = df_alloc["Annual Capacity"]
    Allocation = []
    Cum_Capacity = 0
    for idx in df_alloc.index:
        Allocate = min(
            0.5 * Annual_Volume[idx],
            Annual_Capacity[idx],
            Annual_Volume[idx] - Cum_Capacity,
        )
        Cum_Capacity  = Allocate
        Allocation.append(Allocate)
    return pd.Series(Allocation, index=df_alloc.index)


df_SOT["Allocation"] = df_SOT.groupby("Lane", group_keys=False).apply(allocation)