Home > other >  Assign a Groupby operated series to dataframe
Assign a Groupby operated series to dataframe

Time:01-09

I am passing a function to a groupby operation which returns a list which transforms to series in apply and finally assign the series back to the main dataframe.

df_SOT = pd.DataFrame({'Lane': {26055: 'L2', 26056: 'L2', 26057: 'L2', 26058: 'L2', 26059: 'L2', 25972: 'L1', 25973: 'L1', 25974: 'L1', 25975: 'L1', 25976: 'L1'}, 'Carrier SCAC': {26055: 'JNJR', 26056: 'WOSQ', 26057: 'BGME', 26058: 'ITSB', 26059: 'UCSB', 25972: 'BGME', 25973: 'SCNN', 25974: 'XPOL', 25975: 'SJRG', 25976: 'MTRK'}, 'Annual Volume': {26055: 5604.0, 26056: 5604.0, 26057: 5604.0, 26058: 5604.0, 26059: 5604.0, 25972: 4917.0, 25973: 4917.0, 25974: 4917.0, 25975: 4917.0, 25976: 4917.0}, 'Annual Capacity': {26055: 260.0, 26056: 1300.0, 26057: 2704.0, 26058: 2080.0, 26059: 4368.0, 25972: 5408.0, 25973: 3380.0, 25974: 4940.0, 25975: 156.0, 25976: 4940.0}})

Function

def allocation(df_alloc):
  Annual_Volume = df_alloc['Annual Volume']
  Annual_Capacity = df_alloc['Annual Capacity']
  Allocation = []
  Cum_Capacity = 0
  for idx in df_alloc.index:
    Allocate = (min(0.5*Annual_Volume[idx],Annual_Capacity[idx],Annual_Volume[idx]-Cum_Capacity))
    Cum_Capacity =Allocate
    Allocation.append(Allocate)
  return Allocation

Groupby operation:

df_SOT.groupby('Lane').apply(allocation)

>>>Ouput
Lane
L1         [2458.5, 2458.5, 0.0, 0.0, 0.0]
L2    [260.0, 1300.0, 2704.0, 1340.0, 0.0]
dtype: object
df_SOT['Allocation] = df_SOT.groupby('Lane').apply(allocation)

But it returns as NaN. Not sure what I am doing wrong here.

Lane Carrier SCAC Annual Volume Annual Capacity Allocation
26055 L2 JNJR 5604.0 260.0 NaN
26056 L2 WOSQ 5604.0 1300.0 NaN
26057 L2 BGME 5604.0 2704.0 NaN
26058 L2 ITSB 5604.0 2080.0 NaN
26059 L2 UCSB 5604.0 4368.0 NaN
25972 L1 BGME 4917.0 5408.0 NaN
25973 L1 SCNN 4917.0 3380.0 NaN
25974 L1 XPOL 4917.0 4940.0 NaN
25975 L1 SJRG 4917.0 156.0 NaN
25976 L1 MTRK 4917.0 4940.0 NaN

I wanted the NaN column to reflect the series values in column as came in the groupby operation:

L1         [2458.5, 2458.5, 0.0, 0.0, 0.0]
L2    [260.0, 1300.0, 2704.0, 1340.0, 0.0]

Or, if there is a better way to go about that, please let me know.

CodePudding user response:

One alternative is to make allocation return a pandas Series with the correct index instead of a simple list, and setting group_keys to False in the groupby:

def allocation(df_alloc):
    Annual_Volume = df_alloc["Annual Volume"]
    Annual_Capacity = df_alloc["Annual Capacity"]
    Allocation = []
    Cum_Capacity = 0
    for idx in df_alloc.index:
        Allocate = min(
            0.5 * Annual_Volume[idx],
            Annual_Capacity[idx],
            Annual_Volume[idx] - Cum_Capacity,
        )
        Cum_Capacity  = Allocate
        Allocation.append(Allocate)
    return pd.Series(Allocation, index=df_alloc.index)


df_SOT["Allocation"] = df_SOT.groupby("Lane", group_keys=False).apply(allocation)
  • Related