Home > Back-end >  creating pandas function equivalent for EXCEL OFFSET function
creating pandas function equivalent for EXCEL OFFSET function

Time:03-10

Let's say input was

d = {'col1': [1,2,3,4,5,6,7,8,9,10], 
 'col2': [1,2,3,4,5,6,7,8,9,10],
 'col3': [1,2,3,4,5,6,7,8,9,10],
 'offset': [1,2,3,1,2,3,1,2,3,1]} 

df = pd.DataFrame(data=d)

I want to create an additional column that looks like this:

df['output'] = [1, 4, 9, 4, 10, 18, 7, 16, 27, 10]

Basically each number in offset is telling you the number of columns to sum over (from col1 as ref point).

Is there a vectorized way to do this without iterating through each value in offset?

CodePudding user response:

You use np.select. To use it, create each of the column sum (1, 2, 3 ... as needed) as the possible choices, and create a boolean masks for each value in offset column as the possible conditons.

# get all possible values from offset
lOffset = df['offset'].unique()

# get te result with np.select
df['output'] = np.select(
    # create mask for each values in offset
    condlist=[df['offset'].eq(i) for i in lOffset],
    # crerate the sum over the number of columns per offset value
    choicelist=[df.iloc[:,:i].sum(axis=1) for i in lOffset]
)
print(df)
#    col1  col2  col3  offset  output
# 0     1     1     1       1       1
# 1     2     2     2       2       4
# 2     3     3     3       3       9
# 3     4     4     4       1       4
# 4     5     5     5       2      10
# 5     6     6     6       3      18
# 6     7     7     7       1       7
# 7     8     8     8       2      16
# 8     9     9     9       3      27
# 9    10    10    10       1      10

Note: this assumes that your offset column is the last one

CodePudding user response:

It can be done with pd.crosstab then we mask all 0 to NaN and back fill, this will return 1 as all value ned to sum

df['new'] = df.filter(like = 'col').where(pd.crosstab(df.index,df.offset).mask(lambda x : x==0).bfill(1).values==1).sum(1)
Out[710]: 
0     1.0
1     4.0
2     9.0
3     4.0
4    10.0
5    18.0
6     7.0
7    16.0
8    27.0
9    10.0
dtype: float64
  • Related