Let's say input was
d = {'col1': [1,2,3,4,5,6,7,8,9,10],
'col2': [1,2,3,4,5,6,7,8,9,10],
'col3': [1,2,3,4,5,6,7,8,9,10],
'offset': [1,2,3,1,2,3,1,2,3,1]}
df = pd.DataFrame(data=d)
I want to create an additional column that looks like this:
df['output'] = [1, 4, 9, 4, 10, 18, 7, 16, 27, 10]
Basically each number in offset
is telling you the number of columns to sum over (from col1
as ref point).
Is there a vectorized way to do this without iterating through each value in offset
?
CodePudding user response:
You use np.select. To use it, create each of the column sum
(1, 2, 3 ... as needed) as the possible choices, and create a boolean masks for each value in offset column as the possible conditons.
# get all possible values from offset
lOffset = df['offset'].unique()
# get te result with np.select
df['output'] = np.select(
# create mask for each values in offset
condlist=[df['offset'].eq(i) for i in lOffset],
# crerate the sum over the number of columns per offset value
choicelist=[df.iloc[:,:i].sum(axis=1) for i in lOffset]
)
print(df)
# col1 col2 col3 offset output
# 0 1 1 1 1 1
# 1 2 2 2 2 4
# 2 3 3 3 3 9
# 3 4 4 4 1 4
# 4 5 5 5 2 10
# 5 6 6 6 3 18
# 6 7 7 7 1 7
# 7 8 8 8 2 16
# 8 9 9 9 3 27
# 9 10 10 10 1 10
Note: this assumes that your offset column is the last one
CodePudding user response:
It can be done with pd.crosstab
then we mask
all 0 to NaN and back fill, this will return 1 as all value ned to sum
df['new'] = df.filter(like = 'col').where(pd.crosstab(df.index,df.offset).mask(lambda x : x==0).bfill(1).values==1).sum(1)
Out[710]:
0 1.0
1 4.0
2 9.0
3 4.0
4 10.0
5 18.0
6 7.0
7 16.0
8 27.0
9 10.0
dtype: float64