I have a base DataFrame from which I want to select only a given set of columns and then perform calculations on them. The number of columns I want to slice differs for each row, depending on a vector/array.
Base Data
import pandas as pd

basedata = {
    '1': [1, 2, 3, 4, 5],
    '2': [5, 4, 3, 2, 1],
    '3': [1, 1, 1, 2, 3],
    '4': [3, 0, 3, 3, 3],
    '5': [0, 4, 2, 2, 3],
}
df_base = pd.DataFrame(basedata)
Target Vector
targetvector = {
    'RowNumber': [1, 2, 3, 4, 5],
    'Target Columns': [0, 0, 1, 2, 3]
}
df_target = pd.DataFrame(targetvector)
What I want to achieve
In rows 1 and 2, take the average of zero values (no columns selected)
In row 3, take the average of the first value
...
In row 5, take the average of the first 3 values
CodePudding user response:
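It may be easier to do this with NumPy arrays instead of DataFrames: build a boolean mask that hides every column at or beyond each row's target count, then take a masked mean per row: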
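import numpy as np

# each row of `a` is one list from basedata
a = np.array([[1, 2, 3, 4, 5],
              [5, 4, 3, 2, 1],
              [1, 1, 1, 2, 3],
              [3, 0, 3, 3, 3],
              [0, 4, 2, 2, 3]])
b = np.array([0, 0, 1, 2, 3])  # number of leading values to average per row

# mask[i, j] is True where column j is at or beyond b[i], i.e. should be ignored
mask = np.r_[0:a.shape[1]] >= b[:, None]
# masked mean over each row; rows with every value masked are filled with 0
out = np.ma.array(a, mask=mask).mean(1).filled(0)
print(out)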
It gives:
[0.  0.  1.  1.5 2. ]
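If you want to start from the DataFrames in the question rather than hand-typed arrays, the same mask idea can be wrapped in a small helper. This is only a sketch: `leading_mean` and the `Mean` column are names made up for illustration. Note that the array `a` above lists each entry of `basedata` as a row, which is the transpose of `df_base` (there each key is a column), so the sketch computes both readings and you can pick the one you meant.

```python
import numpy as np
import pandas as pd

df_base = pd.DataFrame({
    '1': [1, 2, 3, 4, 5],
    '2': [5, 4, 3, 2, 1],
    '3': [1, 1, 1, 2, 3],
    '4': [3, 0, 3, 3, 3],
    '5': [0, 4, 2, 2, 3],
})
df_target = pd.DataFrame({
    'RowNumber': [1, 2, 3, 4, 5],
    'Target Columns': [0, 0, 1, 2, 3],
})


def leading_mean(values, counts):
    """Hypothetical helper: mean of the first counts[i] entries of row i.

    Rows with counts[i] == 0 have nothing to average and are filled with 0.
    """
    # True marks positions to ignore (column index >= count for that row).
    mask = np.arange(values.shape[1]) >= counts[:, None]
    return np.ma.array(values, mask=mask).mean(axis=1).filled(0)


counts = df_target['Target Columns'].to_numpy()

# Reading 1: rows are the DataFrame rows (each dict key is a column).
by_df_rows = leading_mean(df_base.to_numpy(), counts)

# Reading 2: rows are the dict entries, which matches the array `a` above.
by_dict_entries = leading_mean(df_base.to_numpy().T, counts)

# Store whichever reading applies; reading 1 is used here as an assumption.
df_target['Mean'] = by_df_rows
print(df_target)
```

With the data from the question, reading 1 gives 0, 0, 3, 3, 3 and reading 2 gives 0, 0, 1, 1.5, 2 (the same values as the output above).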