Can I slice a dataframe based on integers in a vector?

Time:09-04

I have a base data frame from which I want to select only a given set of columns and then perform calculations on them. The number of columns I want to slice differs for each row, depending on a vector/array.

Base Data

import pandas as pd

basedata = {
    '1': [1, 2, 3, 4, 5],
    '2': [5, 4, 3, 2, 1],
    '3': [1, 1, 1, 2, 3],
    '4': [3, 0, 3, 3, 3],
    '5': [0, 4, 2, 2, 3],
}

df_base = pd.DataFrame(basedata)

Target Vector

targetvector = {
    'RowNumber': [1, 2, 3, 4, 5],
    'Target Columns': [0, 0, 1, 2, 3],
}

df_target = pd.DataFrame(targetvector)

What I want to achieve

In rows 1 and 2, take the average of zero values (no columns are selected)
In row 3, take the average of the first value
..
In row 5, take the average of the first 3 values
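
To make the goal concrete, here is a minimal plain-Python sketch of the calculation described above. The function name `prefix_means` and the `empty` default of 0.0 are assumptions for illustration, since the question does not say what the average of zero values should be. Note that pandas treats each key of `basedata` as a column, so the rows of `df_base` are not the lists as typed; both readings are shown.

```python
def prefix_means(rows, counts, empty=0.0):
    """Average the first counts[i] values of rows[i].

    `empty` (an assumed default) is returned for rows where counts[i] is 0.
    """
    result = []
    for row, k in zip(rows, counts):
        selected = row[:k]
        result.append(sum(selected) / len(selected) if selected else empty)
    return result


# The five lists exactly as typed in basedata
lists = [
    [1, 2, 3, 4, 5],
    [5, 4, 3, 2, 1],
    [1, 1, 1, 2, 3],
    [3, 0, 3, 3, 3],
    [0, 4, 2, 2, 3],
]
counts = [0, 0, 1, 2, 3]

# pandas stores each list as a column, so the DataFrame rows are the transpose
df_rows = [list(r) for r in zip(*lists)]

as_df_rows = prefix_means(df_rows, counts)
as_listed = prefix_means(lists, counts)

print(as_df_rows)  # [0.0, 0.0, 3.0, 3.0, 3.0]
print(as_listed)   # [0.0, 0.0, 1.0, 1.5, 2.0]
```

The loop is slow for large frames but is useful as a reference to check a vectorized version against.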

CodePudding user response:

It may be easier to do it using numpy arrays instead of dataframes:

import numpy as np

a = np.array([[1, 2, 3, 4, 5], 
              [5, 4, 3, 2, 1], 
              [1, 1, 1, 2, 3],
              [3, 0, 3, 3, 3], 
              [0, 4, 2, 2, 3]])

b = np.array([0, 0, 1, 2, 3])


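# mask[i, j] is True for columns at or beyond b[i], i.e. the ones to exclude from row i
mask = np.arange(a.shape[1]) >= b[:, None]
# Average the unmasked values per row; rows with every value masked become 0
out = np.ma.array(a, mask=mask).mean(axis=1).filled(0)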
print(out)

It gives:

[0.  0.  1.  1.5 2. ]
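
The same masking idea can be applied directly to the question's DataFrames. The sketch below is one possible way to do that, not the answerer's code: the helper name `masked_prefix_mean` and its `empty` parameter are assumptions for illustration. One detail worth checking: the array `a` above lists each `basedata` list as a row, whereas `pd.DataFrame(basedata)` stores each list as a column, so `a` corresponds to `df_base.to_numpy().T`. Both orientations are computed so the difference is visible.

```python
import numpy as np
import pandas as pd

df_base = pd.DataFrame({
    '1': [1, 2, 3, 4, 5],
    '2': [5, 4, 3, 2, 1],
    '3': [1, 1, 1, 2, 3],
    '4': [3, 0, 3, 3, 3],
    '5': [0, 4, 2, 2, 3],
})
df_target = pd.DataFrame({
    'RowNumber': [1, 2, 3, 4, 5],
    'Target Columns': [0, 0, 1, 2, 3],
})


def masked_prefix_mean(values, counts, empty=0.0):
    """Row-wise mean of the first counts[i] columns (hypothetical helper)."""
    values = np.asarray(values)
    counts = np.asarray(counts)
    # exclude[i, j] is True when column j lies at or beyond counts[i]
    exclude = np.arange(values.shape[1]) >= counts[:, None]
    # Rows where every column is excluded come back masked; fill them with `empty`
    return np.ma.array(values, mask=exclude).mean(axis=1).filled(empty)


counts = df_target['Target Columns'].to_numpy()

# Rows as pandas lays them out (each dict key is a column)
by_df_rows = masked_prefix_mean(df_base.to_numpy(), counts)

# Each basedata list treated as a row, matching the array `a` in the answer
by_lists = masked_prefix_mean(df_base.to_numpy().T, counts)

print(by_df_rows)
print(by_lists)
```

With the data as given, `by_lists` reproduces the answer's result (0, 0, 1, 1.5, 2), while `by_df_rows` gives 0, 0, 3, 3, 3, so it is worth confirming which orientation is intended before using either.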