Apply function only on specific rows AND columns using Python Pandas

Time:09-24

I have a dataframe below:

import pandas as pd

df = {'a': [1, 2, 3],
      'b': [77, 88, 99],
      'c1': [1, 1, 1],
      'c2': [2, 2, 2],
      'c3': [3, 3, 3]}
df = pd.DataFrame(df)

and a function:

def test_function(row):
    return row['b']

How can I apply this function on the 'c' columns (i.e. c1, c2 and c3), BUT only for specific rows whose 'a' value matches the 2nd character of the 'c' columns?

For example, for the first row, the value of 'a' is 1, so for the first row, I would like to apply this function on column 'c1'.

For the second row, the value of 'a' is 2, so for the second row, I would like to apply this function on column 'c2'. And so forth for the rest of the rows.

The desired end result should be:

df_final = {'a': [1, 2, 3],
            'b': [77, 88, 99],
            'c1': [77, 1, 1],
            'c2': [2, 88, 2],
            'c3': [3, 3, 99]}
df_final = pd.DataFrame(df_final)

CodePudding user response:

Use Series.mask: select the c columns with DataFrame.filter, compare the number in each column name with the value of a, and where they match, replace the value with the value of b:

c_cols = df.filter(like='c').columns

def test_function(row):
    # if the column suffix is always a single digit (0 to 9):
    # m = c_cols.str[1].astype(int) == row['a']
    # general case, suffix of one or more digits (e.g. c10, c100):
    m = c_cols.str.extract(r'(\d+)', expand=False).astype(int) == row['a']
    row[c_cols] = row[c_cols].mask(m, row['b'])
    return row

df = df.apply(test_function, axis=1)
print(df)
   a   b  c1  c2  c3
0  1  77  77   2   3
1  2  88   1  88   3
2  3  99   1   2  99
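
The two ways of building the mask m differ only in how the number is read from the column label. A minimal sketch of that difference, assuming a hypothetical extra label 'c10' that is not in the question's data:

```python
import pandas as pd

# Hypothetical labels; 'c10' is added only to show the multi-digit case.
cols = pd.Index(['c1', 'c2', 'c10'])

# Second character only: correct for single-digit suffixes, wrong for 'c10'.
second_char = cols.str[1].astype(int)

# All consecutive digits: correct for suffixes of any length.
all_digits = cols.str.extract(r'(\d+)', expand=False).astype(int)

print(second_char.tolist())  # [1, 2, 1]
print(all_digits.tolist())   # [1, 2, 10]
```

With only c1 to c3, as in the question, both variants give the same mask.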

A faster alternative without a row-wise loop, using NumPy broadcasting:

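arr = c_cols.str.extract(r'(\d+)', expand=False).astype(int).to_numpy()
# compare each row's 'a' (as a column vector) with every column number,
# so the mask has shape (number of rows, number of c columns)
m = df['a'].to_numpy()[:, None] == arr
df[c_cols] = df[c_cols].mask(m, df['b'], axis=0)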
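
To make the shape of the broadcast mask explicit, here is a sketch on a hypothetical frame with four rows and three c columns (the fourth row is invented for illustration). The comparison is arranged so that the mask has one row per DataFrame row and one column per c column, which is the shape DataFrame.mask expects:

```python
import pandas as pd

# Hypothetical frame: four rows, three 'c' columns (deliberately not square).
df = pd.DataFrame({'a': [1, 2, 3, 2],
                   'b': [77, 88, 99, 55],
                   'c1': [1, 1, 1, 1],
                   'c2': [2, 2, 2, 2],
                   'c3': [3, 3, 3, 3]})

c_cols = df.filter(like='c').columns
nums = c_cols.str.extract(r'(\d+)', expand=False).astype(int).to_numpy()

# Shape (4, 1) compared with shape (3,) broadcasts to (4, 3).
m = df['a'].to_numpy()[:, None] == nums

# Where the mask is True, take the value of 'b' from the same row.
df[c_cols] = df[c_cols].mask(m, df['b'], axis=0)

print(m.shape)  # (4, 3)
print(df)
```

In this sketch the fourth row has a equal to 2, so only its c2 value is replaced by 55.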