Pandas DataFrame: Same operation on multiple sets of columns-CodePudding

I want to do the same operation on multiple sets of columns of a DataFrame.

Since "for-loops" are frowned upon I'm searching for a decent alternative.

An example:

df = pd.DataFrame({
    'a': [1, 11, 111],
    'b': [222, 22, 2],
    'a_x': [10, 80, 30],
    'b_x': [20, 20, 60],
})

This is a simple for-loop approach. It's short and well readable.

cols = ['a', 'b']
for col in cols:
    df[f'{col}_res'] = df[[col, f'{col}_x']].min(axis=1)

    a   b   a_x b_x a_res b_res
0   1   222 10  20  1     20
1   11  22  80  20  11    20
2   111 2   30  60  30    2

This is an alternative (w/o for-loop), but I feel that the additional complexity is not really for the better.

cols = ['a', 'b']
def res_df(df, col, name):
    res = pd.Series(
        df[[col, f'{col}_x']].min(axis=1), index=df.index, name=name)
    return res

res = [res_df(df, col, f'{col}_res') for col in cols]
df = pd.concat([df, pd.concat(res, axis=1)], axis=1)

Does anyone have a better/more pythonic solution?
Thanks!

UPDATE 1
Inspired by the proposal from mozway I find the following solution quite appealing.
Imho it's short, readable and generic, since the particular operation can be swapped into a function and the list comprehension applies the function to the given sets of columns.

def operation(s1, s2):
    # fill in any operation on pandas series'
    # e.g. res = s1 * s2 / (s1   s2)
    res = np.minimum(s1, s2)
    return res

df = df.join(
    [operation(df[f'{col}'], df[f'{col}_x']).rename(f'{col}_res') for col in cols]
)

CodePudding user response：

You can use numpy.minimum after setting the arrays to identical column names:

cols = ['a', 'b']
cols2 = [f'{x}_x' for x in cols]
df = df.join(np.minimum(df[cols],
                        df[cols2].set_axis(cols, axis=1))
               .add_suffix('_res'))

output:

     a    b  a_x  b_x  a_res  b_res
0    1  222   10   20      1     20
1   11   22   80   20     11     20
2  111    2   30   60     30      2

or, using rename as suggested in the other answer:

cols = ['a', 'b']
cols2 = {f'{x}_x': x for x in cols}
df = df.join(np.minimum(df[cols],
                        df[list(cols2)].rename(columns=cols2))
               .add_suffix('_res'))

CodePudding user response：

One idea is rename columns names by dictionary, select columns by list cols and then group by columns names with aggregate min, sum, max or use custom function:

cols = ['a', 'b']
suffix = '_x'

d = {f'{x}{suffix}':x for x in cols}
print (d)
{'a_x': 'a', 'b_x': 'b'}

print (df.rename(columns=d)[cols])
     a   a    b   b
0    1  10  222  20
1   11  80   22  20
2  111  30    2  60

df1 = df.rename(columns=d)[cols].groupby(axis=1,level=0).min().add_suffix('_res')
print (df1)
   a_res  b_res
0      1     20
1     11     20
2     30      2

Last add to original DataFrame:

df = df.join(df1)
print (df)
     a    b  a_x  b_x  a_res  b_res
0    1  222   10   20      1     20
1   11   22   80   20     11     20
2  111    2   30   60     30      2