I want to do the same operation on multiple sets of columns of a DataFrame.
Since "for-loops" are frowned upon I'm searching for a decent alternative.
An example:
df = pd.DataFrame({
'a': [1, 11, 111],
'b': [222, 22, 2],
'a_x': [10, 80, 30],
'b_x': [20, 20, 60],
})
This is a simple for-loop approach. It's short and well readable.
cols = ['a', 'b']
for col in cols:
df[f'{col}_res'] = df[[col, f'{col}_x']].min(axis=1)
a b a_x b_x a_res b_res
0 1 222 10 20 1 20
1 11 22 80 20 11 20
2 111 2 30 60 30 2
This is an alternative (w/o for-loop), but I feel that the additional complexity is not really for the better.
cols = ['a', 'b']
def res_df(df, col, name):
res = pd.Series(
df[[col, f'{col}_x']].min(axis=1), index=df.index, name=name)
return res
res = [res_df(df, col, f'{col}_res') for col in cols]
df = pd.concat([df, pd.concat(res, axis=1)], axis=1)
Does anyone have a better/more pythonic solution?
Thanks!
UPDATE 1
Inspired by the proposal from mozway I find the following solution quite appealing.
Imho it's short, readable and generic, since the particular operation can be swapped into a function and the list comprehension applies the function to the given sets of columns.
def operation(s1, s2):
# fill in any operation on pandas series'
# e.g. res = s1 * s2 / (s1 s2)
res = np.minimum(s1, s2)
return res
df = df.join(
[operation(df[f'{col}'], df[f'{col}_x']).rename(f'{col}_res') for col in cols]
)
CodePudding user response:
You can use numpy.minimum
after setting the arrays to identical column names:
cols = ['a', 'b']
cols2 = [f'{x}_x' for x in cols]
df = df.join(np.minimum(df[cols],
df[cols2].set_axis(cols, axis=1))
.add_suffix('_res'))
output:
a b a_x b_x a_res b_res
0 1 222 10 20 1 20
1 11 22 80 20 11 20
2 111 2 30 60 30 2
or, using rename
as suggested in the other answer:
cols = ['a', 'b']
cols2 = {f'{x}_x': x for x in cols}
df = df.join(np.minimum(df[cols],
df[list(cols2)].rename(columns=cols2))
.add_suffix('_res'))
CodePudding user response:
One idea is rename columns names by dictionary, select columns by list cols
and then group by columns names with aggregate min
, sum
, max
or use custom function:
cols = ['a', 'b']
suffix = '_x'
d = {f'{x}{suffix}':x for x in cols}
print (d)
{'a_x': 'a', 'b_x': 'b'}
print (df.rename(columns=d)[cols])
a a b b
0 1 10 222 20
1 11 80 22 20
2 111 30 2 60
df1 = df.rename(columns=d)[cols].groupby(axis=1,level=0).min().add_suffix('_res')
print (df1)
a_res b_res
0 1 20
1 11 20
2 30 2
Last add to original DataFrame:
df = df.join(df1)
print (df)
a b a_x b_x a_res b_res
0 1 222 10 20 1 20
1 11 22 80 20 11 20
2 111 2 30 60 30 2