I have a dataset, df, where I would like to create columns that display the output of a subtraction calculation:
Data
count power id p_q122 p_q222 c_q122 c_q222
100 1000 aa 200 300 10 20
100 2000 bb 400 500 5 10
Desired
cnt pwr id p_q122 avail1 p_q222 avail2 c_q122 count1 c_q222 count2
100 1000 aa 200 800 300 700 10 90 20 80
100 2000 bb 400 1600 500 1500 5 95 10 90
Doing
a = df['avail1'] = df['pwr'] - df['p_q122']
b = df['avail2'] = df['pwr'] - df['p_q222']
I am looking for a more elegant way that provides the desire output. Any suggestion is appreciated.
CodePudding user response:
Try:
df['avail1'] = df['power'].sub(df['p_q122'])
df['avail2'] = df['power'].sub(df['p_q222'])
CodePudding user response:
We can perform 2D subtraction with numpy:
pd.DataFrame(
df['power'].to_numpy()[:, None] - df.filter(like='p_').to_numpy()
).rename(columns=lambda i: f'avail{i 1}')
avail1 avail2
0 800 700
1 1600 1500
Benefit here is that, no matter how many p_
columns there are, all will be subtracted from the power
column.
We can concat
all of the computations with df like:
df = pd.concat([
df,
# power calculations
pd.DataFrame(
df['power'].to_numpy()[:, None] - df.filter(like='p_').to_numpy()
).rename(columns=lambda i: f'avail{i 1}'),
# Count calculations
pd.DataFrame(
df['count'].to_numpy()[:, None] - df.filter(like='c_').to_numpy()
).rename(columns=lambda i: f'count{i 1}'),
], axis=1)
which gives df
:
count power id p_q122 p_q222 ... c_q222 avail1 avail2 count1 count2
0 100 1000 aa 200 300 ... 20 800 700 90 80
1 100 2000 bb 400 500 ... 10 1600 1500 95 90
[2 rows x 11 columns]
If we have many column groups to do, we can build the list of DataFrames programmatically as well:
df = pd.concat([df, *(
pd.DataFrame(
df[col].to_numpy()[:, None] - df.filter(like=filter_prefix).to_numpy()
).rename(columns=lambda i: f'{new_prefix}{i 1}')
for col, filter_prefix, new_prefix in [
('power', 'p_', 'avail'),
('count', 'c_', 'count')
]
)], axis=1)
Setup and imports:
import pandas as pd
df = pd.DataFrame({
'count': [100, 100], 'power': [1000, 2000], 'id': ['aa', 'bb'],
'p_q122': [200, 400], 'p_q222': [300, 500], 'c_q122': [10, 5],
'c_q222': [20, 10]
})