Home > Mobile >  Create calculated field within dataset using Python
Create calculated field within dataset using Python

Time:10-14

I have a dataset, df, where I would like to create columns that display the output of a subtraction calculation:

Data

count   power   id  p_q122  p_q222      c_q122  c_q222  
100     1000    aa  200     300         10      20      
100     2000    bb  400     500         5       10      

Desired

cnt pwr    id   p_q122  avail1  p_q222  avail2  c_q122  count1  c_q222  count2  
100 1000   aa   200     800     300     700     10      90      20      80
100 2000   bb   400     1600    500     1500    5       95      10      90

Doing

   a =    df['avail1']  =   df['pwr'] - df['p_q122']
   
   b =    df['avail2']  =   df['pwr'] - df['p_q222']

I am looking for a more elegant way that provides the desire output. Any suggestion is appreciated.

CodePudding user response:

Try:

df['avail1'] = df['power'].sub(df['p_q122'])
df['avail2'] = df['power'].sub(df['p_q222'])

CodePudding user response:

We can perform 2D subtraction with numpy:

pd.DataFrame(
    df['power'].to_numpy()[:, None] - df.filter(like='p_').to_numpy()
).rename(columns=lambda i: f'avail{i   1}')

   avail1  avail2
0     800     700
1    1600    1500

Benefit here is that, no matter how many p_ columns there are, all will be subtracted from the power column.


We can concat all of the computations with df like:

df = pd.concat([
    df,
    # power calculations
    pd.DataFrame(
        df['power'].to_numpy()[:, None] - df.filter(like='p_').to_numpy()
    ).rename(columns=lambda i: f'avail{i   1}'),
    # Count calculations
    pd.DataFrame(
        df['count'].to_numpy()[:, None] - df.filter(like='c_').to_numpy()
    ).rename(columns=lambda i: f'count{i   1}'),
], axis=1)

which gives df:

   count  power  id  p_q122  p_q222  ...  c_q222  avail1  avail2  count1  count2
0    100   1000  aa     200     300  ...      20     800     700      90      80
1    100   2000  bb     400     500  ...      10    1600    1500      95      90

[2 rows x 11 columns]

If we have many column groups to do, we can build the list of DataFrames programmatically as well:

df = pd.concat([df, *(
    pd.DataFrame(
        df[col].to_numpy()[:, None] - df.filter(like=filter_prefix).to_numpy()
    ).rename(columns=lambda i: f'{new_prefix}{i   1}')
    for col, filter_prefix, new_prefix in [
        ('power', 'p_', 'avail'),
        ('count', 'c_', 'count')
    ]
)], axis=1)

Setup and imports:

import pandas as pd

df = pd.DataFrame({
    'count': [100, 100], 'power': [1000, 2000], 'id': ['aa', 'bb'],
    'p_q122': [200, 400], 'p_q222': [300, 500], 'c_q122': [10, 5],
    'c_q222': [20, 10]
})
  • Related