Home > Software engineering >  Python Pandas do an operation over a set of different columns
Python Pandas do an operation over a set of different columns

Time:09-30

I have a dataframe with multiple text columns and some values and i'm sure there is a better way to do this with built in functions but i haven't got the answer yet.

ID   A      B    C    q    value
1    Day    N    Wed  20    30
2    Noon   N    Wed  12    50
3    Day    S    Mon  30    13
4    Night  W    Sun  5      6
5    Night  E    Sun  20    20
6    Day    S    Wed  40    20

I want to make an operation over q and value over the different [A,B,C] keys, like q * value if value < 30, else q * value / 2

for uniqueA in df['A'].unique():
    for uniqueB in df.query('A == @uniqueA')['B'].unique():
        for uniqueC in df.query('A== @uniqueA and B == @uniqueB')['C'].unique():
            aux = df.query('A== @uniqueA and B == @uniqueB and C == @uniqueC')[['A','B','C','q',value]]
            aux = df.groupby(['A','B','C','q'])['value'].sum().reset_index()
                if value < 30:
                   result = df.query('A== @uniqueA and B == @uniqueB and C == @uniqueC')['q'] * value
                else:
                   result = df.query('A== @uniqueA and B == @uniqueB and C == @uniqueC')['q'] * (value/2)
                # Save on results another df

CodePudding user response:

res = df.apply(lambda r: r['q'] * r['value'] if r['value'] < 30 else r['q'] * r['value'] /2, axis=1)

CodePudding user response:

There are many ways to do so.

One way is using df.loc method:

condition = df['q'] < 30 # condition

df.loc[condition, "ans"]  = df['q'] * df['value']     # when true
df.loc[~condition, "ans"] = df['q'] * df['value'] / 2 # when false

Another way is using Numpy's where method:

import numpy as np
df['ans'] = np.where(df['q']<30, df['q'] * df['value'], df['q'] * df['value'] / 2)

The syntax so you understand what's happening is

np.where(condition, operation when true, operation when false)

CodePudding user response:

For a vectorised operation you can use numpy.where, which returns an array:

import numpy as np

np.where(df['value'].lt(30),
         df['q'].mul(df['value']), 
         df['q'].mul(df['value']).div(2))

Out[162]: array([300., 300., 390.,  30., 400., 800.])

To assign your result back to a new column in your existing dataframe:

df['result'] = np.where(df['value'].lt(30),
         df['q'].mul(df['value']), 
         df['q'].mul(df['value']).div(2))
>>> df

   ID      A  B    C   q  value  result
0   1    Day  N  Wed  20     30   300.0
1   2   Noon  N  Wed  12     50   300.0
2   3    Day  S  Mon  30     13   390.0
3   4  Night  W  Sun   5      6    30.0
4   5  Night  E  Sun  20     20   400.0
5   6    Day  S  Wed  40     20   800.0
  • Related