Home > Software design >  Python pandas lambda compare two columns in dataframe
Python pandas lambda compare two columns in dataframe

Time:08-29

I have two columns of currencies in a, b;

There are some cells that are empty in both columns.

I am hoping to create a third column c to identified the following logic:

if a == b then display 'same'
elif a == None then display 'a missing'
elif b == None then display 'b missing'
elif a == None and b == None then display 'all missing'
else 'diff currency'.

These are my codes below. It just return 'same' or 'diff currency', nothing in between.

Please shed some lights on my syntax or logic flaws here. Thank you so much!

import pandas as pd

# list of currencies
a = list(('USD USD CAD nan JMD nan HKD CAD').split())

b = list(('USD CAD RMB HKD nan nan USD EUR').split())


# df
df = pd.DataFrame(list(zip(a, b)), columns=['a', 'b'])

df = df.replace('nan', '')


df['c'] = df.apply(lambda x: 'Same' if x['a'] == x['b'] 
                   else ('a missing' if x['a']==None
                         else ('b missing' if x['b']==None 
                         else ('a & b missing' if x['a']==None and x['b']==None
                         else 'diff currency'))), axis=1)

CodePudding user response:

It's better if you learn how to use the vectorized functions. They are both idiomatic pandas and extremely fast. Use np.select:

a = df["a"]
b = df["b"]
df["c"] = np.select(
    [a.isna() & b.isna(), a.isna(), b.isna(), np.isclose(a, b)],
    ["all missing", "a missing", "b missing", "same"],
    "diff currency",
)

CodePudding user response:

You can use np.select for this.

import pandas as pd
import numpy as np

# list of currencies
a = list(('USD USD CAD nan JMD nan HKD CAD').split())

b = list(('USD CAD RMB HKD nan nan USD EUR').split())

# df
df = pd.DataFrame(list(zip(a, b)), columns=['a', 'b'])

# change string `nan` into actual NaN values
df = df.replace('nan', np.nan)

condlist = [df.a == df.b, df.isna().all(axis=1), df.a.isna(), df.b.isna()]
choicelist = ['same', 'all missing', 'a missing', 'b missing']

df['c'] = np.select(condlist,choicelist,default='diff currency')
print(df)

     a    b              c
0  USD  USD           same
1  USD  CAD  diff currency
2  CAD  RMB  diff currency
3  NaN  HKD      a missing
4  JMD  NaN      b missing
5  NaN  NaN    all missing
6  HKD  USD  diff currency
7  CAD  EUR  diff currency
  • Related