Home > Software engineering >  How to check if a cell of a Dataframe exists as a key in a dict, and if it does, check if another ce
How to check if a cell of a Dataframe exists as a key in a dict, and if it does, check if another ce

Time:07-11

I currently have a dataframe like this

   colA   colB   colC    colD
   1      1a     Yes     Yes
   1      1b     Yes     Yes
   2      2a     Yes     Yes
   2      2b     Yes     Yes
   2      2c     Yes     Yes
   3      3a     Yes     Yes
   4      4a     Yes     Yes

Say I have a dictionary like this

filterDict = {
   "1" : ["1a","1b"]
   "2" : ["2a","2b"]
}

I want it so that if in colA a value exists as a key, check if colB in that same row exists in the value list of the dict for that key so that the resulting dataframe would look like this

   colA   colB   colC    colD
   1      1a     No      No
   1      1b     No      No
   2      2a     No      No
   2      2b     No      No
   2      2c     Yes     Yes
   3      3a     Yes     Yes
   4      4a     Yes     Yes

I am relatively new to pandas and am unsure how to do this. What I previously had was something hardcoded like this

mask = ((df["colA"] == "1") & (df["colB"].isin(["1a","1b"])) |
       ((df["colB"] == "2") & (df["colB"].isin(["2a","2b"]))

df.loc[mask,["colC","colD"]] = ["No","No"]

But I would like this to be dynamic so that it could be put in the dict format

CodePudding user response:

You could try this

for index, row in df.iterrows():
    if str(row['colA']) in filterDict:
        if row['colB'] in filterDict[str(row['colA'])]:
            df.loc[index, 'colC'] = 'No'
            df.loc[index, 'colD'] = 'No'
            continue

    df.loc[index, 'colC'] = 'Yes'
    df.loc[index, 'colD'] = 'Yes'
        

Based on your expected output, I've assumed 'No' if a match is found, otherwise 'Yes'.

CodePudding user response:

Flatten the fitersDict into list of tuples then create a multindex from it, the use Multindex.isin to test from the occurrence of multiindex in the columns colA, colB of given dataframe

i1 = df.set_index(['colA', 'colB']).index
i2 = pd.MultiIndex.from_tuples((a, b) for a, l in filterDict.items() for b in l)

mask = ~i1.isin(i2)
df['colC'], df['colD'] = mask, mask

  colA colB   colC   colD
0    1   1a  False  False
1    1   1b  False  False
2    2   2a  False  False
3    2   2b  False  False
4    2   2c   True   True
5    3   3a   True   True
6    4   4a   True   True
  • Related