How to write 'I' when two columns hold the same values in a pandas DataFrame?

Time:03-14

Dataframe is like this:

            RS                 AS                    IS
F1  [F1, F2, F3, F4, F5]      [F1]                  [F1]
F2  [F2, F3, F5]          [F1, F2, F3, F5]      [F5, F3, F2]
F3  [F2, F3, F4, F5]      [F1, F2, F3, F5]      [F5, F3, F2]
F4  [F4]                  [F1, F3, F4, F5]          [F4]
F5  [F2, F3, F4, F5]      [F1, F2, F3, F5]      [F5, F3, F2]

Output I need:

            RS                 AS                    IS          Level
F1  [F1, F2, F3, F4, F5]      [F1]                  [F1]     
F2  [F2, F3, F5]          [F1, F2, F3, F5]      [F5, F3, F2]       I
F3  [F2, F3, F4, F5]      [F1, F2, F3, F5]      [F5, F3, F2]       
F4  [F4]                  [F1, F3, F4, F5]          [F4]           I
F5  [F2, F3, F4, F5]      [F1, F2, F3, F5]      [F5, F3, F2]       

The logic is simple: if RS and IS contain the same values (order does not matter), write I in the Level column. I am using the following code, but it doesn't work.

if df['RS'].any() == df['IS'].any():
    df['Level'] = 'I'

I also need to drop every row with Level 'I' from the original DataFrame once the column has been filled in, like this:

            RS                 AS                    IS
F1  [F1, F2, F3, F4, F5]      [F1]                  [F1]
F3  [F2, F3, F4, F5]      [F1, F2, F3, F5]      [F5, F3, F2]
F5  [F2, F3, F4, F5]      [F1, F2, F3, F5]      [F5, F3, F2]

CodePudding user response:

Convert the lists to sets and compare them for equality to find the rows whose RS and IS hold the same elements, then assign the value. The example below leaves out your middle column (AS).

import pandas as pd

df = pd.DataFrame({'RS':
    [[1,2,3,4,5],
     [2,3,5],
     [2,3,4,5],
     [4],
     [2,3,4,5]],
    'IS':
    [[1],
     [5,3,2],
     [5,3,2],
     [4],
     [5,3,2]]})

ix = df.RS.apply(set) == df.IS.apply(set)
df['Level'] = ''
df.loc[ix, 'Level'] = 'I'

df
# returns:
             RS         IS Level
[1, 2, 3, 4, 5]        [1]
      [2, 3, 5]  [5, 3, 2]     I
   [2, 3, 4, 5]  [5, 3, 2]
            [4]        [4]     I
   [2, 3, 4, 5]  [5, 3, 2]
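
For reference, here is a minimal sketch of the same idea applied to the frame from the question, with its F1..F5 labels and the AS column kept. The attempt in the question fails because `.any()` reduces each column to a single value, so the `if` runs once for the whole frame and the assignment then fills every row. The variable name `same` and the row-by-row `zip` comparison are just one way to write it, not the only one; comparing sets ignores both order and duplicates, which is assumed to be what "similar" means here.

```python
import pandas as pd

# Frame rebuilt from the question, including the index labels and AS column.
df = pd.DataFrame(
    {
        "RS": [["F1", "F2", "F3", "F4", "F5"], ["F2", "F3", "F5"],
               ["F2", "F3", "F4", "F5"], ["F4"], ["F2", "F3", "F4", "F5"]],
        "AS": [["F1"], ["F1", "F2", "F3", "F5"], ["F1", "F2", "F3", "F5"],
               ["F1", "F3", "F4", "F5"], ["F1", "F2", "F3", "F5"]],
        "IS": [["F1"], ["F5", "F3", "F2"], ["F5", "F3", "F2"],
               ["F4"], ["F5", "F3", "F2"]],
    },
    index=["F1", "F2", "F3", "F4", "F5"],
)

# One boolean per row: True where RS and IS hold the same elements.
same = pd.Series(
    [set(rs) == set(is_) for rs, is_ in zip(df["RS"], df["IS"])],
    index=df.index,
)

# Write 'I' on matching rows and leave the rest blank.
df["Level"] = same.map({True: "I", False: ""})

print(df)
```

Only F2 and F4 end up with I, because those are the only rows where RS and IS contain exactly the same labels.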

If you only need to drop the rows where I would be assigned, you don't need to assign I at all; just use:

ix = df.RS.apply(set) == df.IS.apply(set)
df.loc[~ix]
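
Note that `df.loc[~ix]` returns a new frame and leaves `df` itself unchanged, so the result has to be assigned to a name to keep it. A short sketch, assuming the same numeric sample data as above plus F1..F5 index labels added for illustration (the names `kept` and `kept_via_level` are hypothetical):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "RS": [[1, 2, 3, 4, 5], [2, 3, 5], [2, 3, 4, 5], [4], [2, 3, 4, 5]],
        "IS": [[1], [5, 3, 2], [5, 3, 2], [4], [5, 3, 2]],
    },
    index=["F1", "F2", "F3", "F4", "F5"],  # labels assumed, as in the question
)

# True where RS and IS hold the same elements.
ix = df.RS.apply(set) == df.IS.apply(set)

# Keep only the rows that would NOT get 'I'; df itself is not modified.
kept = df.loc[~ix]

# If the Level column has already been written, filtering on it gives the same rows.
df["Level"] = ""
df.loc[ix, "Level"] = "I"
kept_via_level = df[df["Level"] != "I"].drop(columns="Level")

print(kept)
```

To replace the original frame instead of keeping both, assign back with `df = df.loc[~ix]`.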