Home > Back-end >  Pandas dataframe replace rows based on values in the list and resulting value is based on another da
Pandas dataframe replace rows based on values in the list and resulting value is based on another da

Time:03-10

I have a pandas dataframe df:

    df = 
          category        str_column
           cat1            str1
           cat2            str2
            ...             ...

and a

   parent_df =
           category        parent_category
           cat1             parent_of_cat1
           cat2             parent_of_cat2

and a list

  faulty_categories =
     ['cat1', ...]

I need to get all the categories in df which are present in faulty_categories and replace them by their parent:

So, expected output:

       df = 

         category       str_column
         parent_of_cat1   str1
         cat2             str2

How to achieve this?

I tried:

       df.loc[df['category'].isin(faulty_categories), 'category'] = parents_df.loc[parents_df['category'].isin(faulty_categories), 'parent']

But, this replaced category by NaN.

CodePudding user response:

You could use loc and map:

df.loc[df['category'].isin(faulty_categories), 'category'] = df['category'].map(parent_df.set_index('category')['parent_category'])

Or mask and map:

df['category'] = (df['category'].mask(df['category'].isin(faulty_categories),
                                      df['category'].map(parent_df.set_index('category')['parent_category'])
                                     )

Output:

         category str_column
0  parent_of_cat1       str1
1            cat2       str2

CodePudding user response:

Use df.merge and numpy.where:

In [2226]: import numpy as np

In [2227]: x = df.merge(parent_df)
In [2230]: x['category'] = np.where(x.category.isin(faulty_categories), x.parent_category, x.category)

In [2232]: x.drop('parent_category', 1, inplace=True)

In [2233]: x
Out[2233]: 
         category str_column
0  parent_of_cat1       str1
1            cat2       str2

CodePudding user response:

df

    category    str_column
0   cat1         str1
1   cat2         str2
2   cat3         str3

    category    parent_category
0   cat1        parent_of_cat1
1   cat3        parent_of_cat3
2   cat2        parent_of_cat2

faulty_cat = ['cat1', 'cat3']

Solution:-

for i in range(len(df)):
    if df.loc[i,'category'] in faulty_cat:
        df.loc[i,'category'] = parent_df['parent_category'][parent_df['category'] == df.loc[i,'category']].item()

Output df:-

      category          str_column
0   parent_of_cat1      str1
1   cat2                str2
2   parent_of_cat3      str3
  • Related