I have a pandas dataframe df:
df =
category str_column
cat1 str1
cat2 str2
... ...
and a
parent_df =
category parent_category
cat1 parent_of_cat1
cat2 parent_of_cat2
and a list
faulty_categories =
['cat1', ...]
I need to get all the categories in df
which are present in faulty_categories
and replace them by their parent:
So, expected output:
df =
category str_column
parent_of_cat1 str1
cat2 str2
How to achieve this?
I tried:
df.loc[df['category'].isin(faulty_categories), 'category'] = parents_df.loc[parents_df['category'].isin(faulty_categories), 'parent']
But, this replaced category by NaN.
CodePudding user response:
You could use loc
and map
:
df.loc[df['category'].isin(faulty_categories), 'category'] = df['category'].map(parent_df.set_index('category')['parent_category'])
Or mask
and map
:
df['category'] = (df['category'].mask(df['category'].isin(faulty_categories),
df['category'].map(parent_df.set_index('category')['parent_category'])
)
Output:
category str_column
0 parent_of_cat1 str1
1 cat2 str2
CodePudding user response:
Use df.merge
and numpy.where
:
In [2226]: import numpy as np
In [2227]: x = df.merge(parent_df)
In [2230]: x['category'] = np.where(x.category.isin(faulty_categories), x.parent_category, x.category)
In [2232]: x.drop('parent_category', 1, inplace=True)
In [2233]: x
Out[2233]:
category str_column
0 parent_of_cat1 str1
1 cat2 str2
CodePudding user response:
df
category str_column
0 cat1 str1
1 cat2 str2
2 cat3 str3
category parent_category
0 cat1 parent_of_cat1
1 cat3 parent_of_cat3
2 cat2 parent_of_cat2
faulty_cat = ['cat1', 'cat3']
Solution:-
for i in range(len(df)):
if df.loc[i,'category'] in faulty_cat:
df.loc[i,'category'] = parent_df['parent_category'][parent_df['category'] == df.loc[i,'category']].item()
Output df:-
category str_column
0 parent_of_cat1 str1
1 cat2 str2
2 parent_of_cat3 str3