Change certain categorical variables to a unified entry

Time:12-18

Let's say I have a dataframe with a column called animals. The entries look as follows:

'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'E', 'F', 'G', 'H', 'I'.

I want to change the entries 'E', 'F', 'G', 'H' and 'I' to another unified entry called 'D'. What is the best way to transform all these categorical entries into one category?

CodePudding user response:

You can create a list of the entries you want to change, then use loc to select the matching rows, with isin testing whether each value is in that list, and assign 'D' to them:

li = ['E','F','G','H','I']
df.loc[df.animals.isin(li), 'animals'] = 'D'
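To try the loc/isin approach end to end, here is a minimal sketch. It assumes the sample values from the question; the DataFrame construction and the intermediate name mask are added here for illustration only.

```python
import pandas as pd

# Sample data from the question; building the DataFrame this way is an assumption.
df = pd.DataFrame(
    {"animals": ["A", "A", "A", "B", "B", "B", "C", "C", "E", "F", "G", "H", "I"]}
)

li = ["E", "F", "G", "H", "I"]

# Boolean Series: True where the value is one of the entries to merge.
mask = df["animals"].isin(li)

# Overwrite only the matching rows of the 'animals' column, in place.
df.loc[mask, "animals"] = "D"

print(df["animals"].tolist())
# ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'D', 'D', 'D', 'D', 'D']
```

Keeping the mask in its own variable makes it easy to check how many rows will change (mask.sum()) before assigning.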

An alternative to loc would be numpy's where:

import numpy as np
df['animals'] = np.where(df['animals'].isin(li), 'D', df['animals'])

This reads: for every row in the animals column, check whether the value is in the list called li; if it is, return 'D', otherwise keep the original value.
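A self-contained sketch of the np.where variant, again assuming the sample values from the question. The names original and result are hypothetical and only there to make the before/after comparison explicit.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the DataFrame construction is an assumption.
df = pd.DataFrame(
    {"animals": ["A", "A", "A", "B", "B", "B", "C", "C", "E", "F", "G", "H", "I"]}
)
li = ["E", "F", "G", "H", "I"]

original = df["animals"].tolist()

# np.where(condition, value_if_true, value_if_false) works element-wise and
# returns a NumPy array with one entry per row.
result = np.where(df["animals"].isin(li), "D", df["animals"])

# Assigning the array back replaces the whole column.
df["animals"] = result

print(df["animals"].tolist())
# ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'D', 'D', 'D', 'D', 'D']
```

Unlike the loc assignment, which edits only the matching rows, np.where builds a full new column, so every row is rewritten even if its value stays the same.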
