Label encode subgroups after groupby

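I want to label encode "Name" within each "Category" of a pandas DataFrame, so that the labels restart at 1 in every group and follow the order in which each name first appears. The first table below is the input and the second is the desired output: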

| Category   | Name   |
| ---------- | ------ |
| FRUITS     | Apple  |
| FRUITS     | Orange |
| FRUITS     | Apple  |
| Vegetables | Onion  |
| Vegetables | Garlic |
| Vegetables | Garlic |

| Category   | Name   | Label |
| ---------- | ------ | ----- |
| FRUITS     | Apple  | 1     |
| FRUITS     | Orange | 2     |
| FRUITS     | Apple  | 1     |
| Vegetables | Onion  | 1     |
| Vegetables | Garlic | 2     |
| Vegetables | Garlic | 2     |

Answer:

Try grouping by "Category", then grouping each sub-DataFrame by "Name" and numbering those groups with .ngroup():

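import pandas as pd

# ngroup() numbers each Name within its Category by first appearance (0-based), so add 1.
# .values assigns positionally: this assumes rows are already sorted by Category.
df["Label"] = (
    df.groupby("Category")
    .apply(lambda x: x.groupby("Name", sort=False).ngroup() + 1)
    .values
)
print(df)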

Prints:

     Category    Name  Label
0      FRUITS   Apple      1
1      FRUITS  Orange      2
2      FRUITS   Apple      1
3  Vegetables   Onion      1
4  Vegetables  Garlic      2
5  Vegetables  Garlic      2
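If the rows of one Category are not contiguous, the positional .values assignment above can attach labels to the wrong rows. Below is a minimal sketch of the same ngroup() idea that instead lets transform() align the results on the original index; the unsorted sample DataFrame is hypothetical.

```python
import pandas as pd

# Hypothetical unsorted sample: rows of the same Category are not contiguous.
df = pd.DataFrame({
    "Category": ["Vegetables", "FRUITS", "Vegetables", "FRUITS"],
    "Name": ["Garlic", "Apple", "Onion", "Apple"],
})

# transform() returns a Series aligned on df's index, so no positional .values is needed.
# Inside each Category, s.groupby(s, sort=False).ngroup() numbers Names by first appearance.
df["Label"] = df.groupby("Category")["Name"].transform(
    lambda s: s.groupby(s, sort=False).ngroup() + 1
)
print(df["Label"].tolist())  # [1, 1, 2, 1]
```

On the sorted data from the question, this gives the same labels as the code above.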

Answer:

You can use factorize per group:

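import pandas as pd

# factorize() gives 0-based codes in order of first appearance within each group;
# transform() aligns them back to df's rows, and add(1) makes the labels start at 1.
df['Label'] = (df.groupby('Category')['Name']
               .transform(lambda x: pd.factorize(x)[0])
               .add(1))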

Output:

     Category    Name  Label
0      FRUITS   Apple      1
1      FRUITS  Orange      2
2      FRUITS   Apple      1
3  Vegetables   Onion      1
4  Vegetables  Garlic      2
5  Vegetables  Garlic      2
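To see why this answer adds 1, here is a small sketch of what pd.factorize returns for a single group, using the Vegetables names from the question:

```python
import pandas as pd

# pd.factorize returns (codes, uniques): codes are 0-based integers assigned
# in order of first appearance, and uniques lists the distinct values in that order.
codes, uniques = pd.factorize(pd.Series(["Onion", "Garlic", "Garlic"]))
print(codes)          # [0 1 1]
print(list(uniques))  # ['Onion', 'Garlic']
```

Because transform() aligns each group's codes back to the original row index, this approach does not depend on rows of the same Category being contiguous.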