How to Replace Multiple Strings in a DataFrame Using Python

Time:04-20

I have a data frame with 73k rows. Here is some sample data:

Index    Customers' Name   States
0        Alpha             Oregon
1        Alpha             Oregon
2        Bravo             Utah
3        Bravo             Utah
4        Charlie           Alabama
5        Charlie           Alabama
6        Alpha             Oregon
7        Alpha             Oregon
8        Bravo             Utah

The customer names repeat, but I am not allowed to delete or remove those rows because they are mandatory for my research. Instead, I would like to replace each customer's name with a specific code, so that the result looks like this:

Index    Customers' Name   States
0        z1                Oregon
1        z1                Oregon
2        z2                Utah
3        z2                Utah
4        z3                Alabama
5        z3                Alabama
6        z1                Oregon
7        z1                Oregon
8        z2                Utah 

I'm still a beginner and have been learning Python for around 3 months. How can I make this change in bulk, given that I have 73k rows like this? I assume it has to be done with a 'for' loop. I already tried, but I couldn't get it to work. Please help me solve this.

CodePudding user response:

You can use .groupby() with .ngroup():

# ngroup() numbers each name group from 0 (in sorted order), so add 1 to start at z1
df["Customers' Name"] = "z" + (
    df.groupby("Customers' Name").ngroup() + 1
).astype("str")

print(df)

Prints:

  Customers' Name   States
0              z1   Oregon
1              z1   Oregon
2              z2     Utah
3              z2     Utah
4              z3  Alabama
5              z3  Alabama
6              z1   Oregon
7              z1   Oregon
8              z2     Utah
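
If you also need to trace each code back to the original name later, one possible variation is to build an explicit lookup table and apply it with .map(). This is only a sketch under a few assumptions: the sample data is retyped from the question, name_to_code is a hypothetical helper name that does not appear in the answer above, and the codes are numbered in order of first appearance rather than in sorted order (the two happen to agree for this sample).

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame(
    {
        "Customers' Name": ["Alpha", "Alpha", "Bravo", "Bravo", "Charlie",
                            "Charlie", "Alpha", "Alpha", "Bravo"],
        "States": ["Oregon", "Oregon", "Utah", "Utah", "Alabama",
                   "Alabama", "Oregon", "Oregon", "Utah"],
    }
)

# unique() returns the distinct names in order of first appearance.
uniques = df["Customers' Name"].unique()

# Hypothetical lookup table (an assumption, not part of the answer above):
# maps every real name to its code so the replacement can be reversed later.
name_to_code = {name: f"z{i + 1}" for i, name in enumerate(uniques)}

# Vectorized replacement; no explicit for-loop over the 73k rows is needed.
df["Customers' Name"] = df["Customers' Name"].map(name_to_code)

print(df)
```

Keeping name_to_code around (for example, saving it to a separate file) lets you recover the real names afterwards, which the .ngroup() one-liner does not do by itself.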