I have a data frame with 73k rows. Here is some sample data:
Index Customers' Name States
0 Alpha Oregon
1 Alpha Oregon
2 Bravo Utah
3 Bravo Utah
4 Charlie Alabama
5 Charlie Alabama
6 Alpha Oregon
7 Alpha Oregon
8 Bravo Utah
The data contain repeated values, but I am not allowed to delete or remove them because they are required for my research. Instead, I would like to replace the customers' names with pseudonymous codes so that the result looks like this:
Index Customers' Name States
0 z1 Oregon
1 z1 Oregon
2 z2 Utah
3 z2 Utah
4 z3 Alabama
5 z3 Alabama
6 z1 Oregon
7 z1 Oregon
8 z2 Utah
I'm still a beginner and have been learning Python for around 3 months. How can I make this change in bulk, given that I have 73k rows like this? I assume it has to be done with a 'for' loop. I have already tried, but I couldn't get it to work. Please help me solve this.
CodePudding user response:
You can use .groupby() with .ngroup():
# Number each distinct name (0-based), shift to 1-based, and prefix with "z"
df["Customers' Name"] = "z" + (
    df.groupby("Customers' Name").ngroup() + 1
).astype("str")
print(df)
Prints:
Customers' Name States
0 z1 Oregon
1 z1 Oregon
2 z2 Utah
3 z2 Utah
4 z3 Alabama
5 z3 Alabama
6 z1 Oregon
7 z1 Oregon
8 z2 Utah
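Note that .groupby() sorts the group keys by default, so .ngroup() numbers the names in sorted order. In the sample that happens to match the order of first appearance. If you would rather number the names strictly in the order they first appear, and keep a lookup table so the codes can be traced back later, something like the following sketch with pd.factorize() might work. The sample frame is rebuilt from the question, and the `mapping` variable is only an illustrative name, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question (the real frame has ~73k rows)
df = pd.DataFrame({
    "Customers' Name": ["Alpha", "Alpha", "Bravo", "Bravo", "Charlie",
                        "Charlie", "Alpha", "Alpha", "Bravo"],
    "States": ["Oregon", "Oregon", "Utah", "Utah", "Alabama",
               "Alabama", "Oregon", "Oregon", "Utah"],
})

# factorize() returns integer codes plus the distinct values,
# ordered by first appearance (no sorting by default)
codes, uniques = pd.factorize(df["Customers' Name"])

# Hypothetical lookup table: original name -> pseudonymous code
mapping = {name: f"z{i + 1}" for i, name in enumerate(uniques)}

# Vectorized replacement, no explicit loop over the rows
df["Customers' Name"] = df["Customers' Name"].map(mapping)
print(df)
```

Both approaches are vectorized, so no 'for' loop over the 73k rows is needed. Keeping `mapping` around is optional, but it is the only way to recover the original names after the column has been overwritten.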