So i have a dataset with a few columns. However, one column is made out of locations, in this case street names. I would like to rename those to just one letter. So for example:
Location | Location new |
---|---|
Beukenlaan | A |
Deventerlaan | B |
Crixstraat | C |
Deventerlaan | B |
I have to do this for multiple datasheets so the function below would take a lot of time to adjust manually.
df.replace('Eindhoven Genovevalaan', 'A', regex=True, inplace=True)
Any ideas how to do this automatically?
CodePudding user response:
You could use factorize
and string.ascii_uppercase
:
from string import ascii_uppercase
import numpy as np
d = dict(enumerate(ascii_uppercase))
df['Location new'] = pd.Series(df['Location'].factorize()[0]).map(d)
NB. there are only 26 values in ascii_uppercase
('ABCDEFGHIJKLMNOPQRSTUVWXYZ'
) so you need to specify what should happen if you have more than 26 rows.
output:
Location Location new
0 Beukenlaan A
1 Deventerlaan B
2 Crixstraat C
3 Deventerlaan B