I have a large data set with many rows. One column contains long string values that repeat across rows. I want to replace these values with shorter names automatically in pandas. How can I do this?
My data is something like this:
and I want an output like this:
CodePudding user response:
What you are looking for is pd.factorize, which encodes each distinct value as an integer code (an enumerated type), numbered from 0 in order of first appearance. You can use it as follows:
df['Col1'] = 'C' + pd.Series(pd.factorize(df['Col1'])[0] + 1, dtype='string')
Demo
Data Input
import pandas as pd

data = {'Col1': ['XXXXXXXXXXXXXX', 'YYYYYYYYYYYYYY', 'XXXXXXXXXXXXXX', 'YYYYYYYYYYYYYY', 'XXXXXXXXXXXXXX', 'ZZZZZZZZZZZZZZ']}
df = pd.DataFrame(data)
print(df)
             Col1
0  XXXXXXXXXXXXXX
1  YYYYYYYYYYYYYY
2  XXXXXXXXXXXXXX
3  YYYYYYYYYYYYYY
4  XXXXXXXXXXXXXX
5  ZZZZZZZZZZZZZZ
Output:
print(df)
  Col1
0   C1
1   C2
2   C1
3   C2
4   C1
5   C3
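If you later need to get from the short labels back to the original long values, the second value returned by pd.factorize (the unique values) can be kept as a lookup table. The following is only a sketch under that assumption: the name label_to_original is made up for illustration, and the labels are assigned from a plain list so the result does not depend on the DataFrame having a default index.

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['XXXXXXXXXXXXXX', 'YYYYYYYYYYYYYY', 'XXXXXXXXXXXXXX',
                            'YYYYYYYYYYYYYY', 'XXXXXXXXXXXXXX', 'ZZZZZZZZZZZZZZ']})

# factorize returns 0-based integer codes (in order of first appearance)
# together with the unique values those codes refer to.
codes, uniques = pd.factorize(df['Col1'])

# Hypothetical lookup table: short label -> original long value.
label_to_original = {f'C{i + 1}': value for i, value in enumerate(uniques)}

# Assigning a list is positional, so this works for any index on df.
df['Col1'] = ['C' + str(code + 1) for code in codes]

print(df)
print(label_to_original)
```

Mapping the labels back is then a matter of df['Col1'].map(label_to_original).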
CodePudding user response:
Use:
df['Col1'] = 'C' + (df.groupby('Col1').ngroup() + 1).astype(str)
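One thing to be aware of with this approach: groupby sorts the group keys by default, so ngroup numbers the values in sorted order rather than in order of first appearance. On the demo data above the two orders happen to coincide. A small sketch with made-up data (the values and the variable names by_sorted_key and by_first_seen are only for illustration) shows the difference, and how sort=False gives first-appearance numbering:

```python
import pandas as pd

# Made-up data in which the first value seen is not the smallest key.
df = pd.DataFrame({'Col1': ['YYYY', 'XXXX', 'YYYY', 'ZZZZ']})

# Default sort=True: groups are numbered by sorted key (XXXX, YYYY, ZZZZ).
by_sorted_key = 'C' + (df.groupby('Col1').ngroup() + 1).astype(str)

# sort=False: groups are numbered in order of first appearance (YYYY, XXXX, ZZZZ).
by_first_seen = 'C' + (df.groupby('Col1', sort=False).ngroup() + 1).astype(str)

print(by_sorted_key.tolist())
print(by_first_seen.tolist())
```

Which numbering is preferable depends on whether the labels should be stable under row reordering (sorted keys) or follow the order of the data (first appearance).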