replacing all row values with shorter names in pandas

Time:10-10

I have a large data set with many rows. One column contains long values, and I want to replace them automatically with shorter names in pandas. How can I do this?

My data is something like this:

[image: sample of the input data, not preserved]

and I want an output like this:

[image: desired output, not preserved]

CodePudding user response:

What you are looking for is the pd.factorize function, which encodes each distinct value as an integer code (0, 1, 2, ... in order of first appearance). You can use it as follows:

df['Col1'] = 'C' + pd.Series(pd.factorize(df['Col1'])[0] + 1, dtype='string')

Demo

Data Input

import pandas as pd

data = {'Col1': ['XXXXXXXXXXXXXX', 'YYYYYYYYYYYYYY', 'XXXXXXXXXXXXXX', 'YYYYYYYYYYYYYY', 'XXXXXXXXXXXXXX', 'ZZZZZZZZZZZZZZ']}
df = pd.DataFrame(data)

print(df)


             Col1
0  XXXXXXXXXXXXXX
1  YYYYYYYYYYYYYY
2  XXXXXXXXXXXXXX
3  YYYYYYYYYYYYYY
4  XXXXXXXXXXXXXX
5  ZZZZZZZZZZZZZZ

Output:

print(df)

  Col1
0   C1
1   C2
2   C1
3   C2
4   C1
5   C3
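
For reference, the same idea can be wrapped in a small helper. This is only a sketch: the function name shorten_labels and its prefix parameter are made up for illustration and are not part of pandas. It passes the original index to the new Series so the assignment still lines up when the DataFrame does not use the default 0..n-1 index.

```python
import pandas as pd


def shorten_labels(values: pd.Series, prefix: str = "C") -> pd.Series:
    """Hypothetical helper: map each distinct value to prefix + running number."""
    # pd.factorize returns (codes, uniques); codes are 0-based and follow
    # the order in which each distinct value first appears.
    codes, _uniques = pd.factorize(values)
    # Missing values get code -1 by default, so they would become prefix + "0".
    # Reusing values.index keeps the result aligned with the source column.
    return prefix + pd.Series(codes + 1, index=values.index).astype(str)


df = pd.DataFrame({
    "Col1": ["XXXXXXXXXXXXXX", "YYYYYYYYYYYYYY", "XXXXXXXXXXXXXX",
             "YYYYYYYYYYYYYY", "XXXXXXXXXXXXXX", "ZZZZZZZZZZZZZZ"]
})
df["Col1"] = shorten_labels(df["Col1"])
print(df["Col1"].tolist())  # ['C1', 'C2', 'C1', 'C2', 'C1', 'C3']
```

The second value returned by pd.factorize holds the distinct long names, so it can be kept if the short names need to be mapped back later.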

CodePudding user response:

Use:

df['Col1'] = 'C' + (df.groupby('Col1').ngroup() + 1).astype(str)
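
One thing to be aware of: by default groupby sorts the keys, so ngroup numbers the groups in sorted order rather than in order of appearance. The sample data above happens to be sorted already, so both answers give the same result there. The sketch below uses made-up data (and the illustrative names by_sorted and by_seen) to show the difference, and how sort=False reproduces the pd.factorize numbering.

```python
import pandas as pd

# Made-up data where order of appearance differs from sorted order.
df = pd.DataFrame({"Col1": ["ZZZZ", "XXXX", "ZZZZ", "YYYY"]})

# Default (sort=True): groups are numbered by sorted key order.
by_sorted = "C" + (df.groupby("Col1").ngroup() + 1).astype(str)

# sort=False: groups are numbered in order of first appearance.
by_seen = "C" + (df.groupby("Col1", sort=False).ngroup() + 1).astype(str)

print(by_sorted.tolist())  # ['C3', 'C1', 'C3', 'C2']
print(by_seen.tolist())    # ['C1', 'C2', 'C1', 'C3']
```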