Auto re-assign ids in a dataframe

I have the following dataframe:

import pandas as pd
data = {'id': [542588, 542594, 542594, 542605, 542605, 542605, 542630, 542630],
 'label': [3, 3, 1, 1, 2, 0, 0, 2]}

df = pd.DataFrame(data)
df

       id  label
0  542588      3
1  542594      3
2  542594      1
3  542605      1
4  542605      2
5  542605      0
6  542630      0
7  542630      2

The id column contains large (6-digit) integers. I want a way to simplify them, numbering from 10, so that 542588 becomes 10, 542594 becomes 11, and so on.

Required output:

   id  label
0  10      3
1  11      3
2  11      1
3  12      1
4  12      2
5  12      0
6  13      0
7  13      2

Answer 1:

You can try groupby().ngroup(), which numbers each distinct id 0, 1, 2, ... in sorted order, and then add 10:

df['id'] = df.groupby('id').ngroup().add(10)

print(df)

   id  label
0  10      3
1  11      3
2  11      1
3  12      1
4  12      2
5  12      0
6  13      0
7  13      2
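Because the assignment above overwrites id, the original values are lost. A minimal sketch, assuming you want to keep them, writes the new numbers to a separate column (new_id is a hypothetical name) and derives an old-to-new lookup table from the unique pairs:

```python
import pandas as pd

data = {'id': [542588, 542594, 542594, 542605, 542605, 542605, 542630, 542630],
        'label': [3, 3, 1, 1, 2, 0, 0, 2]}
df = pd.DataFrame(data)

# ngroup() numbers each group 0, 1, 2, ... in sorted key order; add(10) shifts the start
df['new_id'] = df.groupby('id').ngroup().add(10)

# One row per original id: a lookup table from old id to new id
lookup = df[['id', 'new_id']].drop_duplicates().reset_index(drop=True)
print(lookup)
```

Keeping the lookup table lets you translate results computed on the short ids back to the original 6-digit ids with a merge.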

Answer 2:

You can use factorize and add 10 to the resulting codes:

df['id'] = df['id'].factorize()[0] + 10

Output:

   id  label
0  10      3
1  11      3
2  11      1
3  12      1
4  12      2
5  12      0
6  13      0
7  13      2
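factorize returns a pair (codes, uniques); the [0] above keeps only the integer codes. A sketch that also keeps uniques, so the original ids can be recovered from the new ones (the START constant and the recovered name are illustrative, not part of the answer):

```python
import pandas as pd

data = {'id': [542588, 542594, 542594, 542605, 542605, 542605, 542630, 542630],
        'label': [3, 3, 1, 1, 2, 0, 0, 2]}
df = pd.DataFrame(data)

START = 10  # first new id, as requested in the question

# codes: one integer per row (0, 1, ...) in order of first appearance
# uniques: the distinct original ids, indexed by code
codes, uniques = df['id'].factorize()
df['id'] = codes + START

# Map back: subtract the offset to get the codes, then index into uniques
recovered = uniques[df['id'].to_numpy() - START]
print(recovered.tolist())
```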

Note: factorize numbers the keys in the order they first appear in your data, while the groupby().ngroup() solution numbers them in increasing (sorted) order. You can get sorted order from factorize by sorting the data first or by passing sort=True to it. Conversely, you can get first-appearance order from groupby() by passing sort=False to it.
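On the question's data both approaches give the same result only because the ids are already sorted. A sketch with a hypothetical unsorted set of ids shows the difference, and how sort=True on factorize and sort=False on groupby reconcile the two:

```python
import pandas as pd

# Hypothetical unsorted ids, so first-appearance order differs from sorted order
df = pd.DataFrame({'id': [542630, 542588, 542630, 542605]})

# groupby().ngroup() numbers groups in sorted key order by default
by_sorted_group = df.groupby('id').ngroup().add(10)

# factorize numbers keys in order of first appearance by default
by_appearance = pd.Series(df['id'].factorize()[0] + 10)

# sort=True makes factorize use sorted order instead
factorize_sorted = pd.Series(df['id'].factorize(sort=True)[0] + 10)

# sort=False makes groupby number groups by first appearance instead
group_appearance = df.groupby('id', sort=False).ngroup().add(10)

print(by_sorted_group.tolist(), by_appearance.tolist())
```

Here 542588 gets 10 under sorted numbering, but 542630 gets 10 under first-appearance numbering, because it is the first row.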

Answer 3:

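You can also build the mapping yourself with a plain dictionary, handing out the next number each time a new id is seen:

# Map each original id to a new one, in order of first appearance
new_ids = {}
new_id = 10

for old_id in df['id']:
    if old_id not in new_ids:
        new_ids[old_id] = new_id
        new_id += 1

# Replace every id with its new number
df['id'] = df['id'].map(new_ids)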
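The loop numbers ids in order of first appearance, the same numbering factorize produces. A more compact sketch of the same idea uses dict.fromkeys (which drops duplicates and keeps insertion order) together with enumerate(..., start=10):

```python
import pandas as pd

data = {'id': [542588, 542594, 542594, 542605, 542605, 542605, 542630, 542630],
        'label': [3, 3, 1, 1, 2, 0, 0, 2]}
df = pd.DataFrame(data)

# dict.fromkeys keeps the first occurrence of each id, in order (Python 3.7+)
unique_ids = dict.fromkeys(df['id'])

# Number the unique ids consecutively, starting at 10
new_ids = {old_id: new_id for new_id, old_id in enumerate(unique_ids, start=10)}

df['id'] = df['id'].map(new_ids)
print(df)
```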