Replace column values in a large dataframe-CodePudding

I have a dataframe that has similar ids with spatiotemporal data like below:

car_id    lat long
 xxx      32  150
 xxx      33  160
 yyy      20  140
 yyy      22  140
 zzz      33   70
 zzz      33   80
  .        .    .

I want to replace car_id with car_1, car_2, car_3, ... However, my dataframe is large and it's not possible to do it manually by name so first I made a list of all unique values in the car_id column and made a list of names that should be replaced with:

u_values = [i for i in df['car_id'].unique()]
r = ['car' str(i) for i in range(len(u_values))]

Now I'm not sure how to replace all unique numbers in car_id column with list values so the result is like this:

car_id    lat long
 car_1     32  150
 car_1     33  160
 car_2     20  140
 car_2     22  140
 car_3     33   70
 car_3     33   80
     .       .   .

CodePudding user response：

Create a mapping from u_values to r and map it to car_id column.

mapping = pd.Series(r, index=u_values)
df['car_id'] = df['car_id'].map(mapping)

That said, it seems vectorized string concatenation is enough for this task. factorize() method encodes the strings.

df['car_id'] = 'car_'   pd.Series(df['car_id'].factorize()[0], dtype='string')

CodePudding user response：

It may be easier if you use a dictionary to maintain the relation between each unique value (xxxx,yyyy...) and the new id you want (1, 2, 3...)

newIdDict={}
idCounter=1
for i in df['Car id'].unique():
   if i not in newIdDict:
     newIdDict[i] = 'car_' str(idCounter)
     idCounter  = 1

Then, you can use Pandas replace function to change the values in car_id column:

df['Car id'].replace(newIdDict, inplace=True)

Take into account that this will change ALL the xxxx, yyyy in your dataframe, so if you have any xxxx in lat or long columns it will also be modified

CodePudding user response：

The answers so far seem a little complicated to me, so here's another suggestion. This creates a dictionary that has the old name as the keys and the new name as the values. That can be used to map the old values to new values.

r={k:'car_{}'.format(i) for i,k in enumerate(df['Car id'].unique())}
df['car_id'] = df['car_id'].map(r)

edit: the answer using factorize is probably better even though I think this is a bit easier to read