I have a dataframe that has similar ids with spatiotemporal data like below:
car_id lat long
xxx 32 150
xxx 33 160
yyy 20 140
yyy 22 140
zzz 33 70
zzz 33 80
. . .
I want to replace car_id
with car_1, car_2, car_3, ... However, my dataframe is large and it's not possible to do it manually by name so first I made a list of all unique values in the car_id
column and made a list of names that should be replaced with:
u_values = [i for i in df['car_id'].unique()]
r = ['car' str(i) for i in range(len(u_values))]
Now I'm not sure how to replace all unique numbers in car_id
column with list values so the result is like this:
car_id lat long
car_1 32 150
car_1 33 160
car_2 20 140
car_2 22 140
car_3 33 70
car_3 33 80
. . .
CodePudding user response:
Create a mapping from u_values
to r
and map it to car_id
column.
mapping = pd.Series(r, index=u_values)
df['car_id'] = df['car_id'].map(mapping)
That said, it seems vectorized string concatenation is enough for this task. factorize()
method encodes the strings.
df['car_id'] = 'car_' pd.Series(df['car_id'].factorize()[0], dtype='string')
CodePudding user response:
It may be easier if you use a dictionary to maintain the relation between each unique value (xxxx,yyyy...) and the new id you want (1, 2, 3...)
newIdDict={}
idCounter=1
for i in df['Car id'].unique():
if i not in newIdDict:
newIdDict[i] = 'car_' str(idCounter)
idCounter = 1
Then, you can use Pandas replace function to change the values in car_id column:
df['Car id'].replace(newIdDict, inplace=True)
Take into account that this will change ALL the xxxx, yyyy in your dataframe, so if you have any xxxx in lat or long columns it will also be modified
CodePudding user response:
The answers so far seem a little complicated to me, so here's another suggestion. This creates a dictionary that has the old name as the keys and the new name as the values. That can be used to map the old values to new values.
r={k:'car_{}'.format(i) for i,k in enumerate(df['Car id'].unique())}
df['car_id'] = df['car_id'].map(r)
edit: the answer using factorize is probably better even though I think this is a bit easier to read