To create a new column to assign a new ID using loop or numpy, where-CodePudding

I'm a little new to Python. I am working on movement data. I have ID, location, speed, time, and destination columns. My dataframe looks like this:

shipid	destination	lon	lat	speed	time
508040000	Brunei	139.356	34.172	13	2016/01/12 16:44:03
508040000	Brunei	139.356	34.173	12.9	2016/01/12 16:44:03
508040000	Brunei	139.781	35.245	12.7	2016/01/12 16:44:03
508040000	Busan	139.358	34.172	15.8	2016/03/01 16:09:36
508040000	Busan	139.359	34.173	15.6	2016/03/01 16:10:06
508040000	Busan	139.362	34.177	15.9	2016/03/01 16:11:07
431000792	Tokyo	139.673	34.842	11	2016/04/01 22:17:51
431000792	Tokyo	139.683	34.914	11	2016/04/01 22:41:11
431000792	Tokyo	139.684	34.926	10.3	2016/04/01 22:45:20
431000792	Aomi	140.760717	40.8771	0.1	2016/05/03 22:03:02
431000792	Aomi	140.759967	40.877033	0	2016/05/03 23:54:19
431000792	Aomi	140.760117	40.876917	0	2016/05/04 0:06:00

I want to loop or iterate through ID and destination to create a new trip ID column.

shipid	destination	lat	lon	speed	time	tripid
508040000	Brunei	139.356	34.172	13	2016/01/12 16:44:03	1
508040000	Brunei	139.356	34.173	12.9	2016/01/12 16:44:03	1
508040000	Brunei	139.781	35.245	12.7	2016/01/12 16:44:03	1
508040000	Busan	139.358	34.172	15.8	2016/03/01 16:09:36	2
508040000	Busan	139.359	34.173	15.6	2016/03/01 16:10:06	2
508040000	Busan	139.362	34.177	15.9	2016/03/01 16:11:07	2
431000792	Tokyo	139.673	34.842	11	2016/04/01 22:17:51	1
431000792	Tokyo	139.683	34.914	11	2016/04/01 22:41:11	1
431000792	Tokyo	139.684	34.926	10.3	2016/04/01 22:45:20	1
431000792	Aomi	140.760717	40.8771	0.1	2016/05/03 22:03:02	2
431000792	Aomi	140.759967	40.877033	0	2016/05/03 23:54:19	2
431000792	Aomi	140.760117	40.876917	0	2016/05/04 0:06:00	2

So I wrote as follows

for shipid in df.shipid.unique():
    tripid=1
    for destination in df.destination.unique():
        df['tripid'] = np.where(df['destination']==df['destination'].shift(-1),tripid, tripid 1)

The code creates a new column tripid where every row is assigned "1", except for the last row of each destination, where "2" is assigned.

I want to assign a unique ID to each ship with the same destination as 1, 2, 3.

I am not sure of what to use, either loop or iteration, etc. I tried to use it for a loop and the error shows the series is ambiguous.

CodePudding user response：

Does this solve your problem?

import pandas as pd

# Example dataframe:
df = pd.DataFrame(
    {'shipid': [1, 1, 1, 1, 1, 2, 2],
    'destination': ['Brunei', 'Brunei', 'Brunei', 'Busan', 'Busan', 'Tokyo', 'Tokyo']}
)

df['tripid'] = df.groupby(['destination']).cumcount()   1

shipid	destination	tripid
1	Brunei	1
1	Brunei	2
1	Brunei	3
1	Busan	1
1	Busan	2
2	Tokyo	1
2	Tokyo	2

You can also group by both shipid and destination:

df['tripid'] = df.groupby(['shipid', 'destination']).cumcount()   1

Or maybe you need this?

https://stackoverflow.com/a/51329888/14627505

df['tripid'] = df.groupby(['shipid', 'destination']).ngroup()   1

shipid	destination	tripid
1	Brunei	1
1	Brunei	1
1	Brunei	1
1	Busan	2
1	Busan	2
2	Tokyo	3
2	Tokyo	3