I'm a little new to Python. I am working on movement data. I have ID, location, speed, time, and destination columns. My dataframe
looks like this:
shipid | destination | lon | lat | speed | time |
---|---|---|---|---|---|
508040000 | Brunei | 139.356 | 34.172 | 13 | 2016/01/12 16:44:03 |
508040000 | Brunei | 139.356 | 34.173 | 12.9 | 2016/01/12 16:44:03 |
508040000 | Brunei | 139.781 | 35.245 | 12.7 | 2016/01/12 16:44:03 |
508040000 | Busan | 139.358 | 34.172 | 15.8 | 2016/03/01 16:09:36 |
508040000 | Busan | 139.359 | 34.173 | 15.6 | 2016/03/01 16:10:06 |
508040000 | Busan | 139.362 | 34.177 | 15.9 | 2016/03/01 16:11:07 |
431000792 | Tokyo | 139.673 | 34.842 | 11 | 2016/04/01 22:17:51 |
431000792 | Tokyo | 139.683 | 34.914 | 11 | 2016/04/01 22:41:11 |
431000792 | Tokyo | 139.684 | 34.926 | 10.3 | 2016/04/01 22:45:20 |
431000792 | Aomi | 140.760717 | 40.8771 | 0.1 | 2016/05/03 22:03:02 |
431000792 | Aomi | 140.759967 | 40.877033 | 0 | 2016/05/03 23:54:19 |
431000792 | Aomi | 140.760117 | 40.876917 | 0 | 2016/05/04 0:06:00 |
I want to loop or iterate through ID and destination to create a new trip ID column.
shipid | destination | lat | lon | speed | time | tripid |
---|---|---|---|---|---|---|
508040000 | Brunei | 139.356 | 34.172 | 13 | 2016/01/12 16:44:03 | 1 |
508040000 | Brunei | 139.356 | 34.173 | 12.9 | 2016/01/12 16:44:03 | 1 |
508040000 | Brunei | 139.781 | 35.245 | 12.7 | 2016/01/12 16:44:03 | 1 |
508040000 | Busan | 139.358 | 34.172 | 15.8 | 2016/03/01 16:09:36 | 2 |
508040000 | Busan | 139.359 | 34.173 | 15.6 | 2016/03/01 16:10:06 | 2 |
508040000 | Busan | 139.362 | 34.177 | 15.9 | 2016/03/01 16:11:07 | 2 |
431000792 | Tokyo | 139.673 | 34.842 | 11 | 2016/04/01 22:17:51 | 1 |
431000792 | Tokyo | 139.683 | 34.914 | 11 | 2016/04/01 22:41:11 | 1 |
431000792 | Tokyo | 139.684 | 34.926 | 10.3 | 2016/04/01 22:45:20 | 1 |
431000792 | Aomi | 140.760717 | 40.8771 | 0.1 | 2016/05/03 22:03:02 | 2 |
431000792 | Aomi | 140.759967 | 40.877033 | 0 | 2016/05/03 23:54:19 | 2 |
431000792 | Aomi | 140.760117 | 40.876917 | 0 | 2016/05/04 0:06:00 | 2 |
So I wrote as follows
for shipid in df.shipid.unique():
tripid=1
for destination in df.destination.unique():
df['tripid'] = np.where(df['destination']==df['destination'].shift(-1),tripid, tripid 1)
The code creates a new column tripid
where every row is assigned "1", except for the last row of each destination, where "2" is assigned.
I want to assign a unique ID to each ship with the same destination as 1, 2, 3.
I am not sure of what to use, either loop or iteration, etc. I tried to use it for a loop and the error shows the series is ambiguous.
CodePudding user response:
Does this solve your problem?
import pandas as pd
# Example dataframe:
df = pd.DataFrame(
{'shipid': [1, 1, 1, 1, 1, 2, 2],
'destination': ['Brunei', 'Brunei', 'Brunei', 'Busan', 'Busan', 'Tokyo', 'Tokyo']}
)
df['tripid'] = df.groupby(['destination']).cumcount() 1
shipid | destination | tripid |
---|---|---|
1 | Brunei | 1 |
1 | Brunei | 2 |
1 | Brunei | 3 |
1 | Busan | 1 |
1 | Busan | 2 |
2 | Tokyo | 1 |
2 | Tokyo | 2 |
You can also group by both shipid
and destination
:
df['tripid'] = df.groupby(['shipid', 'destination']).cumcount() 1
Or maybe you need this?
https://stackoverflow.com/a/51329888/14627505
df['tripid'] = df.groupby(['shipid', 'destination']).ngroup() 1
shipid | destination | tripid |
---|---|---|
1 | Brunei | 1 |
1 | Brunei | 1 |
1 | Brunei | 1 |
1 | Busan | 2 |
1 | Busan | 2 |
2 | Tokyo | 3 |
2 | Tokyo | 3 |