I have a dataframe
that summarizes the segments of track within a bigger network. These segments have specific segement_id
s and it looks like this:
import pandas as pd
import numpy as np
my_dict = {
'segment_id':['a', 'b', 'c', 'd', 'e'],
'km_start':[2,4,9,15,20],
'km_end':[3,7,11,16,22],
'min_km_start':[0,0,0,0,0],
'max_km_end':[25,25,25,25,25]
}
df = pd.DataFrame(my_dict)
segment_id km_start km_end min_km_start max_km_end
0 a 2 3 0 25
1 b 4 7 0 25
2 c 9 11 0 25
3 d 15 16 0 25
4 e 20 22 0 25
Graphically, I want to do the following:
Inessence, I need to find the gaps between the two pairs of columns (['km_start','km_end']
and ['min_km_start','max_km_end']
) and fill in the segement_id
in a descending order starting from -1
.
Here is the desired output:
segment_id km_start km_end min_km_start max_km_end
0 -1 0 2 0 25
1 a 2 3 0 25
2 -2 3 4 0 25
3 b 4 7 0 25
4 -3 7 9 0 25
5 c 9 11 0 25
6 -4 11 15 0 25
7 d 15 16 0 25
8 -5 16 20 0 25
9 e 20 22 0 25
10 -6 22 25 0 25
CodePudding user response:
Try this
starts = pd.concat([pd.Series(df['min_km_start'].iloc[0]), df['km_end']]).reset_index(drop=True)
ends = pd.concat([df['km_start'], pd.Series(df['max_km_end'].iloc[0])]).reset_index(drop=True)
mask = ~starts.isin(df['km_start'])
pd.concat([df, pd.DataFrame({'km_start': starts[mask], 'km_end': ends[mask], 'segment_id': np.arange(-1,-mask.sum()-1,-1)})]).fillna(method='ffill').sort_values(by='km_start').reset_index(drop=True)
Output
segment_id km_start km_end min_km_start max_km_end
0 -1 0 2 0.0 25.0
1 a 2 3 0.0 25.0
2 -2 3 4 0.0 25.0
3 b 4 7 0.0 25.0
4 -3 7 9 0.0 25.0
5 c 9 11 0.0 25.0
6 -4 11 15 0.0 25.0
7 d 15 16 0.0 25.0
8 -5 16 20 0.0 25.0
9 e 20 22 0.0 25.0
10 -6 22 25 0.0 25.0