My code and input:
import pandas as pd
df1 = pd.DataFrame({'frame':[1,2,3,4,5,6,7,8,9,10,11,12],
'label':['other','splenic','other','other','hepatic','other','other','other','hepatic','other','splenic','other']})
df2 = pd.DataFrame({'frame':[1,2,3,4,5,6,7,8,9,10,11,12],
'label':['other','hepatic','other','other','splenic','other','other','other','splenic','other','hepatic','other']})
label = ['other','hepatic','splenic']
for i in range(0,len(label)):
    # compare with != ; "not in 'other'" would be a substring test
    if label[i] != 'other':
        start_frame = df1.loc[(df1['label']=='splenic'),'frame']
        end_frame = df1.loc[(df1['label']=='hepatic'),'frame']
    else:
        print('other')
Let's say I want to mask the frames between a start and an end frame, defined by two consecutive occurrences of the labels of interest (splenic and hepatic), for further calculation. My problem is that the labels occur in a different order in each dataframe. For example, in df1 the first label of interest is splenic at frame=2, so that is my start_frame, and my end_frame is the next occurrence, hepatic at frame=5. Going further, the next start_frame is hepatic at frame=9 and the next end_frame is splenic at frame=11. In df2 the order is reversed. There is no fixed pattern for which label comes first.
So I can't simply say that splenic is always the "start" and hepatic is always the "end". It depends on which one comes up first: whichever of hepatic or splenic appears first is the "start", and the label that follows it is the "end".
What I expect for df1:
start_frame=[2,9]
end_frame=[5,11]
CodePudding user response:
I propose to get every occurrence of one of your target labels:
import numpy as np
import pandas as pd
df = pd.DataFrame({'frame':[1,2,3,4,5,6,7,8,9,10,11,12],
'label':['other','splenic','other','other','hepatic','other','other','other','hepatic','other','splenic','other']})
is_key = (df.label == "splenic") | (df.label == "hepatic")
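Now we can extract the positions of these occurrences into id_key and pair them up: every even-positioned entry of id_key (0th, 2nd, ...) is a start and every odd-positioned entry is an end. This assumes an even number of occurrences; otherwise reshape raises a ValueError: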
id_key = np.where(is_key)[0]
start_frame_id, end_frame_id = id_key.reshape(-1, 2).T
The proper start and end frame are:
# np.where returns positions, so index positionally with iloc
start_frame = df["frame"].iloc[start_frame_id]
end_frame = df["frame"].iloc[end_frame_id]
which results in:
>>> start_frame
1    2
8    9
Name: frame, dtype: int64
>>> end_frame
4      5
10    11
Name: frame, dtype: int64
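The same pairing idea can be wrapped in a small helper that returns plain lists and works regardless of which label comes first. This is only a sketch: the function name paired_frames and the choice to silently drop a trailing unpaired occurrence are assumptions, not part of the answer above, and df2 is written here with the spelling 'hepatic'.

```python
import pandas as pd


def paired_frames(df, keys=("splenic", "hepatic")):
    """Return (start_frames, end_frames) by pairing consecutive key labels.

    Hypothetical helper: occurrences are paired in order of appearance,
    so it does not matter which of the labels comes first.
    """
    # frames of all rows carrying one of the target labels, in row order
    frames = df.loc[df["label"].isin(keys), "frame"].tolist()
    # assumption: a trailing occurrence without a partner is ignored
    usable = len(frames) - len(frames) % 2
    return frames[0:usable:2], frames[1:usable:2]


df1 = pd.DataFrame({
    "frame": list(range(1, 13)),
    "label": ["other", "splenic", "other", "other", "hepatic", "other",
              "other", "other", "hepatic", "other", "splenic", "other"],
})
df2 = pd.DataFrame({
    "frame": list(range(1, 13)),
    "label": ["other", "hepatic", "other", "other", "splenic", "other",
              "other", "other", "splenic", "other", "hepatic", "other"],
})

start_frame, end_frame = paired_frames(df1)
print(start_frame, end_frame)  # [2, 9] [5, 11]
print(paired_frames(df2))      # ([2, 9], [5, 11])
```

Because only the order of occurrence is used, df1 and df2 give the same frame pairs even though the labels are swapped.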
CodePudding user response:
You could try something like this:
import pandas as pd
df1 = pd.DataFrame({'frame': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                    'label': ['other', 'splenic', 'other', 'other', 'hepatic', 'other', 'other', 'other', 'hepatic',
                              'other', 'splenic', 'other']})
df2 = pd.DataFrame({'frame': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                    'label': ['other', 'hepatic', 'other', 'other', 'splenic', 'other', 'other', 'other', 'splenic',
                              'other', 'hepatic', 'other']})
def get_label(dataframe):
    label = ['hepatic', 'splenic']
    start_val = end_val = ''
    for _, df in dataframe.iterrows():
        if df['label'] in label:
            start_val = df['label']
            end_val = label[label.index(start_val)-1]
            break
    start_frame = list(dataframe.loc[(dataframe['label'] == start_val), 'frame'].values)
    end_frame = list(dataframe.loc[(dataframe['label'] == end_val), 'frame'].values)
    return start_frame, end_frame


if __name__ == '__main__':
    start, end = get_label(df1)
    print(start, end)
CodePudding user response:
Try this:
import numpy as np
import pandas as pd
df = pd.DataFrame({'frame':[1,2,3,4,5,6,7,8,9,10,11,12],
'label':['other','splenic','other','other','hepatic','other','other','other','hepatic','other','splenic','other']})
# keep only rows with splenic or hepatic (copy so a column can be added safely)
df = df[(df.label == "splenic") | (df.label == "hepatic")].copy()
# assign start/end, assumes there will be an even number of splenic/hepatic
df['tag'] = np.tile(['start','end' ], len(df)//2)
# from here you can extract the values you want
print(df)
# output
    frame    label    tag
1       2  splenic  start
4       5  hepatic    end
8       9  hepatic  start
10     11  splenic    end
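From the tagged rows, the start/end lists and a mask over the frames in between can be derived as sketched below. The names keys and mask, and the decision to treat both the start and the end frame as part of the masked range, are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "frame": list(range(1, 13)),
    "label": ["other", "splenic", "other", "other", "hepatic", "other",
              "other", "other", "hepatic", "other", "splenic", "other"],
})

# tag the splenic/hepatic rows alternately as start/end
# (assumes an even number of such rows, as in the answer above)
keys = df[df["label"].isin(["splenic", "hepatic"])].copy()
keys["tag"] = np.tile(["start", "end"], len(keys) // 2)

start_frame = keys.loc[keys["tag"] == "start", "frame"].tolist()
end_frame = keys.loc[keys["tag"] == "end", "frame"].tolist()

# boolean mask over the full dataframe: True between each start and end
# (assumption: both boundary frames are included)
mask = pd.Series(False, index=df.index)
for s, e in zip(start_frame, end_frame):
    mask |= df["frame"].between(s, e)

print(start_frame, end_frame)          # [2, 9] [5, 11]
print(df.loc[mask, "frame"].tolist())  # [2, 3, 4, 5, 9, 10, 11]
```

Keeping the tagged rows in a separate dataframe leaves the original df intact, so the mask can be applied to it for the further calculation.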