Which value comes up first in df?

Time:10-06

my code and input:

import pandas as pd

df1 = pd.DataFrame({'frame':[1,2,3,4,5,6,7,8,9,10,11,12],
                    'label':['other','splenic','other','other','hepatic','other','other','other','hepatic','other','splenic','other']})
df2 = pd.DataFrame({'frame':[1,2,3,4,5,6,7,8,9,10,11,12],
                    'label':['other','hepatic','other','other','splenic','other','other','other','splenic','other','hepatic','other']})
label = ['other','hepatic','splenic']
for name in label:
    if name != 'other':
        # splenic is hard-coded as the start here, which is the problem
        start_frame = df1.loc[(df1['label']=='splenic'),'frame']
        end_frame = df1.loc[(df1['label']=='hepatic'),'frame']
    else:
        print('other')

Say I want to mark the start and end frames between two occurrences of the labels of interest (splenic and hepatic) for further calculation. My problem is that the labels occur in a different order in each dataframe. For example, in df1 the first label of interest is splenic at frame=2, so that is my start_frame, and the next one is hepatic at frame=5, so that is my end_frame. Going further, the next start_frame is hepatic at frame=9 and the next end_frame is splenic at frame=11. In df2 the order is reversed. There is no fixed pattern for which label comes first.
So I can't simply say that splenic is always the "start" and hepatic is always the "end". It depends on which one comes up first: whichever of hepatic or splenic appears first is the "start", and the other label is the "end". What I expect for df1:

start_frame=[2,9]
end_frame=[5,11]

CodePudding user response:

I propose to get every occurrence of one of your target labels:

import numpy as np
import pandas as pd

df = pd.DataFrame({'frame':[1,2,3,4,5,6,7,8,9,10,11,12],
                    'label':['other','splenic','other','other','hepatic','other','other','other','hepatic','other','splenic','other']})

is_key = (df.label == "splenic") | (df.label == "hepatic")

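Now we can extract the positions of those occurrences and treat every even-numbered entry of id_key (0th, 2nd, ...) as a start and every odd-numbered entry (1st, 3rd, ...) as an end. The reshape below requires an even number of occurrences: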

id_key = np.where(is_key)[0]
start_frame_id, end_frame_id = id_key.reshape(-1, 2).T

The proper start and end frame are:

# np.where returns positions, so select by position with iloc
start_frame = df["frame"].iloc[start_frame_id]
end_frame = df["frame"].iloc[end_frame_id]

which results in:

>>> start_frame
1    2
8    9
Name: frame, dtype: int64

>>> end_frame
4      5
10    11
Name: frame, dtype: int64

CodePudding user response:

You can try it like this:

import pandas as pd

df1 = pd.DataFrame({'frame': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                    'label': ['other', 'splenic', 'other', 'other', 'hepatic', 'other', 'other', 'other', 'hepatic',
                              'other', 'splenic', 'other']})
df2 = pd.DataFrame({'frame': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                    'label': ['other', 'hepatic', 'other', 'other', 'splenic', 'other', 'other', 'other', 'splenic',
                              'other', 'hepatic', 'other']})


def get_label(dataframe):
    label = ['hepatic', 'splenic']
    start_val = end_val = ''

    # find the first label of interest; the other label becomes the end label
    for _, row in dataframe.iterrows():
        if row['label'] in label:
            start_val = row['label']
            end_val = label[label.index(start_val)-1]
            break

    # collect all frames of the start label and all frames of the end label
    start_frame = dataframe.loc[(dataframe['label'] == start_val), 'frame'].tolist()
    end_frame = dataframe.loc[(dataframe['label'] == end_val), 'frame'].tolist()
    
    return start_frame, end_frame


if __name__ == '__main__':
    start, end = get_label(df1)
    print(start, end)
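
Note that for df1 the function above collects every splenic frame as a start and every hepatic frame as an end, so it gives frames 2 and 11 as starts and 5 and 9 as ends, not the [2, 9] and [5, 11] the question expects. One possible variant that alternates start and end is sketched below. The name alternating_frames is hypothetical, and two behaviours are assumptions: a repeated label while an interval is still open is ignored, and an unmatched final start is dropped.

```python
import pandas as pd


def alternating_frames(dataframe, labels=("hepatic", "splenic")):
    """Open an interval at one label and close it at the next occurrence of the other."""
    starts, ends = [], []
    open_label = None  # label that opened the current interval, if any
    for frame, label in zip(dataframe["frame"].tolist(), dataframe["label"].tolist()):
        if label not in labels:
            continue
        if open_label is None:
            starts.append(frame)
            open_label = label
        elif label != open_label:
            ends.append(frame)
            open_label = None
        # Assumption: the same label seen again while open is ignored.
    if open_label is not None:
        starts.pop()  # Assumption: drop a start that never got an end.
    return starts, ends


df1 = pd.DataFrame({
    "frame": list(range(1, 13)),
    "label": ["other", "splenic", "other", "other", "hepatic", "other",
              "other", "other", "hepatic", "other", "splenic", "other"],
})

print(alternating_frames(df1))  # ([2, 9], [5, 11])
```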

CodePudding user response:

Try this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'frame':[1,2,3,4,5,6,7,8,9,10,11,12],
                    'label':['other','splenic','other','other','hepatic','other','other','other','hepatic','other','splenic','other']})

# keep only rows with splenic or hepatic; .copy() avoids a SettingWithCopyWarning when the tag column is added
df = df[(df.label == "splenic") | (df.label == "hepatic")].copy()

# assign start/end, assumes there will be an even number of splenic/hepatic
df['tag'] = np.tile(['start','end' ], len(df)//2)

# from here you can extract the values you want
print(df)

# output

    frame    label    tag
1       2  splenic  start
4       5  hepatic    end
8       9  hepatic  start
10     11  splenic    end
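
From the tagged rows, the start and end lists, and a mask over the frames in between, could be pulled out roughly as follows. This is a sketch under assumptions: the names keys and in_interval are illustrative, and both endpoints are treated as part of the interval.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'frame': list(range(1, 13)),
    'label': ['other', 'splenic', 'other', 'other', 'hepatic', 'other',
              'other', 'other', 'hepatic', 'other', 'splenic', 'other'],
})

# tag the key rows alternately; assumes an even number of them
keys = df[df['label'].isin(['splenic', 'hepatic'])].copy()
keys['tag'] = np.tile(['start', 'end'], len(keys) // 2)

start_frame = keys.loc[keys['tag'] == 'start', 'frame'].tolist()
end_frame = keys.loc[keys['tag'] == 'end', 'frame'].tolist()

# boolean mask over the full dataframe: True for frames inside any interval
in_interval = pd.Series(False, index=df.index)
for start, end in zip(start_frame, end_frame):
    in_interval |= df['frame'].between(start, end)  # inclusive on both ends

print(start_frame, end_frame)                  # [2, 9] [5, 11]
print(df.loc[in_interval, 'frame'].tolist())   # [2, 3, 4, 5, 9, 10, 11]
```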