How to get the minimum (earliest) date of a column by applying a filter on another column

I want to find the earliest date for each value in the Vin column, considering only the rows where both Value_1 and Value_2 equal 1. The dates are stored in another column, 'Date'.

Below is my data frame.

import numpy as np
import pandas as pd

df_merge = pd.DataFrame({'Vin': ['a123', 'a123', 'a123', 'a123', 'b123', 'b123', 'b123', 'b123'],
                         'Date': ['2022-03-21T15:20:07.536Z', '2022-03-21T15:20:07.510Z', '2022-03-21T15:20:07.535Z',
                                  '2022-03-21T15:20:07.535Z', '2022-03-22T09:14:59.615Z', '2022-03-22T09:14:59.412Z',
                                  '2022-03-22T09:14:59.512Z', '2022-03-22T09:14:59.615Z'],
                         'Value_1': ['1', '0', '1', '1', '1', '0', '0', '1'],
                         'Value_2': ['1', '1', '1', '0', '1', '0', '1', '1']})

I tried one approach: I created another data frame by applying the required filter, and then used the command below to flag the minimum date.

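# Keep only the rows where both flags are set (the sample values are strings, so compare with '1')
Temp_table = df_merge[(df_merge['Value_1'] == '1') & (df_merge['Value_2'] == '1')].copy()

# Mark the rows whose Date equals the earliest Date within their Vin group
Temp_table['Result'] = np.where(Temp_table.groupby('Vin')['Date'].transform('min').eq(Temp_table['Date']), 'Yes', 'No')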

After this, I merged this column back into my original data frame. This creates a very large data frame, which I don't want. So my question is: is there any way to get this result in the same data frame, without creating another one?

Below is my expected data frame with the 'Result' column:

df_merge = pd.DataFrame({'Vin': ['a123', 'a123', 'a123', 'a123', 'b123', 'b123', 'b123', 'b123'],
                         'Date': ['2022-03-21T15:20:07.536Z', '2022-03-21T15:20:07.510Z', '2022-03-21T15:20:07.535Z',
                                  '2022-03-21T15:20:07.535Z', '2022-03-22T09:14:59.615Z', '2022-03-22T09:14:59.412Z',
                                  '2022-03-22T09:14:59.512Z', '2022-03-22T09:14:59.615Z'],
                         'Value_1': ['1', '0', '1', '1', '1', '0', '0', '1'],
                         'Value_2': ['1', '1', '1', '0', '1', '0', '1', '1'],
                         'Result': ['No', 'No', 'Yes', 'No', 'Yes', 'No', 'No', 'Yes']})

df_merge

CodePudding user response:

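You can use the following.

Update (compares the flags against the string '1' and marks every row tied for the earliest date within its Vin):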

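idx = (df_merge.assign(Date=pd.to_datetime(df_merge['Date']))  # parse the ISO strings as datetimes
               .loc[df_merge['Value_1'].eq('1') & df_merge['Value_2'].eq('1')]  # keep rows where both flags are '1'
               .groupby('Vin')['Date'].rank(method='min')  # rank dates within each Vin; ties share the lowest rank
               .loc[lambda x: x == 1].index)  # index labels of the earliest row(s) per Vin

# Flag those rows in the original frame
df_merge['Result'] = np.where(df_merge.index.isin(idx), 'Yes', 'No')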

Old answer (assumes Value_1 and Value_2 hold integers, and flags only the first earliest row per Vin):

idx = (df_merge.assign(Date=pd.to_datetime(df_merge['Date']))
               .loc[df_merge['Value_1'].eq(1) & df_merge['Value_2'].eq(1)]
               .groupby('Vin')['Date'].idxmin())

df_merge['Result'] = np.where(df_merge.index.isin(idx), 'Yes', 'No')

Output of the old answer (in this run row 4's timestamp ends in .616Z, so it does not tie with row 7):

>>> idx
Vin
a123    2
b123    7
Name: Date, dtype: int64

>>> df_merge
    Vin                      Date  Value_1  Value_2 Result
0  a123  2022-03-21T15:20:07.536Z        1        1     No
1  a123  2022-03-21T15:20:07.510Z        0        1     No
2  a123  2022-03-21T15:20:07.535Z        1        1    Yes
3  a123  2022-03-21T15:20:07.535Z        1        0     No
4  b123  2022-03-22T09:14:59.616Z        1        1     No
5  b123  2022-03-22T09:14:59.412Z        0        0     No
6  b123  2022-03-22T09:14:59.512Z        0        1     No
7  b123  2022-03-22T09:14:59.615Z        1        1    Yes
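
The difference between the two versions shows up when several rows share the earliest date. The sketch below uses a small hypothetical frame (the name `toy` and its values are made up for illustration) to compare `idxmin`, which returns one label per group, with `rank(method='min')`, which gives all tied rows the same lowest rank.

```python
import pandas as pd

# Hypothetical toy data: group 'b' has two rows tied for the earliest date
toy = pd.DataFrame({
    'Vin': ['a', 'a', 'b', 'b'],
    'Date': pd.to_datetime(['2022-03-21', '2022-03-20', '2022-03-22', '2022-03-22']),
})

# idxmin returns a single index label per group (the first occurrence of the minimum)
first_only = toy.groupby('Vin')['Date'].idxmin()

# rank(method='min') assigns every tied row the same lowest rank,
# so selecting rank 1 keeps all rows that share the earliest date
ranks = toy.groupby('Vin')['Date'].rank(method='min')
all_ties = ranks.loc[ranks == 1].index

print(first_only.tolist())  # one label per group
print(list(all_ties))       # includes both tied rows of group 'b'
```

If only one row per Vin should ever be flagged, `idxmin` is enough; if every tied row should be flagged, as in the expected frame from the question, the rank-based selection fits better.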

Note: If the Date column already has a datetime dtype, you can safely remove the assign call.
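
As an alternative sketch that is not part of the answer above, the flag can also be computed with a boolean mask and `transform('min')`, without building an index of matching rows. The helper names `mask`, `dates` and `earliest` are illustrative, and the sketch assumes the flags are stored as the strings '1' and '0', as in the sample data.

```python
import numpy as np
import pandas as pd

df_merge = pd.DataFrame({
    'Vin': ['a123', 'a123', 'a123', 'a123', 'b123', 'b123', 'b123', 'b123'],
    'Date': ['2022-03-21T15:20:07.536Z', '2022-03-21T15:20:07.510Z',
             '2022-03-21T15:20:07.535Z', '2022-03-21T15:20:07.535Z',
             '2022-03-22T09:14:59.615Z', '2022-03-22T09:14:59.412Z',
             '2022-03-22T09:14:59.512Z', '2022-03-22T09:14:59.615Z'],
    'Value_1': ['1', '0', '1', '1', '1', '0', '0', '1'],
    'Value_2': ['1', '1', '1', '0', '1', '0', '1', '1'],
})

# Rows that pass the filter (assumption: flags are the strings '1' / '0')
mask = df_merge['Value_1'].eq('1') & df_merge['Value_2'].eq('1')

# Parse the timestamps once; the original Date column is left untouched
dates = pd.to_datetime(df_merge['Date'])

# Blank out rows that fail the filter (they become NaT), then broadcast
# each Vin's earliest remaining date back onto every row of that Vin
earliest = dates.where(mask).groupby(df_merge['Vin']).transform('min')

# A row is flagged only if it passes the filter and carries the earliest date
df_merge['Result'] = np.where(mask & dates.eq(earliest), 'Yes', 'No')

print(df_merge)
```

This keeps everything in the original frame and flags all rows tied for the earliest date, which matches the expected 'Result' column from the question.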