Home > Net >  Need to know if the same ID it's repeated but with a different DATE
Need to know if the same ID it's repeated but with a different DATE

Time:11-21

I'm trying to see if the same "ID" its repeated but with a different "DATE" value.

I was thinking using a numpy.where, so I created the column "Count" to use something like this:

df['FULFILL?'] = np.where((df['Count']>1) & (df['DATE']), 'YES', 'NO')

But then I got stuck because I was not sure how to end the second condition. Here's an example:

ID Count DATE
111 3 01/01/2020
222 2 02/12/2020
111 3 01/01/2020
222 2 02/12/2020
111 3 02/10/2020
333 2 01/25/2020
333 2 05/02/2020
444 1 01/01/2020

I'm looking an output like this:

ID Count DATE FULFILL?
111 3 01/01/2020 YES
222 2 02/12/2020 NO
111 3 01/01/2020 YES
222 2 02/12/2020 NO
111 3 02/10/2020 YES
333 2 01/25/2020 YES
333 2 05/02/2020 YES
444 1 01/01/2020 NO

Sorry if my english it's not very good :)

CodePudding user response:

Use GroupBy.transform with DataFrameGroupBy.nunique for test number of unique values per groups, first condition (df['Count']>1) is removed, because for single value per groups number of unique values is not greater like 1:

df['FULFILL?'] = np.where(df.groupby('ID')['DATE'].transform('nunique').gt(1), 'YES', 'NO')
print (df)
    ID  Count        DATE FULFILL?
0  111      3  01/01/2020      YES
1  222      2  02/12/2020       NO
2  111      3  01/01/2020      YES
3  222      2  02/12/2020       NO
4  111      3  02/10/2020      YES
5  333      2  01/25/2020      YES
6  333      2  05/02/2020      YES
7  444      1  01/01/2020       NO

Details:

print (df.groupby('ID')['DATE'].transform('nunique'))
0    2
1    1
2    2
3    1
4    2
5    2
6    2
7    1
Name: DATE, dtype: int64

CodePudding user response:

You can use nunique:

# get number of unique dates per id
s = df.groupby('ID')['DATE'].nunique()
# identify non unique ones
df['FULFILL?'] = np.where(df['ID'].isin(s[s>1].index), 'YES', 'NO')

Or with both conditions:

df['FULFILL?'] = np.where(df['Count'].gt(1) & df['ID'].isin(s[s>1].index),
                          'YES', 'NO')

NB. using an intermediate s and isin, as it is more efficient than groupby.transform('nunique').

Output:

    ID  Count        DATE FULFILL?
0  111      3  01/01/2020      YES
1  222      2  02/12/2020       NO
2  111      3  01/01/2020      YES
3  222      2  02/12/2020       NO
4  111      3  02/10/2020      YES
5  333      2  01/25/2020      YES
6  333      2  05/02/2020      YES
7  444      1  01/01/2020       NO

Count per group:

ID
111    2
222    1
333    2
444    1
Name: DATE, dtype: int64
  • Related