I'm trying to see if the same "ID" its repeated but with a different "DATE" value.
I was thinking using a numpy.where, so I created the column "Count" to use something like this:
df['FULFILL?'] = np.where((df['Count']>1) & (df['DATE']), 'YES', 'NO')
But then I got stuck because I was not sure how to end the second condition. Here's an example:
ID | Count | DATE |
---|---|---|
111 | 3 | 01/01/2020 |
222 | 2 | 02/12/2020 |
111 | 3 | 01/01/2020 |
222 | 2 | 02/12/2020 |
111 | 3 | 02/10/2020 |
333 | 2 | 01/25/2020 |
333 | 2 | 05/02/2020 |
444 | 1 | 01/01/2020 |
I'm looking an output like this:
ID | Count | DATE | FULFILL? |
---|---|---|---|
111 | 3 | 01/01/2020 | YES |
222 | 2 | 02/12/2020 | NO |
111 | 3 | 01/01/2020 | YES |
222 | 2 | 02/12/2020 | NO |
111 | 3 | 02/10/2020 | YES |
333 | 2 | 01/25/2020 | YES |
333 | 2 | 05/02/2020 | YES |
444 | 1 | 01/01/2020 | NO |
Sorry if my english it's not very good :)
CodePudding user response:
Use GroupBy.transform
with DataFrameGroupBy.nunique
for test number of unique values per groups, first condition (df['Count']>1)
is removed, because for single value per groups number of unique values is not greater like 1
:
df['FULFILL?'] = np.where(df.groupby('ID')['DATE'].transform('nunique').gt(1), 'YES', 'NO')
print (df)
ID Count DATE FULFILL?
0 111 3 01/01/2020 YES
1 222 2 02/12/2020 NO
2 111 3 01/01/2020 YES
3 222 2 02/12/2020 NO
4 111 3 02/10/2020 YES
5 333 2 01/25/2020 YES
6 333 2 05/02/2020 YES
7 444 1 01/01/2020 NO
Details:
print (df.groupby('ID')['DATE'].transform('nunique'))
0 2
1 1
2 2
3 1
4 2
5 2
6 2
7 1
Name: DATE, dtype: int64
CodePudding user response:
You can use nunique
:
# get number of unique dates per id
s = df.groupby('ID')['DATE'].nunique()
# identify non unique ones
df['FULFILL?'] = np.where(df['ID'].isin(s[s>1].index), 'YES', 'NO')
Or with both conditions:
df['FULFILL?'] = np.where(df['Count'].gt(1) & df['ID'].isin(s[s>1].index),
'YES', 'NO')
NB. using an intermediate s
and isin
, as it is more efficient than groupby.transform('nunique')
.
Output:
ID Count DATE FULFILL?
0 111 3 01/01/2020 YES
1 222 2 02/12/2020 NO
2 111 3 01/01/2020 YES
3 222 2 02/12/2020 NO
4 111 3 02/10/2020 YES
5 333 2 01/25/2020 YES
6 333 2 05/02/2020 YES
7 444 1 01/01/2020 NO
Count per group:
ID
111 2
222 1
333 2
444 1
Name: DATE, dtype: int64