I have a table:
Key | Sequence | Longitude | Latitude |
---|---|---|---|
1001 | 1 | 18.2 | 14.2 |
1001 | 2 | 18.2 | 14.2 |
1001 | 3 | 18.2 | 14.2 |
2001 | 1 | 25.6 | 22.8 |
2001 | 2 | 25.6 | 22.8 |
2001 | 3 | 25.6 | 22.8 |
5004 | 1 | 25.6 | 22.8 |
5004 | 2 | 25.6 | 22.8 |
5004 | 3 | 25.6 | 22.8 |
6895 | 1 | 36.2 | 17.4 |
6895 | 2 | 36.2 | 17.4 |
6895 | 3 | 36.2 | 17.4 |
6650 | 1 | 18.2 | 14.2 |
6650 | 2 | 18.2 | 14.2 |
6650 | 3 | 18.2 | 14.2 |
From the table I need to find out the keys (different keys) that have duplicate longitude and latitude. If a key having any of sequence has duplicate longitude and latitude all the sequences of the same key should be shown duplicate. (comparison between same key sequences does not happen).
The output table should be:
Key | Sequence | Longitude | Latitude | Duplicate |
---|---|---|---|---|
1001 | 1 | 18.2 | 14.2 | No |
1001 | 2 | 18.2 | 14.2 | No |
1001 | 3 | 18.2 | 14.2 | No |
2001 | 1 | 25.6 | 22.8 | No |
2001 | 2 | 25.6 | 22.8 | No |
2001 | 3 | 25.6 | 22.8 | No |
5004 | 1 | 25.6 | 22.8 | Yes |
5004 | 2 | 25.6 | 22.8 | Yes |
5004 | 3 | 25.6 | 22.8 | Yes |
6895 | 1 | 36.2 | 17.4 | No |
6895 | 2 | 36.2 | 17.4 | No |
6895 | 3 | 36.2 | 17.4 | No |
6650 | 1 | 18.2 | 14.2 | Yes |
6650 | 2 | 18.2 | 14.2 | Yes |
6650 | 3 | 18.2 | 14.2 | Yes |
CodePudding user response:
First remove the duplicates per key, then check what is still duplicated between keys:
# get example of duplicated rows
s = (df
.drop_duplicates(subset=['Key', 'Longitude', 'Latitude'])
.duplicated(['Longitude', 'Latitude'])
)
# extract the original keys and mark all matching rows
df['Duplicate'] = np.where(df['Key'].isin(df.loc[s[s].index, 'Key']), 'Yes', 'No')
print(df)
Output:
Key Sequence Longitude Latitude Duplicate
0 1001 1 18.2 14.2 No
1 1001 2 18.2 14.2 No
2 1001 3 18.2 14.2 No
3 2001 1 25.6 22.8 No
4 2001 2 25.6 22.8 No
5 2001 3 25.6 22.8 No
6 5004 1 25.6 22.8 Yes
7 5004 2 25.6 22.8 Yes
8 5004 3 25.6 22.8 Yes
9 6895 1 36.2 17.4 No
10 6895 2 36.2 17.4 No
11 6895 3 36.2 17.4 No
12 6650 1 18.2 14.2 Yes
13 6650 2 18.2 14.2 Yes
14 6650 3 18.2 14.2 Yes
CodePudding user response:
#first get the keys which are duplicate
key = list(df.duplicated(['Longitude', 'Latitude'])['Key'].unique())
#then put np.where to assign yes or no
df['Duplicate'] = np.where(df['Key'].isin(key), 'Yes', 'No')