Home > Software design >  Combine unique IDs and create an extra ID column
Combine unique IDs and create an extra ID column

Time:09-09

I have a dataframe with employee names, each row representing their contract information, the type of employee of their contract, the contract ID and which contract is the primary one (if the person has 1 contract, it is always marked as "Yes", if they have more than 1 contract, 1 is marked as "Yes" and others as "No") like in this example:

Name    Employee Type   ID  Primary Contract?
Paulo   Employee        10  Yes
Paulo   Contractor      10  No
Paulo   Contractor      10  No
Paulo   Employee        10  No
Paulo   Employee        10  No
Pedro   Employee        11  No
Pedro   Contractor      11  Yes
Jonas   Contractor      16  Yes
Jonas   Contractor      16  No
Felipe  Non Worker      15  Yes
Maria   Employee        13  No
Maria   Contractor      13  Yes
Maria   Employee        13  No

I need to create a column named "Assigned ID" that follows this logic:

  • The assignment ID for the row marked with "Yes" in "Primary Contract?" will always be the first digit letter of the employee type the ID number

  • The assignment ID for the other rows would follow the same logic but, if that ID already exists, add a "-2" right after it (not starting from the "-1" but "-2") and subsequential increase (-3,-4,etc) for the duplicates (no specific order for the ones with "No" in "Primary Contract?").

  • If the first digit letter of the ID changes, it is not considered the same ID.

The output expected then would be:

Name    Employee Type   ID  Primary Contract?   Assignment ID
Paulo   Employee        10  Yes                 E10
Paulo   Contractor      10  No                  C10
Paulo   Contractor      10  No                  C10-2
Paulo   Employee        10  No                  E10-2
Paulo   Employee        10  No                  E10-3
Pedro   Employee        11  No                  E11
Pedro   Contractor      11  Yes                 C11
Jonas   Contractor      16  Yes                 C16
Jonas   Contractor      16  No                  C16-2
Felipe  Non Worker      15  Yes                 N15
Maria   Employee        13  No                  E13
Maria   Contractor      13  Yes                 C13
Maria   Employee        13  No                  E13-2

Thank you so much for the support!

CodePudding user response:

Let's try with groupby().cumcount() to enumerate the rows within each ID/Type

df['Assignment_ID'] = df['Employee Type'].str[0].add(df['ID'].astype(str))

# also can groupby(['Employee Type', 'ID'])
counts df.groupby('Assignment_ID').cumcount().add(1)
df['Assignment_ID']  = np.where(counts==1, '', '-' counts.astype(str))
print(df)

Output:

      Name Employee Type  ID Primary Contract? Assignment_ID
0    Paulo      Employee  10               Yes           E10
1    Paulo    Contractor  10                No           C10
2    Paulo    Contractor  10                No         C10-2
3    Paulo      Employee  10                No         E10-2
4    Paulo      Employee  10                No         E10-3
5    Pedro      Employee  11                No           E11
6    Pedro    Contractor  11               Yes           C11
7    Jonas    Contractor  16               Yes           C16
8    Jonas    Contractor  16                No         C16-2
9   Felipe    Non Worker  15               Yes           N15
10   Maria      Employee  13                No           E13
11   Maria    Contractor  13               Yes           C13
12   Maria      Employee  13                No         E13-2
  • Related