Assign unique ID to combination of two columns in pandas dataframe independently of their order

Time:10-14

I have a dataframe like this

col1 col2
1    2
2    1
2    3
3    2
3    4
4    3

and I would like to assign each row a unique ID based on col1 and col2, independent of their order

col1 col2 id
1    2    1
2    1    1
2    3    2
3    2    2
3    4    3
4    3    3

How can I do this?

CodePudding user response:

One approach:

df["id"] = df.groupby(df[["col1", "col2"]].apply(frozenset, axis=1)).ngroup() + 1
print(df)

Output

   col1  col2  id
0     1     2   1
1     2     1   1
2     2     3   2
3     3     2   2
4     3     4   3
5     4     3   3

Alternative using np.unique and np.sort:

_, indices = np.unique(np.sort(df.values, axis=1), return_inverse=True, axis=0)
df["id"] = indices + 1
print(df)

Output

   col1  col2  id
0     1     2   1
1     2     1   1
2     2     3   2
3     3     2   2
4     3     4   3
5     4     3   3
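
Note that df.values covers every column, so the np.unique variant would also fold in any extra columns the frame happens to have. Below is a minimal sketch that restricts it to the two key columns. The helper name pair_ids and the extra "value" column are made up for illustration, and the ravel() call is only a precaution, on the assumption that some NumPy versions may return the inverse with an extra dimension when axis is given:

```python
import numpy as np
import pandas as pd


def pair_ids(frame, cols):
    """Return 1-based IDs for the unordered combination of the given columns."""
    # Sort within each row so that (1, 2) and (2, 1) become the same row.
    ordered = np.sort(frame[cols].to_numpy(), axis=1)
    # inverse[i] is the position of row i among the unique sorted rows.
    _, inverse = np.unique(ordered, axis=0, return_inverse=True)
    return inverse.ravel() + 1


df = pd.DataFrame({
    "col1": [1, 2, 2, 3, 3, 4],
    "col2": [2, 1, 3, 2, 4, 3],
    "value": [10, 20, 30, 40, 50, 60],  # hypothetical extra column, ignored
})
df["id"] = pair_ids(df, ["col1", "col2"])
print(df)
```

Passing the column list explicitly keeps the ID independent of whatever else is stored in the frame.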

CodePudding user response:

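You can use apply and take the smaller of the two values as the ID. Note that this is only unique per pair when no two different pairs share the same minimum: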

import pandas as pd

df = pd.DataFrame(data={"col1":[1,2,3,1,2,3], "col2":[3,2,1,3,2,1]})
df['id'] = df.apply(lambda row: min(row.col1, row.col2), axis=1)
print(df)

output:

   col1  col2  id
0     1     3   1
1     2     2   2
2     3     1   1
3     1     3   1
4     2     2   2
5     3     1   1
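
The smaller value on its own cannot tell (1, 2) from (1, 3), since both have a minimum of 1. One possible fix, sketched here with example data that is not from the question, is to group on the minimum and the maximum together:

```python
import numpy as np
import pandas as pd

# Example data chosen so that two different pairs share the same minimum.
df = pd.DataFrame({"col1": [1, 2, 1, 3], "col2": [2, 1, 3, 1]})

lo = np.minimum(df["col1"].to_numpy(), df["col2"].to_numpy())
hi = np.maximum(df["col1"].to_numpy(), df["col2"].to_numpy())

# The minimum alone merges (1, 2) with (1, 3).
df["min_only"] = lo
# Grouping on (min, max) keeps them apart; ngroup() numbers groups from 0.
df["id"] = df.groupby([lo, hi]).ngroup() + 1
print(df)
```

Here min_only is 1 for every row, while id separates the two pairs.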

CodePudding user response:

Try np.sort:

a = np.sort(df, axis=1)
df['id'] = df.groupby([a[:,0],a[:,1]]).ngroup() + 1

Output:

   col1  col2  id
0     1     2   1
1     2     1   1
2     2     3   2
3     3     2   2
4     3     4   3
5     4     3   3

CodePudding user response:

You can also use the following, which assumes that rows with the same pair are adjacent:

df['mask'] = df.apply(lambda x:','.join(map(str, x.sort_values())), axis=1)
df['id'] = (df['mask'] != df['mask'].shift()).cumsum()
df.drop(columns=['mask'], inplace=True)

Output:

   col1  col2  id
0     1     2   1
1     2     1   1
2     2     3   2
3     3     2   2
4     3     4   3
5     4     3   3
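
Comparing each row with the previous one starts a new ID whenever the key changes, so a pair that reappears later in the frame gets a fresh ID. Here is a sketch of an alternative that does not depend on row order, using pd.factorize on the same kind of sorted key (the sample data is invented for illustration):

```python
import pandas as pd

# The pair (1, 2) appears in rows 0 and 2, which are not adjacent.
df = pd.DataFrame({"col1": [1, 3, 2], "col2": [2, 4, 1]})

# Build an order-independent key such as "1,2" for each row.
keys = [",".join(map(str, sorted(pair))) for pair in zip(df["col1"], df["col2"])]

# factorize() assigns codes in order of first appearance, starting at 0.
codes, _ = pd.factorize(pd.Series(keys))
df["id"] = codes + 1
print(df)
```

Rows 0 and 2 end up with the same ID even though another pair sits between them.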