Proper way to do this in pandas without using a for loop

I would like to avoid using iterrows here.

From my DataFrame I want to create a new column "unique" based on the ("a", "b") pair in each row: each distinct pair gets a label "uniqueN", and every row with exactly the same "a" and "b" values must get that same label.

In this case:

"1", "3" (the first row) is the first unique ("a", "b") pair, so it gets "unique1"; the seventh row is also "1", "3", so it gets "unique1" as well.
"2", "2" (the second row) is the next unique pair, so it gets "unique2", and the eighth row, also "2", "2", gets "unique2" too.
"3", "1" (the third row) is the next unique pair, so it gets "unique3"; no other row is "3", "1", so that label does not repeat.
And so on.

I have working code that uses loops, but it is not the pandas way. Can anyone suggest how to do this with pandas functions?

Expected output (my code produces it, but it doesn't use pandas methods):

   a  b   unique
0  1  3  unique1
1  2  2  unique2
2  3  1  unique3
3  4  2  unique4
4  3  3  unique5
5  4  2  unique4
6  1  3  unique1
7  2  2  unique2

Code

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 3, 4, 1, 2], 'b': [3, 2, 1, 2, 3, 2, 3, 2]})

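c = 1
seen = {}
# First pass: give each (a, b) pair a label in order of first appearance
for i, j in df.iterrows():
    j = tuple(j)
    if j not in seen:
        seen[j] = 'unique' + str(c)
        c += 1

# Second pass: write each label to every row with that (a, b) pair
for key, value in seen.items():
    df.loc[(df.a == key[0]) & (df.b == key[1]), 'unique'] = value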

Answer:

Let's use groupby with ngroup, passing sort=False so that groups are numbered in order of first appearance. Add 1 so the numbering starts at one, then convert to string with astype so the prefix "unique" can be prepended:

df['unique'] = 'unique' + \
               df.groupby(['a', 'b'], sort=False).ngroup().add(1).astype(str)

Or with map and format instead of converting and concatenating:

df['unique'] = (
    df.groupby(['a', 'b'], sort=False).ngroup()
        .add(1)
        .map('unique{}'.format)
)

df:

   a  b   unique
0  1  3  unique1
1  2  2  unique2
2  3  1  unique3
3  4  2  unique4
4  3  3  unique5
5  4  2  unique4
6  1  3  unique1
7  2  2  unique2
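To see why sort=False matters, here is a small comparison on the same data (a sketch, not part of the original answer). With the default sort=True, ngroup numbers the groups in sorted (a, b) key order, so (3, 3) is numbered before (4, 2) even though (4, 2) appears first in the frame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 3, 4, 1, 2], 'b': [3, 2, 1, 2, 3, 2, 3, 2]})

# Default sort=True: group numbers follow the sorted (a, b) keys
sorted_ids = df.groupby(['a', 'b']).ngroup().tolist()

# sort=False: group numbers follow the order in which each pair first appears
appearance_ids = df.groupby(['a', 'b'], sort=False).ngroup().tolist()

print(sorted_ids)      # [0, 1, 2, 4, 3, 4, 0, 1]
print(appearance_ids)  # [0, 1, 2, 3, 4, 3, 0, 1]
```

Only the sort=False numbering matches the expected "unique4" for (4, 2) and "unique5" for (3, 3).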

Setup:

import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 3, 4, 1, 2], 'b': [3, 2, 1, 2, 3, 2, 3, 2]
})
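Another loop-free option, not from the original answers, is pd.factorize, which also assigns codes in order of first appearance by default. This sketch assumes the combined key can be built by joining the string forms of "a" and "b" with a separator that does not occur in the values (true for the integers here):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 3, 4, 1, 2], 'b': [3, 2, 1, 2, 3, 2, 3, 2]})

# One hashable key per row; the '_' separator keeps e.g. (1, 23) and (12, 3) distinct
key = df['a'].astype(str) + '_' + df['b'].astype(str)

# pd.factorize returns codes 0..N-1 in order of first appearance (sort=False by default)
codes, uniques = pd.factorize(key)

# Shift to 1-based numbers and prepend the prefix, without a Python-level loop
df['unique'] = 'unique' + pd.Series(codes + 1, index=df.index).astype(str)
print(df)
```

The groupby version avoids building an intermediate string key, which is why it is usually the cleaner choice.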

Answer:

I came up with a slightly different solution. I'll add this for posterity, but the groupby answer is superior.

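import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 3, 4, 1, 2], 'b': [3, 2, 1, 2, 3, 2, 3, 2]})

# Keep only the first occurrence of each (a, b) pair, in order of appearance;
# .copy() avoids a SettingWithCopyWarning when adding a column below
df1 = df[~df.duplicated()].copy()

# Number the unique pairs 1..N (the original index labels can have gaps)
df1['unique'] = ['unique' + str(n) for n in range(1, len(df1) + 1)]

# Left-merge the labels back onto every row with the same (a, b) pair
df2 = df.merge(df1, on=['a', 'b'], how='left')
print(df2)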
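A caveat with the dedupe-and-merge approach: if the deduplicated frame's index labels are used directly as IDs, they are not consecutive whenever a repeated pair appears before a new one. The sketch below uses small hypothetical data to show the gap and a sequential renumbering:

```python
import pandas as pd

# Hypothetical data: the repeated pair (1, 3) appears before the new pair (2, 2)
df = pd.DataFrame({'a': [1, 1, 2], 'b': [3, 3, 2]})

first = df[~df.duplicated()]
print(first.index.tolist())  # [0, 2]: the index labels skip 1

# Renumber the deduplicated rows sequentially instead of reusing index labels
first = first.copy()
first['unique'] = ['unique' + str(n) for n in range(1, len(first) + 1)]

# A left merge keeps the row order of df
out = df.merge(first, on=['a', 'b'], how='left')
print(out['unique'].tolist())  # ['unique1', 'unique1', 'unique2']
```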