Pandas -- create ranks for diffrent records that similar except one column-CodePudding

I want to fetch 1 record from duplicate rows in Df except one column. example :

index	col1	col2	col3	col4
0	abc	efg	hij	op
1	abc	efg	hij	ki
2	abc	efg	hij	tf
3	abc	efg	hij	ge
4	xyz	mmm	qt	aa
5	xyz	mmm	qt	bb
6	xyz	mmm	qt	cc

(order by col4 asc) Thus, desired result could be like

index	col1	col2	col3	col4	rank
0	abc	efg	hij	op	1
1	abc	efg	hij	ki	2
2	abc	efg	hij	tf	3
3	abc	efg	hij	ge	4
4	xyz	mmm	qt	aa	1
5	xyz	mmm	qt	bb	2
6	xyz	mmm	qt	cc	3

The goal is to obtain a rank for each similar result to fetch data as df = df[df['rank'] == 1]

CodePudding user response：

Use:

df['rank'] = df.groupby(df.columns.difference(['col4']).tolist()).cumcount().add(1)

CodePudding user response：

ansev's answers does the job but for those of you who like the rank method two things might be worth trying. Unfortunately, pandas.groupby.rank does only work for one column but this case that might be enough. I confess, the case for multiple columns is a little bit busy...

df = pd.DataFrame({
'col1':['abc','abc','abc','abc','xyz','xyz','xyz'],
'col2':['efg','efg','efg','efg','mmm','mmm','mmm'],
'col3':['hij','hij','hij','hij','qt','qt','qt'],
'col4':['op','ki','tf','ge','aa','bb','cc']
})

df['rank_single_column'] = df.groupby('col3')['col4'].rank(method='first').astype(int)

df['rank_multiple_columns'] = df.groupby(df[df.columns.difference(['col4']).tolist()].apply(tuple,axis=1))['col4'].rank(method='first').astype(int)

An advandage of rank (not asked in the question) would be the possibility to sort for the 'col4'. This could also be achieved by adding sort_values() into ansev's answer.

df['rank'] = df.sort_values(['col4'],ascending=True).groupby(df.columns.difference(['col4']).tolist()).cumcount().add(1)

index	col1	col2	col3	col4
0	abc	efg	hij	op
1	abc	efg	hij	ki
2	abc	efg	hij	tf
3	abc	efg	hij	ge
4	xyz	mmm	qt	aa
5	xyz	mmm	qt	bb
6	xyz	mmm	qt	cc

index	col1	col2	col3	col4	rank
0	abc	efg	hij	op	1
1	abc	efg	hij	ki	2
2	abc	efg	hij	tf	3
3	abc	efg	hij	ge	4
4	xyz	mmm	qt	aa	1
5	xyz	mmm	qt	bb	2
6	xyz	mmm	qt	cc	3

index	col1	col2	col3	col4
0	abc	efg	hij	op
1	abc	efg	hij	ki
2	abc	efg	hij	tf
3	abc	efg	hij	ge
4	xyz	mmm	qt	aa
5	xyz	mmm	qt	bb
6	xyz	mmm	qt	cc

index	col1	col2	col3	col4	rank
0	abc	efg	hij	op	1
1	abc	efg	hij	ki	2
2	abc	efg	hij	tf	3
3	abc	efg	hij	ge	4
4	xyz	mmm	qt	aa	1
5	xyz	mmm	qt	bb	2
6	xyz	mmm	qt	cc	3

index	col1	col2	col3	col4
0	abc	efg	hij	op
1	abc	efg	hij	ki
2	abc	efg	hij	tf
3	abc	efg	hij	ge
4	xyz	mmm	qt	aa
5	xyz	mmm	qt	bb
6	xyz	mmm	qt	cc

index	col1	col2	col3	col4	rank
0	abc	efg	hij	op	1
1	abc	efg	hij	ki	2
2	abc	efg	hij	tf	3
3	abc	efg	hij	ge	4
4	xyz	mmm	qt	aa	1
5	xyz	mmm	qt	bb	2
6	xyz	mmm	qt	cc	3