Example dataframe:
import pandas as pd

df_sample = pd.DataFrame({'query': {0: 'keyword_1',
1: 'keyword_1',
2: 'keyword_2',
3: 'keyword_2',
4: 'keyword_3',
5: 'keyword_3',
6: 'keyword_4',
7: 'keyword_4'},
'page': {0: 'google.com',
1: 'apple.com',
2: 'google.com',
3: 'apple.com',
4: 'google.com',
5: 'apple.com',
6: 'papaya.com',
7: 'foobaar.com'},
'rank': {0: 3, 1: 2, 2: 1, 3: 11, 4: 5, 5: 10, 6: 11, 7: 11}})
df_sample
Suppose each keyword in 'query' returns the two URLs in 'page' from a search engine, ranked at different positions ('rank').
Each keyword has two URLs, either at different ranks or both at rank 11 (11 means the URL wasn't found on the first page).
I want to score the pages. My solution:
The lower the rank (best = 1), the higher the score. Since there are only two URLs per keyword, we can score them 1 and 2. A rank of 11 receives a score of 1, as it means rank > 10.
The exception is when both URLs of a keyword are ranked 11, in which case we drop both rows for that keyword.
We will need a separate column, 'score'.
Remember, a keyword never appears more than twice (42 rows contain only 21 keywords), but different keywords can have the same URLs.
Output:
df_sample_2 = pd.DataFrame({'query': {0: 'keyword_1',
1: 'keyword_1',
2: 'keyword_2',
3: 'keyword_2',
4: 'keyword_3',
5: 'keyword_3'},
'page': {0: 'google.com',
1: 'apple.com',
2: 'google.com',
3: 'apple.com',
4: 'google.com',
5: 'apple.com'},
'rank': {0: 3, 1: 2, 2: 1, 3: 11, 4: 5, 5: 10}, 'score': {0: 1, 1: 2, 2: 2, 3: 1, 4: 2, 5: 1}})
df_sample_2
CodePudding user response:
You need two lines of code (and the two steps can be done in either order).
grouped_df = df_sample.groupby('query')
# Drop every keyword whose two URLs share the same rank (here, both ranked 11)
new_df = grouped_df.filter(lambda x: x['rank'].nunique() >1)
       query        page  rank
0  keyword_1  google.com     3
1  keyword_1   apple.com     2
2  keyword_2  google.com     1
3  keyword_2   apple.com    11
4  keyword_3  google.com     5
5  keyword_3   apple.com    10
# Rank within each keyword in descending order, so the lower rank gets the higher score
new_df['score'] = grouped_df['rank'].rank(ascending=False)
       query        page  rank  score
0  keyword_1  google.com     3    1.0
1  keyword_1   apple.com     2    2.0
2  keyword_2  google.com     1    2.0
3  keyword_2   apple.com    11    1.0
4  keyword_3  google.com     5    2.0
5  keyword_3   apple.com    10    1.0
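For reference, the same two steps can be put together as one self-contained sketch. It is only one possible variant, not the only way to do this: it assumes 11 is the "not found" marker (the name NOT_FOUND and the variables found_any and scored are made up for illustration), it drops a keyword only when both of its ranks are 11 rather than whenever the ranks are equal, and it casts the score to int so it matches the expected output in the question.

```python
import pandas as pd

df_sample = pd.DataFrame({
    'query': ['keyword_1', 'keyword_1', 'keyword_2', 'keyword_2',
              'keyword_3', 'keyword_3', 'keyword_4', 'keyword_4'],
    'page': ['google.com', 'apple.com', 'google.com', 'apple.com',
             'google.com', 'apple.com', 'papaya.com', 'foobaar.com'],
    'rank': [3, 2, 1, 11, 5, 10, 11, 11],
})

NOT_FOUND = 11  # assumption: 11 marks a URL that was not on the first page

# Keep a keyword only if at least one of its URLs was actually found,
# i.e. the best (smallest) rank in the group is below 11.
found_any = df_sample.groupby('query')['rank'].transform('min') < NOT_FOUND
scored = df_sample[found_any].copy()

# Within each keyword, the numerically lower rank gets the higher score.
# Ties are not expected after the filter; method='first' just keeps the
# scores integral if one ever shows up.
scored['score'] = (
    scored.groupby('query')['rank']
          .rank(ascending=False, method='first')
          .astype(int)
)

print(scored)
```

Taking an explicit copy of the filtered frame before adding the 'score' column avoids assigning into what pandas may treat as a view of df_sample.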