How to score different grouped pandas row values as 1 and 2 in a separate column through conditions?

Time:09-17

Example dataframe:


import pandas as pd

df_sample = pd.DataFrame({'query': {0: 'keyword_1',
  1: 'keyword_1',
  2: 'keyword_2',
  3: 'keyword_2',
  4: 'keyword_3',
  5: 'keyword_3',
  6: 'keyword_4',
  7: 'keyword_4'},
 'page': {0: 'google.com',
  1: 'apple.com',
  2: 'google.com',
  3: 'apple.com',
  4: 'google.com',
  5: 'apple.com',
  6: 'papaya.com',
  7: 'foobaar.com'},
 'rank': {0: 3, 1: 2, 2: 1, 3: 11, 4: 5, 5: 10, 6: 11, 7: 11}})

df_sample

Suppose each keyword in 'query' returns the two URLs in 'page' from a search engine, ranked at different positions ('rank').

Each keyword has two URLs, either at different ranks or both at rank 11 (11 means the URL was not found on the first page).

I want to score the pages. My solution:

The lower the rank (1 is the best), the higher the score. Since there are only two URLs per keyword, we can score them 1 and 2. Rank 11 always receives a score of 1, since it means a rank > 10.

The exception: if both URLs of a keyword are ranked 11, drop both rows containing that keyword.

We need a separate column, 'score'.

Remember: a keyword never appears more than twice (42 rows hold only 21 keywords), but different keywords can share the same URLs.
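The scoring rules rely on two assumptions about the data: no keyword appears more than twice, and two rows of the same keyword only share a rank when both are 11. A minimal sketch for checking both before scoring, assuming the column names from df_sample (the names `at_most_two` and `ties_only_at_11` are hypothetical, chosen for illustration):

```python
import pandas as pd

# Sample data from the question: each query has two pages with a search rank.
df_sample = pd.DataFrame({
    'query': ['keyword_1', 'keyword_1', 'keyword_2', 'keyword_2',
              'keyword_3', 'keyword_3', 'keyword_4', 'keyword_4'],
    'page': ['google.com', 'apple.com', 'google.com', 'apple.com',
             'google.com', 'apple.com', 'papaya.com', 'foobaar.com'],
    'rank': [3, 2, 1, 11, 5, 10, 11, 11],
})

# Assumption 1: no query appears more than twice.
counts = df_sample.groupby('query').size()
at_most_two = bool((counts <= 2).all())

# Assumption 2: rows of the same query share a rank only when both are 11.
# transform('nunique') broadcasts the per-query count of distinct ranks to every row.
tied = df_sample.groupby('query')['rank'].transform('nunique') == 1
ties_only_at_11 = bool((df_sample.loc[tied, 'rank'] == 11).all())

print(at_most_two, ties_only_at_11)  # True True
```

If either check fails on real data, the 1/2 scoring below would need extra tie-handling rules.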

Output:


df_sample_2 = pd.DataFrame({'query': {0: 'keyword_1',
  1: 'keyword_1',
  2: 'keyword_2',
  3: 'keyword_2',
  4: 'keyword_3',
  5: 'keyword_3'},
 'page': {0: 'google.com',
  1: 'apple.com',
  2: 'google.com',
  3: 'apple.com',
  4: 'google.com',
  5: 'apple.com'},
 'rank': {0: 3, 1: 2, 2: 1, 3: 11, 4: 5, 5: 10}, 'score': {0: 1, 1: 2, 2: 2, 3: 1, 4: 2, 5: 1}})

df_sample_2

Answer:

You need two lines of code (and the two steps can be run in either order):

grouped_df = df_sample.groupby('query')

# Drop any query whose two pages share the same rank (per the question, only possible when both are 11)
new_df = grouped_df.filter(lambda x: x['rank'].nunique() >1)

       query        page  rank
0  keyword_1  google.com     3
1  keyword_1   apple.com     2
2  keyword_2  google.com     1
3  keyword_2   apple.com    11
4  keyword_3  google.com     5
5  keyword_3   apple.com    10
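The `nunique() > 1` filter works because, per the question, equal ranks only occur at 11. A minimal sketch of a filter that encodes the stated rule directly instead (keep a query unless both pages are ranked 11); this is an alternative condition, not the answer's original one:

```python
import pandas as pd

df_sample = pd.DataFrame({
    'query': ['keyword_1', 'keyword_1', 'keyword_2', 'keyword_2',
              'keyword_3', 'keyword_3', 'keyword_4', 'keyword_4'],
    'page': ['google.com', 'apple.com', 'google.com', 'apple.com',
             'google.com', 'apple.com', 'papaya.com', 'foobaar.com'],
    'rank': [3, 2, 1, 11, 5, 10, 11, 11],
})

# Keep a query if at least one of its pages made the first page (rank != 11).
# Unlike nunique() > 1, this would NOT drop a hypothetical tie at, say, rank 4.
new_df = df_sample.groupby('query').filter(lambda x: (x['rank'] != 11).any())

print(new_df['query'].unique().tolist())  # ['keyword_1', 'keyword_2', 'keyword_3']
```

On the sample data both conditions keep the same six rows; they only differ if a non-11 tie ever appears.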

# Score within each query: the worse (higher) rank gets 1, the better one gets 2.
# astype(int) turns the float ranks into the integer scores shown in the desired output.
new_df['score'] = grouped_df['rank'].rank(ascending=False).astype(int)

       query        page  rank  score
0  keyword_1  google.com     3      1
1  keyword_1   apple.com     2      2
2  keyword_2  google.com     1      2
3  keyword_2   apple.com    11      1
4  keyword_3  google.com     5      2
5  keyword_3   apple.com    10      1
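Because each query has exactly two pages after filtering, the score can also be computed without `rank()`: the page whose rank equals the query's minimum scores 2, the other scores 1. A minimal sketch using `groupby().transform('min')` with `numpy.where`, a swapped-in technique rather than the answer's own; it assumes the filtering step has already removed all ties:

```python
import numpy as np
import pandas as pd

df_sample = pd.DataFrame({
    'query': ['keyword_1', 'keyword_1', 'keyword_2', 'keyword_2',
              'keyword_3', 'keyword_3', 'keyword_4', 'keyword_4'],
    'page': ['google.com', 'apple.com', 'google.com', 'apple.com',
             'google.com', 'apple.com', 'papaya.com', 'foobaar.com'],
    'rank': [3, 2, 1, 11, 5, 10, 11, 11],
})

# Same filter as the answer: drop queries whose two ranks are equal.
new_df = df_sample.groupby('query').filter(lambda x: x['rank'].nunique() > 1).copy()

# transform('min') gives every row its query's best (smallest) rank.
best = new_df.groupby('query')['rank'].transform('min')

# Best page in each query scores 2, the other page scores 1 (integer column).
new_df['score'] = np.where(new_df['rank'] == best, 2, 1)

print(new_df['score'].tolist())  # [1, 2, 2, 1, 2, 1]
```

This produces integer scores directly and computes them on the filtered frame, so it does not rely on index alignment with the unfiltered groupby.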