How to rank a URL in one column using a list of URLs in another column in Pandas?

My DataFrame looks something like this, except with URLs in place of the letters:

.csv code:

query,ranks
a,"[k, g, y, l, a]"
h,"[f, g, l, h, p]"
x,"[b, x, y, a, g]"
w,"[w, I, b, d, g]"
r,"[I, r, n, f, g]"

I want the outcome to be like this:

.csv code:

query,ranks,rank
a,"[k, g, y, l, a]",5
h,"[f, g, l, h, p]",4
x,"[b, x, y, a, g]",2
w,"[w, I, b, d, g]",1
r,"[I, r, n, f, g]",2

As you can see, the new 'rank' column holds the 1-based position of each 'query' letter (URL) within its 'ranks' list.

Edit: Sometimes the 'ranks' value (a list of strings) does not contain the 'query' value.

CodePudding user response:

A basic solution with apply that also handles rows where the query is missing from the list (I use -1 as the default value, but you can set whatever you need):

import pandas as pd

df = pd.DataFrame({'query': ['a', 'h', 'x', 'w', 'r'],
                   'ranks': [['k', 'g', 'y', 'l', 'a'],
                             ['f', 'g', 'l', 'h', 'p'],
                             ['b', 'x', 'y', 'a', 'g'],
                             ['w', 'I', 'b', 'd', 'g'],
                             ['I', 'r', 'n', 'f', 'g']]})

>>> # 1-based position of the query within its ranks list; -1 if it is absent
>>> df["rank"] = df.apply(lambda row: next((i for i, url in enumerate(row["ranks"], start=1) if url == row["query"]), -1), axis=1)
>>> df
  query            ranks  rank
0     a  [k, g, y, l, a]     5
1     h  [f, g, l, h, p]     4
2     x  [b, x, y, a, g]     2
3     w  [w, I, b, d, g]     1
4     r  [I, r, n, f, g]     2
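
If the data is read from the CSV in the question rather than built in code, the 'ranks' column will most likely arrive as plain strings such as "[k, g, y, l, a]", not as lists. Below is a rough sketch of one way to handle that and to write the lookup with list.index instead of a generator. The helpers parse_ranks and find_rank are hypothetical names of my own, the last CSV row (z) is made up to show the missing case, and the parser assumes the items contain no commas or brackets, which may not hold for real URLs.

```python
import io

import pandas as pd

# The question's CSV, inlined so the sketch is self-contained.
# The final row (z) is an invented example of a query missing from its list.
CSV_TEXT = '''query,ranks
a,"[k, g, y, l, a]"
h,"[f, g, l, h, p]"
x,"[b, x, y, a, g]"
w,"[w, I, b, d, g]"
r,"[I, r, n, f, g]"
z,"[b, c, d, e, f]"
'''


def parse_ranks(text):
    """Turn a string like "[k, g, y]" into ['k', 'g', 'y'].

    Assumes items are separated by commas and contain none themselves.
    """
    inner = text.strip().strip("[]")
    return [item.strip() for item in inner.split(",") if item.strip()]


def find_rank(query, ranks, default=-1):
    """Return the 1-based position of query in ranks, or default if absent."""
    try:
        return ranks.index(query) + 1
    except ValueError:
        return default


df = pd.read_csv(io.StringIO(CSV_TEXT))
df["ranks"] = df["ranks"].apply(parse_ranks)  # strings -> lists of strings
df["rank"] = [find_rank(q, r) for q, r in zip(df["query"], df["ranks"])]
print(df)
```

Both versions stop at the first match, so if a URL appears more than once in a list, its first position is the one reported.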