Divide a selected item of a list by another column in a DataFrame and choose the top results

Time:08-02

I have the following test DataFrame:

tag list Count
icecream [['A',0.9],['B',0.6],['C',0.5],['D',0.3],['E',0.1]] 5
potato [['U',0.8],['V',0.7],['W',0.4],['X',0.3],['Y',0.2]] 5

Count is the number of [name, score] pairs in each row's list, which I computed and added as a new column. I want to divide the score of each element in the list by the value in the 'Count' column. The result should look like this:

tag list Count
icecream [['A',0.18],['B',0.12],['C',0.1],['D',0.06],['E',0.02]] 5
potato [['U',0.16],['V',0.14],['W',0.08],['X',0.06],['Y',0.04]] 5

How can I divide only the second element of each inner list by the Count value?

I know that dividing the list column by the Count column directly won't work, because one element of each pair is a string.

import pandas as pd

data = [['icecream', [['A', 0.9],['B', 0.6],['C',0.5],['D',0.3],['E',0.1]]],
        ['potato', [['U', 0.8],['V', 0.7],['W',0.4],['X',0.3],['Y',0.2]]]]

test = pd.DataFrame(data, columns=['tag', 'list'])
# Number of [name, score] pairs in each row's list
test['Count'] = test['list'].str.len()
test

test['list'].div(test['Count'])

This raises an error, as expected:
TypeError: unsupported operand type(s) for /: 'list' and 'int'

In the next step, I want to keep only the list members whose value falls within the first 10 percentile of the members. Let's say the result looks like this:

tag list
icecream [['A',0.18],['B',0.12],['C',0.1]]
potato [['U',0.16],['V',0.14]]

CodePudding user response:

Pandas cannot operate on lists in a vectorized way, so you have no choice here but to loop. The fastest option will be a list comprehension:

test['list'] = [[[a, b/len(l)] for a,b in l]
                for l in test['list']]

Or, to modify the lists in place, a simple classical loop:

for l in test['list']:
    for x in l:
        x[1] /= len(l)

NB. You do not need the "Count" column, since len(l) gives the same value.
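
If the divisor should come from the "Count" column anyway (for example, because it might not always equal the list length), one possible sketch is to pair each list with its own Count value using zip. The variable names `items`, `count`, `name` and `score` are only illustrative, and the DataFrame is rebuilt here so the snippet runs on its own:

```python
import pandas as pd

data = [['icecream', [['A', 0.9], ['B', 0.6], ['C', 0.5], ['D', 0.3], ['E', 0.1]]],
        ['potato', [['U', 0.8], ['V', 0.7], ['W', 0.4], ['X', 0.3], ['Y', 0.2]]]]
test = pd.DataFrame(data, columns=['tag', 'list'])
test['Count'] = test['list'].str.len()

# Walk over both columns together: each row's list is divided by that row's Count.
# Only the numeric second element of each pair is touched; the label is kept as is.
test['list'] = [[[name, score / count] for name, score in items]
                for items, count in zip(test['list'], test['Count'])]
```

For the sample data this gives the same scores as the two variants above, because Count equals the list length in every row.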

Output:

        tag                                               list
0  icecream  [[A, 0.18], [B, 0.12], [C, 0.1], [D, 0.06], [E...
1    potato  [[U, 0.16], [V, 0.14], [W, 0.08], [X, 0.06], [...
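
The second step of the question (keeping only the top-scoring members) is not covered above. The question's wording about the percentile is ambiguous, so the following is only a sketch under assumptions: the cut-off is taken as a quantile of all scaled scores pooled across rows, and the quantile level `q`, the helper `keep_top` and the new column name `top` are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

data = [['icecream', [['A', 0.9], ['B', 0.6], ['C', 0.5], ['D', 0.3], ['E', 0.1]]],
        ['potato', [['U', 0.8], ['V', 0.7], ['W', 0.4], ['X', 0.3], ['Y', 0.2]]]]
test = pd.DataFrame(data, columns=['tag', 'list'])

# Step 1 (as in the answer): divide each score by the length of its list.
test['list'] = [[[name, score / len(items)] for name, score in items]
                for items in test['list']]


def keep_top(items, threshold):
    """Keep the [name, score] pairs whose score is at least `threshold`."""
    return [[name, score] for name, score in items if score >= threshold]


# Assumption: q is the quantile level used as cut-off over all pooled scores.
# Adjust it to whatever "top" is supposed to mean for the real data.
q = 0.5
all_scores = [score for items in test['list'] for _, score in items]
threshold = np.quantile(all_scores, q)

# Step 2: filter every row's list against the shared threshold.
test['top'] = [keep_top(items, threshold) for items in test['list']]
```

With `q = 0.5` on the sample data, this keeps A, B, C for icecream and U, V for potato, which matches the illustration in the question. A stricter level such as `q = 0.9` keeps far fewer members, so the right value depends on the intended meaning of the percentile.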