I have the following test DataFrame:
tag | list | Count |
---|---|---|
icecream | [['A',0.9],['B',0.6],['C',0.5],['D',0.3],['E',0.1]] | 5 |
potato | [['U',0.8],['V',0.7],['W',0.4],['X',0.3],['Y',0.2]] | 5 |
'Count' is the number of sublists in each row's 'list', which I was able to compute and add as a new column. I want to divide the score (the second element of each sublist) by the value in the 'Count' column. The result should look like this:
tag | list | Count |
---|---|---|
icecream | [['A',0.18],['B',0.12],['C',0.1],['D',0.06],['E',0.02]] | 5 |
potato | [['U',0.16],['V',0.14],['W',0.08],['X',0.06],['Y',0.04]] | 5 |
How can I divide only the second element of each sublist by the count value?
I know that dividing the list column by the 'Count' column won't work, because each cell holds a list and the first element of each sublist is a string. This is what I have so far:
```python
import pandas as pd

data = [['icecream', [['A', 0.9], ['B', 0.6], ['C', 0.5], ['D', 0.3], ['E', 0.1]]],
        ['potato', [['U', 0.8], ['V', 0.7], ['W', 0.4], ['X', 0.3], ['Y', 0.2]]]]
test = pd.DataFrame(data, columns=['tag', 'list'])
# Number of sublists per row (sorting first is unnecessary: assignment aligns on the index)
test['Count'] = test['list'].str.len()
test

test['list'].div(test['Count'])
```
gives an error, which is expected:
```
TypeError: unsupported operand type(s) for /: 'list' and 'int'
```
In the next step, I want to keep only the sublists whose score falls in the top 10 percentile of the members. Let's say it looks like this:
tag | list |
---|---|
icecream | [['A',0.18],['B',0.12],['C',0.1]] |
potato | [['U',0.16],['V',0.14]] |
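The answer below only covers the division step. A minimal sketch of the filtering step, under an assumption the question leaves open: the cutoff is a quantile of all scores pooled across rows (not per row), computed with `numpy.quantile`. The helper `keep_top` and its parameter `q` are hypothetical names, and the data are the already-divided scores from the second table.

```python
import numpy as np
import pandas as pd

# Scores after the division step (values from the second table above)
data = [['icecream', [['A', 0.18], ['B', 0.12], ['C', 0.1], ['D', 0.06], ['E', 0.02]]],
        ['potato', [['U', 0.16], ['V', 0.14], ['W', 0.08], ['X', 0.06], ['Y', 0.04]]]]
test = pd.DataFrame(data, columns=['tag', 'list'])


def keep_top(df, q):
    """Hypothetical helper: keep sublists whose score is >= the q-quantile
    of all scores pooled across rows (an assumed reading of the question)."""
    scores = [score for l in df['list'] for _, score in l]
    threshold = np.quantile(scores, q)
    out = df[['tag']].copy()
    out['list'] = [[[name, score] for name, score in l if score >= threshold]
                   for l in df['list']]
    return out


result = keep_top(test, 0.5)
print(result)
```

With `q=0.5` the pooled threshold is about 0.09, which happens to reproduce the example table above; `q=0.9` keeps only the top 10 percent (here just `['A', 0.18]`, leaving `potato` with an empty list). A per-row cutoff would instead compute the quantile of each `l` inside the comprehension.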
Answer:
Pandas cannot operate on lists in a vectorized way, so you have no choice here but to loop. The fastest option will be a list comprehension:
```python
# Keep each name, divide each score by the number of sublists in its row
test['list'] = [[[a, b/len(l)] for a, b in l]
                for l in test['list']]
```
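The comprehension divides by `len(l)`. If you would rather divide by the 'Count' column itself, as the question asks (for example, if that column could ever hold something other than the sublist length), a minimal variant pairs each row's list with its count via `zip`. The data are copied from the question; the rest is a sketch, not the answer's code.

```python
import pandas as pd

data = [['icecream', [['A', 0.9], ['B', 0.6], ['C', 0.5], ['D', 0.3], ['E', 0.1]]],
        ['potato', [['U', 0.8], ['V', 0.7], ['W', 0.4], ['X', 0.3], ['Y', 0.2]]]]
test = pd.DataFrame(data, columns=['tag', 'list'])
test['Count'] = test['list'].str.len()

# Pair each row's list with that row's Count and divide only the score (index 1)
test['list'] = [[[name, score / count] for name, score in l]
                for l, count in zip(test['list'], test['Count'])]
print(test)
```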
Or, to modify the lists in place, a simple classic loop:
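```python
# Use this instead of the comprehension above; running both divides twice
for l in test['list']:
    for x in l:
        x[1] /= len(l)  # x is the sublist object itself, so this edits it in place
```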
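Because the loop edits the sublist objects stored in the column, the original scores are gone afterwards. If you still need them, a minimal sketch is to take a deep copy first with the standard library's `copy.deepcopy` (the name `original` is just illustrative). A plain `test['list'].tolist()` would not be enough, since it returns the same inner list objects.

```python
import copy

import pandas as pd

data = [['icecream', [['A', 0.9], ['B', 0.6], ['C', 0.5], ['D', 0.3], ['E', 0.1]]],
        ['potato', [['U', 0.8], ['V', 0.7], ['W', 0.4], ['X', 0.3], ['Y', 0.2]]]]
test = pd.DataFrame(data, columns=['tag', 'list'])

# Independent snapshot of every nested list before mutating
original = copy.deepcopy(test['list'].tolist())

# The answer's in-place loop
for l in test['list']:
    for x in l:
        x[1] /= len(l)

print(original[0][0], test['list'][0][0])
```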
NB. You do not need the "Count" column, since `len(l)` gives the same value.
Output:
```
        tag                                               list
0  icecream  [[A, 0.18], [B, 0.12], [C, 0.1], [D, 0.06], [E...
1    potato  [[U, 0.16], [V, 0.14], [W, 0.08], [X, 0.06], [...
```
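A different approach, not from the answer above: reshape to one row per `[name, score]` pair with `DataFrame.explode`, so the division becomes an ordinary vectorized Series operation that can use 'Count' directly, then regroup with `groupby(...).agg(list)`. This is a sketch; the column names `name`, `score` and `pair` are made up, and because the reshaping itself costs time it is not necessarily faster than the list comprehension.

```python
import pandas as pd

data = [['icecream', [['A', 0.9], ['B', 0.6], ['C', 0.5], ['D', 0.3], ['E', 0.1]]],
        ['potato', [['U', 0.8], ['V', 0.7], ['W', 0.4], ['X', 0.3], ['Y', 0.2]]]]
test = pd.DataFrame(data, columns=['tag', 'list'])
test['Count'] = test['list'].str.len()

# One row per [name, score] pair; 'tag' and 'Count' are repeated for each pair
long = test.explode('list', ignore_index=True)
long['name'] = long['list'].str[0]
# Vectorized division: each score by its own row's Count
long['score'] = long['list'].str[1].astype(float) / long['Count']

# Rebuild the nested [name, score] lists, one per tag, keeping the original order
long['pair'] = [[n, s] for n, s in zip(long['name'], long['score'])]
result = long.groupby('tag', sort=False)['pair'].agg(list).reset_index()
result = result.rename(columns={'pair': 'list'})
print(result)
```

In this long form a score cutoff is also a one-liner before regrouping, e.g. `long[long['score'] >= long['score'].quantile(0.9)]`, assuming the cutoff is pooled across all rows; a tag whose pairs are all filtered out then disappears from the result.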