I have a dictionary like the one below, whose values are lists of the same length; it is used to construct a pandas DataFrame. For each position in these lists, I want the name of the key that holds the maximum value (e.g. the maximum at the first position is 0.00023478, under 'NN', and the maximum at the fourth position is 0.23849287, under 'CC'). I tried converting the dictionary to a pandas DataFrame and then finding the index of the maximum, but that takes too long because I am processing a lot of data. I need to find the key with the maximum value at a given position before converting the dictionary to a DataFrame.
{'DT': [0, 0, 0, 0, 0, 0, 0, 0], 'NN': [0.00023478, 0, 0, 0, 0, 0, 0, 0],
'POS': [0, 0, 0, 0.000192837, 0, 0, 0, 0], 'MD': [0, 0, 0, 0, 0, 0, 0, 0],
'VB': [0, 0, 0, 0, 0, 0, 0, 0], 'VBN': [0, 0, 0, 0, 0, 0, 0, 0],
'IN': [0.0000028945, 0, 0, 0, 0, 0, 0, 0], 'JJ': [0, 0, 0, 0, 0, 0, 0, 0],
'NNS': [0, 0, 0, 0, 0, 0, 0, 0], 'CC': [0, 0, 0, 0.23849287, 0, 0, 0, 0],
'RBS': [0, 0, 0, 0, 0, 0, 0, 0], 'NNP': [0, 0, 0, 0, 0, 0, 0, 0],
'VBZ': [0, 0, 0, 0, 0, 0, 0, 0], 'TO': [0, 0, 0, 0, 0, 0, 0, 0]}
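for i in range(len(test)):  # one iteration per sentence
    # one zero-filled list per POS tag, each as long as sentence i
    list1 = [[0 for x in range(len(test[i]))] for y in range(len(pos_list))]
    q = dict(zip(pos_list, list1))
    for j in range(len(test[i])):
        ...  # loop body not included in the original question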
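The lookup described above can be done directly on the dictionary, before any DataFrame exists. The sketch below is one way to do it in plain Python, not the asker's code: `argmax_keys` is a hypothetical helper name, `data` is a trimmed copy of the dict above (DT plus the keys with non-zero values), and it assumes every list has the same length (if they differ, zip silently stops at the shortest one).

```python
# Trimmed copy of the question's dict: DT plus every key with a non-zero value.
data = {
    'DT': [0, 0, 0, 0, 0, 0, 0, 0],
    'NN': [0.00023478, 0, 0, 0, 0, 0, 0, 0],
    'POS': [0, 0, 0, 0.000192837, 0, 0, 0, 0],
    'IN': [0.0000028945, 0, 0, 0, 0, 0, 0, 0],
    'CC': [0, 0, 0, 0.23849287, 0, 0, 0, 0],
}


def argmax_keys(d):
    """For each list position, return the key whose value is largest there.

    Ties (such as an all-zero position) go to the first key in insertion
    order, because max() returns the first maximal item it sees.
    """
    keys = list(d)
    result = []
    # zip(*d.values()) yields one tuple per position: (d[k0][i], d[k1][i], ...)
    for column in zip(*d.values()):
        best = max(range(len(keys)), key=column.__getitem__)
        result.append(keys[best])
    return result


print(argmax_keys(data))  # ['NN', 'DT', 'DT', 'CC', 'DT', 'DT', 'DT', 'DT']
```

For a single position only, skip the loop and take one tuple, e.g. by calling max over the keys with a key function that reads index i.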
Answer:
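Using max with dict.get as the key:
max(data, key=data.get)
This compares the lists as a whole (lexicographically), so on the dict above it returns a single key, 'NN', not one key per position. To get the key with the maximum at position i, compare only that element:
max(data, key=lambda k: data[k][i])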
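A small runnable check of how max(data, key=data.get) behaves when the values are lists. Python compares lists element by element, so the winner is decided by the first position where the lists differ. `key_of_max_at` is a hypothetical helper name, and `data` is a trimmed copy of the question's dict.

```python
# Trimmed copy of the question's dict: DT plus every key with a non-zero value.
data = {
    'DT': [0, 0, 0, 0, 0, 0, 0, 0],
    'NN': [0.00023478, 0, 0, 0, 0, 0, 0, 0],
    'POS': [0, 0, 0, 0.000192837, 0, 0, 0, 0],
    'IN': [0.0000028945, 0, 0, 0, 0, 0, 0, 0],
    'CC': [0, 0, 0, 0.23849287, 0, 0, 0, 0],
}

# max() over a dict iterates its keys; key=data.get compares whole lists,
# which Python orders lexicographically.
whole_list_winner = max(data, key=data.get)
print(whole_list_winner)  # NN, decided by the first element alone


def key_of_max_at(d, i):
    """Return the key whose list has the largest value at position i."""
    return max(d, key=lambda k: d[k][i])


print(key_of_max_at(data, 0))  # NN
print(key_of_max_at(data, 3))  # CC
```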
Or with DataFrame.idxmax, which returns the column label (here, the key) of the maximum in each row:
df.idxmax(axis=1)
Answer:
Convert your dict (d below) to a DataFrame:
import pandas as pd
df = pd.DataFrame(d)
print(df)
# Output:
DT NN POS MD VB VBN IN JJ NNS CC RBS NNP VBZ TO
0 0 0.000235 0.000000 0 0 0 0.000003 0 0 0.000000 0 0 0 0
1 0 0.000000 0.000000 0 0 0 0.000000 0 0 0.000000 0 0 0 0
2 0 0.000000 0.000000 0 0 0 0.000000 0 0 0.000000 0 0 0 0
3 0 0.000000 0.000193 0 0 0 0.000000 0 0 0.238493 0 0 0 0
4 0 0.000000 0.000000 0 0 0 0.000000 0 0 0.000000 0 0 0 0
5 0 0.000000 0.000000 0 0 0 0.000000 0 0 0.000000 0 0 0 0
6 0 0.000000 0.000000 0 0 0 0.000000 0 0 0.000000 0 0 0 0
7 0 0.000000 0.000000 0 0 0 0.000000 0 0 0.000000 0 0 0 0
Then use max along the columns axis to get the maximum value in each row:
>>> df.max(axis='columns')
0 0.000235
1 0.000000
2 0.000000
3 0.238493
4 0.000000
5 0.000000
6 0.000000
7 0.000000
dtype: float64
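A self-contained version of the step above, using a trimmed copy of the question's dict. Comparing the row maximum with 0 is an addition (not from the answer) that flags which list positions have any non-zero value at all, which matters for reading the idxmax result.

```python
import pandas as pd

# Trimmed copy of the question's dict: DT plus every key with a non-zero value.
data = {
    'DT': [0, 0, 0, 0, 0, 0, 0, 0],
    'NN': [0.00023478, 0, 0, 0, 0, 0, 0, 0],
    'POS': [0, 0, 0, 0.000192837, 0, 0, 0, 0],
    'IN': [0.0000028945, 0, 0, 0, 0, 0, 0, 0],
    'CC': [0, 0, 0, 0.23849287, 0, 0, 0, 0],
}
df = pd.DataFrame(data)  # keys -> columns, list positions -> rows

row_max = df.max(axis='columns')  # largest value at each list position
has_nonzero = row_max > 0         # False where every key is 0 at that position

print(row_max)
print(has_nonzero.tolist())  # [True, False, False, True, False, False, False, False]
```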
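Likewise, idxmax returns the key (column label) holding each row's maximum. Rows in which every value is 0 are ties, and idxmax reports the first column, DT, for them: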
>>> df.idxmax(axis='columns')
0 NN
1 DT
2 DT
3 CC
4 DT
5 DT
6 DT
7 DT
dtype: object
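Because all-zero rows come back as DT, a result of DT does not by itself mean DT had a real maximum. A minimal sketch of one way to separate the two cases, assuming a missing value is an acceptable marker for "no non-zero value at this position": keep the idxmax label only where the row maximum is positive, using Series.where. `data` is a trimmed copy of the question's dict.

```python
import pandas as pd

# Trimmed copy of the question's dict: DT plus every key with a non-zero value.
data = {
    'DT': [0, 0, 0, 0, 0, 0, 0, 0],
    'NN': [0.00023478, 0, 0, 0, 0, 0, 0, 0],
    'POS': [0, 0, 0, 0.000192837, 0, 0, 0, 0],
    'IN': [0.0000028945, 0, 0, 0, 0, 0, 0, 0],
    'CC': [0, 0, 0, 0.23849287, 0, 0, 0, 0],
}
df = pd.DataFrame(data)

labels = df.idxmax(axis='columns')
# Keep a label only where the row maximum is positive; all-zero rows become NaN
# instead of reporting the first column (DT).
labels = labels.where(df.max(axis='columns') > 0)
print(labels)
```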