Maximum value in a specific index of lists within a dictionary

Time:10-13

I have a dictionary like the one below, whose values are lists of equal length. It will be used to build a pandas DataFrame. For each index of these lists, I want the name of the key that holds the maximum value (e.g. the key for 0.00023478 at the first index and the key for 0.23849287 at the fourth). I tried converting it to a pandas DataFrame and then finding the index of the maximum, but that takes too long because I am processing a lot of data. I need to find the maximum value at a specific index and return its key before converting the dictionary to a DataFrame.

{'DT': [0, 0, 0, 0, 0, 0, 0, 0], 'NN': [0.00023478, 0, 0, 0, 0, 0, 0, 0], 
'POS': [0, 0, 0, 0.000192837, 0, 0, 0, 0], 'MD': [0, 0, 0, 0, 0, 0, 0, 0], 
'VB': [0, 0, 0, 0, 0, 0, 0, 0], 'VBN': [0, 0, 0, 0, 0, 0, 0, 0], 
'IN': [0.0000028945, 0, 0, 0, 0, 0, 0, 0], 'JJ': [0, 0, 0, 0, 0, 0, 0, 0], 
'NNS': [0, 0, 0, 0, 0, 0, 0, 0], 'CC': [0, 0, 0, 0.23849287, 0, 0, 0, 0], 
'RBS': [0, 0, 0, 0, 0, 0, 0, 0], 'NNP': [0, 0, 0, 0, 0, 0, 0, 0], 
'VBZ': [0, 0, 0, 0, 0, 0, 0, 0], 'TO': [0, 0, 0, 0, 0, 0, 0, 0]}
for i in range(len(test)):  # one iteration per sentence
    list1 = [[0 for x in range(len(test[i]))] for y in range(len(pos_list))]
    q = dict(zip(pos_list, list1))
    for j in range(len(test[i])):

CodePudding user response:

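Using max with a key function that reads a specific index i of each list (plain max(data, key=data.get) compares the lists lexicographically, so it only gives the right key for the first index):

max(data, key=lambda k: data[k][i])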
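
To cover every index without building a DataFrame, here is a minimal sketch in plain Python. It assumes `data` has the shape shown in the question (shortened here to four of its keys). The helper names `key_of_max` and `keys_of_max` are made up for illustration.

```python
# Shortened copy of the dictionary from the question (four of its keys).
data = {
    'DT': [0, 0, 0, 0, 0, 0, 0, 0],
    'NN': [0.00023478, 0, 0, 0, 0, 0, 0, 0],
    'POS': [0, 0, 0, 0.000192837, 0, 0, 0, 0],
    'CC': [0, 0, 0, 0.23849287, 0, 0, 0, 0],
}


def key_of_max(d, i):
    """Return the key whose list has the largest value at index i."""
    return max(d, key=lambda k: d[k][i])


def keys_of_max(d):
    """Return one key per index: the key holding the maximum at that index."""
    keys = list(d)
    result = []
    # zip(*d.values()) yields one tuple per index, ordered like the keys.
    for column in zip(*d.values()):
        best = max(range(len(column)), key=column.__getitem__)
        result.append(keys[best])
    return result


print(key_of_max(data, 0))  # NN
print(key_of_max(data, 3))  # CC
print(keys_of_max(data))
```

On a tie, max returns the first candidate it sees, so an index where every value is 0 yields the first key in the dictionary.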

Or with DataFrame.idxmax:

df.idxmax(1)

CodePudding user response:

Convert your dict to a DataFrame:

import pandas as pd

df = pd.DataFrame(d)  # d is the dictionary from the question
print(df)

# Output:
   DT        NN       POS  MD  VB  VBN        IN  JJ  NNS        CC  RBS  NNP  VBZ  TO
0   0  0.000235  0.000000   0   0    0  0.000003   0    0  0.000000    0    0    0   0
1   0  0.000000  0.000000   0   0    0  0.000000   0    0  0.000000    0    0    0   0
2   0  0.000000  0.000000   0   0    0  0.000000   0    0  0.000000    0    0    0   0
3   0  0.000000  0.000193   0   0    0  0.000000   0    0  0.238493    0    0    0   0
4   0  0.000000  0.000000   0   0    0  0.000000   0    0  0.000000    0    0    0   0
5   0  0.000000  0.000000   0   0    0  0.000000   0    0  0.000000    0    0    0   0
6   0  0.000000  0.000000   0   0    0  0.000000   0    0  0.000000    0    0    0   0
7   0  0.000000  0.000000   0   0    0  0.000000   0    0  0.000000    0    0    0   0

Then use max along the columns axis to get the largest value at each index:

>>> df.max(axis='columns')
0    0.000235
1    0.000000
2    0.000000
3    0.238493
4    0.000000
5    0.000000
6    0.000000
7    0.000000
dtype: float64

Similarly, idxmax returns the key (column label) that holds the maximum at each index:

>>> df.idxmax(axis='columns')
0    NN
1    DT
2    DT
3    CC
4    DT
5    DT
6    DT
7    DT
dtype: object
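
The DT entries above come from rows where every value is 0: idxmax reports the first column on a tie. If such rows should be told apart from a genuine maximum, one possible approach in plain Python is sketched below. It assumes all values are non-negative, and the helper name `key_of_max_or_none` is hypothetical.

```python
# Shortened copy of the dictionary from the question (three of its keys).
d = {
    'DT': [0, 0, 0, 0],
    'NN': [0.00023478, 0, 0, 0],
    'CC': [0, 0, 0, 0.23849287],
}


def key_of_max_or_none(d, i):
    """Return the key with the largest value at index i, or None if it is 0.

    Assumes every value is non-negative, so a maximum of 0 means that
    no key has a score at this index.
    """
    best = max(d, key=lambda k: d[k][i])
    return best if d[best][i] > 0 else None


print([key_of_max_or_none(d, i) for i in range(4)])
```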