How to replace lists/tuples with dictionaries in working code to improve performance?

I have this code, which works fine and is a minimal, reproducible example. It uses lists and tuples. Given the slowness of lists and tuples on large amounts of data, I would like to restructure it to use dictionaries and speed it up.

So I'd like to convert this block of code into something equivalent that uses dictionaries.

The purpose of the code is to compute the variables x and y (mathematical calculations) and add them to a list, using append and tuples. I then extract the numbers for various purposes.

How can I use dictionaries where needed, in place of the list/append code? Thank you!

VERSION WITH TUPLE AND LIST

mylist = {('Jack', 'Grace', 8, 9, '15:00'): [0, 1, 1, 5],
          ('William', 'Dawson', 8, 9, '18:00'): [1, 2, 3, 4],
          ('Natasha', 'Jonson', 8, 9, '20:45'): [0, 1, 1, 2]}

new = []

for key, value in mylist.items():

    # compute the two values, each in its own variable
    calc_x = sum(value) / len(value)
    calc_y = (calc_x * 100) / 2

    # append one entry holding three single-element lists: key, calc_x, calc_y
    if calc_x > 0.1:
        new.append([[key], [calc_x], [calc_y]])

print(new)
print(" ")

# example: pull calc_x out of each entry
print_x = [tuple(i[1]) for i in new]
print(print_x)

I was trying to write something like this, but I don't think it fits, so don't even look at it. I have two requests, if possible:

I would like sum(value) / len(value) and (calc_x * 100) / 2 to keep their own variables, calc_x and calc_y, so that they can be referenced individually in the append, as you can see.
In the new variable, I would like to be able to access the values when I need them, as I do, for example, with print_x = [tuple(i[1]) for i in new]. Thank you.

CodePudding user response：

new_dict = {}

for key, value in mylist.items():
    # create variables and perform calculations
    calc_x = sum(value) / len(value)
    calc_y = (calc_x * 100) / 2

    # add values to dictionary
    if calc_x > 0.1:
        new_dict[key] = [calc_x, calc_y]

print(new_dict)

# example: get calc_x for every key
print_x = {k: v[0] for k, v in new_dict.items()}
print(print_x)
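
If you would rather refer to each result by name than by position, one option is to store a small dictionary per key. This is only a sketch: the field names "calc_x" and "calc_y" are chosen here for illustration, and mylist is repeated from the question so the snippet runs on its own.

```python
# Sample data copied from the question
mylist = {('Jack', 'Grace', 8, 9, '15:00'): [0, 1, 1, 5],
          ('William', 'Dawson', 8, 9, '18:00'): [1, 2, 3, 4],
          ('Natasha', 'Jonson', 8, 9, '20:45'): [0, 1, 1, 2]}

new_dict = {}
for key, value in mylist.items():
    # each calculation keeps its own variable, as requested
    calc_x = sum(value) / len(value)
    calc_y = (calc_x * 100) / 2
    if calc_x > 0.1:
        # "calc_x" / "calc_y" are illustrative field names (an assumption)
        new_dict[key] = {"calc_x": calc_x, "calc_y": calc_y}

# look up one result directly by its key
jack = new_dict[('Jack', 'Grace', 8, 9, '15:00')]
print(jack["calc_x"], jack["calc_y"])

# collect every calc_x, similar to print_x in the question
print_x = [entry["calc_x"] for entry in new_dict.values()]
print(print_x)
```

The lookup by key is where a dictionary helps most; the loop itself still visits every item once, just like the list version.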

CodePudding user response：

If you really want to improve performance, you can use pandas (or NumPy) to vectorize the math operations:

import pandas as pd

# Transform your dataset to DataFrame
df = pd.DataFrame.from_dict(mylist, orient='index')

# Compute some operations
df['x'] = df.mean(axis=1)
df['y'] = df['x'] * 50

# Filter out and export
out = df.loc[df['x'] > 0.1, ['x', 'y']].to_dict('split')
new = dict(zip(out['index'], out['data']))

Output:

>>> new
{('Jack', 'Grace', 8, 9, '15:00'): [1.75, 87.5],
 ('William', 'Dawson', 8, 9, '18:00'): [2.5, 125.0],
 ('Natasha', 'Jonson', 8, 9, '20:45'): [1.0, 50.0]}

A NumPy version:

import numpy as np

# transform keys to numpy array (special hack to keep tuples)
keys = np.empty(len(mylist), dtype=object)
keys[:] = tuple(mylist.keys())

# transform values to numpy array
vals = np.array(tuple(mylist.values()))

x = np.mean(vals, axis=1)
y = x * 50

# boolean mask to exclude some values
m = x > 0.1

out = np.vstack([x, y]).T
new = dict(zip(keys[m].tolist(), out[m].tolist()))
print(new)

# Output
{('Jack', 'Grace', 8, 9, '15:00'): [1.75, 87.5],
 ('William', 'Dawson', 8, 9, '18:00'): [2.5, 125.0],
 ('Natasha', 'Jonson', 8, 9, '20:45'): [1.0, 50.0]}

A pure-Python version:

new = {}
for k, v in mylist.items():
    x = sum(v) / len(v)
    y = x * 50
    if x > 0.1:
        new[k] = [x, y]
print(new)

# Output
{('Jack', 'Grace', 8, 9, '15:00'): [1.75, 87.5],
 ('William', 'Dawson', 8, 9, '18:00'): [2.5, 125.0],
 ('Natasha', 'Jonson', 8, 9, '20:45'): [1.0, 50.0]}
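
Before committing to one version, it may be worth timing the candidates on data of a realistic size. Below is a rough sketch using the standard-library timeit module. The helper names with_list and with_dict, and the enlarged input big, are made up for this example; big just repeats the three sample rows under distinct keys, so treat the numbers as indicative only.

```python
import timeit

# Sample data copied from the question
mylist = {('Jack', 'Grace', 8, 9, '15:00'): [0, 1, 1, 5],
          ('William', 'Dawson', 8, 9, '18:00'): [1, 2, 3, 4],
          ('Natasha', 'Jonson', 8, 9, '20:45'): [0, 1, 1, 2]}


def with_list(data):
    """List-based version from the question (hypothetical wrapper name)."""
    new = []
    for key, value in data.items():
        calc_x = sum(value) / len(value)
        calc_y = (calc_x * 100) / 2
        if calc_x > 0.1:
            new.append([[key], [calc_x], [calc_y]])
    return new


def with_dict(data):
    """Dictionary-based version (hypothetical wrapper name)."""
    new = {}
    for k, v in data.items():
        x = sum(v) / len(v)
        y = x * 50
        if x > 0.1:
            new[k] = [x, y]
    return new


# Hypothetical larger input: prefix each key with a counter to keep keys unique
big = {(i, *key): value for i in range(10_000) for key, value in mylist.items()}

for func in (with_list, with_dict):
    # run each version 10 times on the same input and report the total time
    seconds = timeit.timeit(lambda: func(big), number=10)
    print(f"{func.__name__}: {seconds:.3f} s for 10 runs")
```

Timings depend on the machine and on the shape of the data, so no expected output is shown here.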