I have this code, which works fine and is a minimal, reproducible example. It uses lists and tuples. Because lists and tuples are slow on large amounts of data, I would like to change the whole setup and use dictionaries to speed things up.
So I'd like to convert this block of code into something similar that uses dictionaries.
The purpose of the code is to create the variables x and y (mathematical calculations) and add them to a list, using append and tuples. I then extract the numbers for certain purposes.
How can I add dictionaries where needed and use them to replace the list/append code? Thank you!
VERSION WITH TUPLE AND LIST
mylist = {('Jack', 'Grace', 8, 9, '15:00'): [0, 1, 1, 5],
          ('William', 'Dawson', 8, 9, '18:00'): [1, 2, 3, 4],
          ('Natasha', 'Jonson', 8, 9, '20:45'): [0, 1, 1, 2]}
new = []
for key, value in mylist.items():
    # create variables and perform calculations
    calc_x = sum(value) / len(value)
    calc_y = (calc_x * 100) / 2
    # append an entry made of three one-element lists
    if calc_x > 0.1:
        new.append([[key], [calc_x], [calc_y]])
print(new)
print(" ")
# example of retrieving calc_x
print_x = [tuple(i[1]) for i in new]
print(print_x)
I was trying to write something like this, but I don't think it fits, so don't even look at it. I have two requests, if possible:
- I would like sum(value) / len(value) and (calc_x * 100) / 2 to keep their own variables, calc_x and calc_y, so that they can be referenced individually in the append, as you can see.
- In the new variable, I would like to be able to retrieve the values when they are needed, as I do for example with print_x = [tuple(i[1]) for i in new].
Thank you
CodePudding user response:
new_dict = {}
for key, value in mylist.items():
    # create variables and perform calculations
    calc_x = sum(value) / len(value)
    calc_y = (calc_x * 100) / 2
    # add values to dictionary
    if calc_x > 0.1:
        new_dict[key] = [calc_x, calc_y]
print(new_dict)
# example of retrieving calc_x
print_x = {k: v[0] for k, v in new_dict.items()}
print(print_x)
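If retrieving the values by position (v[0], v[1]) feels fragile, one possible variant is to store a small inner dictionary per key, so each value can be fetched by name. This is only a sketch: the field names "calc_x" and "calc_y" and the helper variables jack and all_x are assumptions made for illustration, not part of the original code.

```python
mylist = {('Jack', 'Grace', 8, 9, '15:00'): [0, 1, 1, 5],
          ('William', 'Dawson', 8, 9, '18:00'): [1, 2, 3, 4],
          ('Natasha', 'Jonson', 8, 9, '20:45'): [0, 1, 1, 2]}

new_dict = {}
for key, value in mylist.items():
    # keep the two calculations in their own variables
    calc_x = sum(value) / len(value)
    calc_y = (calc_x * 100) / 2
    if calc_x > 0.1:
        # inner dict with named fields (names are illustrative assumptions)
        new_dict[key] = {"calc_x": calc_x, "calc_y": calc_y}

# fetch one value for one key directly, without scanning
jack = ('Jack', 'Grace', 8, 9, '15:00')
print(new_dict[jack]["calc_x"])  # 1.75

# collect calc_x for every stored key
all_x = [entry["calc_x"] for entry in new_dict.values()]
print(all_x)  # [1.75, 2.5, 1.0]
```

The trade-off is a little more memory per entry in exchange for lookups that read as new_dict[key]["calc_x"] instead of relying on list positions.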
CodePudding user response:
If you really want to improve performance, you can use Pandas (or NumPy) to vectorize the math operations:
import pandas as pd
# Transform your dataset to DataFrame
df = pd.DataFrame.from_dict(mylist, orient='index')
# Compute some operations
df['x'] = df.mean(axis=1)
df['y'] = df['x'] * 50
# Filter out and export
out = df.loc[df['x'] > 0.1, ['x', 'y']].to_dict('split')
new = dict(zip(out['index'], out['data']))
Output:
>>> new
{('Jack', 'Grace', 8, 9, '15:00'): [1.75, 87.5],
('William', 'Dawson', 8, 9, '18:00'): [2.5, 125.0],
('Natasha', 'Jonson', 8, 9, '20:45'): [1.0, 50.0]}
A numpy version:
import numpy as np
# transform keys to numpy array (special hack to keep tuples)
keys = np.empty(len(mylist), dtype=object)
keys[:] = tuple(mylist.keys())
# transform values to numpy array
vals = np.array(tuple(mylist.values()))
x = np.mean(vals, axis=1)
y = x * 50
# boolean mask to exclude some values
m = x > 0.1
out = np.vstack([x, y]).T
new = dict(zip(keys[m].tolist(), out[m].tolist()))
print(new)
# Output
{('Jack', 'Grace', 8, 9, '15:00'): [1.75, 87.5],
('William', 'Dawson', 8, 9, '18:00'): [2.5, 125.0],
('Natasha', 'Jonson', 8, 9, '20:45'): [1.0, 50.0]}
A python version:
new = {}
for k, v in mylist.items():
    x = sum(v) / len(v)
    y = x * 50
    if x > 0.1:
        new[k] = [x, y]
print(new)
# Output
{('Jack', 'Grace', 8, 9, '15:00'): [1.75, 87.5],
('William', 'Dawson', 8, 9, '18:00'): [2.5, 125.0],
('Natasha', 'Jonson', 8, 9, '20:45'): [1.0, 50.0]}
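The plain Python loop can also be written as a single dict comprehension. The following is a sketch that assumes Python 3.8 or later, since it uses an assignment expression (:=) to compute x once and reuse it in both the filter and the stored value. It should produce the same dictionary as the loop above.

```python
mylist = {('Jack', 'Grace', 8, 9, '15:00'): [0, 1, 1, 5],
          ('William', 'Dawson', 8, 9, '18:00'): [1, 2, 3, 4],
          ('Natasha', 'Jonson', 8, 9, '20:45'): [0, 1, 1, 2]}

# x is bound in the filter condition, then reused to build [x, y]
new = {k: [x, x * 50]
       for k, v in mylist.items()
       if (x := sum(v) / len(v)) > 0.1}

print(new)
```

Whether this is faster than the explicit loop depends on the data, so it is worth timing both on a realistic input before choosing one.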