Home > front end >  python dataframe to dictionary with multiple columns in keys and values
python dataframe to dictionary with multiple columns in keys and values

Time:02-28

I am working on an optimization problem and need to create indexing to build a mixed-integer mathematical model. I am using python dictionaries for the task. Below is a sample of my dataset. Full dataset is expected to have about 400K rows if that matters.

# sample input data
pd.DataFrame.from_dict({'origin': {0: 'perris', 1: 'perris', 2: 'perris', 3: 'perris', 4: 'perris'}, 
'dest': {0: 'alexandria', 1: 'alexandria', 2: 'alexandria', 3: 'alexandria', 4: 'alexandria'}, 
'product': {0: 'bike', 1: 'bike', 2: 'bike', 3: 'bike', 4: 'bike'}, 
'lead_time': {0: 4, 1: 4, 2: 4, 3: 4, 4: 4}, 'build_time': {0: 2, 1: 2, 2: 2, 3: 2, 4: 2}, 
'ship_date': {0: '02/25/2022', 1: '02/26/2022', 2: '02/27/2022', 3: '02/28/2022', 4: '03/01/2022'}, 
'ship_day': {0: 5, 1: 6, 2: 7, 3: 1, 4: 2}, 
'truck_in': {0: '03/01/2022', 1: '03/02/2022', 2: '03/03/2022', 3: '03/04/2022', 4: '03/07/2022'}, 
'product_in': {0: '03/03/2022', 1: '03/04/2022', 2: '03/05/2022', 3: '03/06/2022', 4: '03/09/2022'}})

The data frame looks like this -

enter image description here

I am looking to generate a dictionary from each row of this dataframe where the keys and values are tuples made of multiple column values. The output would look like this -

(origin, dest, product, ship_date): (origin, dest, product, truck_in)

# for example, first two rows will become a dictionary key-value pair like
{('perris', 'alexandria', 'bike', '2/25/2022'): ('perris', 'alexandria', 'bike', '3/1/2022'),
('perris', 'alexandria', 'bike', '2/26/2022'): ('perris', 'alexandria', 'bike', '3/2/2022')}

I am very new to python and couldn't figure out how to do this. Any help is appreciated. Thanks!

CodePudding user response:

You can loop through the DataFrame.

Assuming your DataFrame is called "df" this gives you the dict.

result_dict = {}
for idx, row in df.iterrows():
    result_dict[(row.origin, row.dest, row['product'], row.ship_date )] = (
                 row.origin, row.dest, row['product'], row.truck_in )

Since looping through 400k rows will take some time, have a look at tqdm (https://tqdm.github.io/) to get a progress bar with a time estimate that quickly tells you if the approach works for your dataset.

Also, note that 400K dictionary entries may take up a lot of memory so you may try to estimate if the dict fits your memory.

Another, memory waisting but faster way is to do it in Pandas

Create a new column with the value for the dictionary

df['value'] = df.apply(lambda x: (x.origin, x.dest, x['product'], x.truck_in), axis=1)

Then set the index and convert to dict

df.set_index(['origin','dest','product','ship_date'])['value'].to_dict()

CodePudding user response:

The approach below splits the initial dataframe into two dataframes that will be the source of the keys and values in the dictionary. These are then converted to arrays in order to get away from working with dataframes as soon as possible. The arrays are converted to tuples and zipped together to create the key:value pairs.

import pandas as pd
import numpy as np

df = pd.DataFrame.from_dict(
    {'origin': {0: 'perris', 1: 'perris', 2: 'perris', 3: 'perris', 4: 'perris'}, 
'dest': {0: 'alexandria', 1: 'alexandria', 2: 'alexandria', 3: 'alexandria', 4: 'alexandria'}, 
'product': {0: 'bike', 1: 'bike', 2: 'bike', 3: 'bike', 4: 'bike'}, 
'lead_time': {0: 4, 1: 4, 2: 4, 3: 4, 4: 4}, 'build_time': {0: 2, 1: 2, 2: 2, 3: 2, 4: 2}, 
'ship_date': {0: '02/25/2022', 1: '02/26/2022', 2: '02/27/2022', 3: '02/28/2022', 4: '03/01/2022'}, 
'ship_day': {0: 5, 1: 6, 2: 7, 3: 1, 4: 2}, 
'truck_in': {0: '03/01/2022', 1: '03/02/2022', 2: '03/03/2022', 3: '03/04/2022', 4: '03/07/2022'}, 
'product_in': {0: '03/03/2022', 1: '03/04/2022', 2: '03/05/2022', 3: '03/06/2022', 4: '03/09/2022'}}
     )
#display(df)

#desired output: (origin, dest, product, ship_date): (origin, dest, product, truck_in)

#slice df to key/value chunks
#list to array
ship = df[['origin','dest', 'product', 'ship_date']]
ship.set_index('origin', inplace = True)
keys_array=ship.to_records()
truck = df[['origin', 'dest', 'product', 'truck_in']]
truck.set_index('origin', inplace = True)
values_array = truck.to_records()

#array_of_tuples = map(tuple, an_array)
keys_map = map(tuple, keys_array)
values_map = map(tuple, values_array)

#tuple_of_tuples = tuple(array_of_tuples)
keys_tuple = tuple(keys_map)
values_tuple = tuple(values_map)

zipp = zip(keys_tuple, values_tuple)
dict2 = dict(zipp)
print(dict2)
  • Related