Python - Read .csv as dictionary with multiple keys & store matching key values as nested List of Li-CodePudding

I need guidance on the best approach to loop through a .csv file, convert to a dictionary, and store a list of lists result for each matching key (3 Keys in total), finally storing those into a master List vector to later act as an iterable input to a function.

Using pandas to read in a long .csv as below:
mach,elem,cycle,Qty's,Sizes
x,y,1,12,68
x,y,1,24,72
x,y,1,32,76
...
(x,y,1) = Keys
(12,68) = values for each matching set of Keys

I need to store ['x','y',1] as a Key value in a dictionary, with (12,68) being the value. All matching Keys from cols 0:3 should then store the values as a List of Lists; so from above:

[x,y,1] = ((12, 68),(24,72),(32,76)
[x,y,2] = ...etc

Then, store these into a master List to later act as an input to a function:

master = [[x,y,1], [x,y,2],...]

I have many combinations of Key values; so the end list will be large. I can manually set this up pandas; doing something like (y not included; quick example):

x_3_vals = x_3.filter(items=["Qty's", "Sizes"])
x_3_vals = x_3_vals.values.tolist()

x_2_vals = x_2.filter(items=["Qty's", "Sizes"])
x_2_vals = x_2_vals.values.tolist()
...

mster = [x_1_vals, x_2_vals, x_3_vals]

but I need a way to loop through the long .csv file and store / create the List of lists automatically.

Column headers will not change in the initial file; only values / keys.

CodePudding user response：

Ok, I answered my own question - no need for a dictionary, that's overcomplicating things. For anyone searching for a similar issue:

Read in the .csv, create a "Key" frame; then loop through the original frame using the Key as a filter, saving the two needed columns as a List of lists in the process, then storing them in a master list to be used later.

The Key frame & master list are arranged in synchronous order; so there's no need for a dictionary unless you have to pull part of the list and you're not sure which index the key is in.

There's probably a much faster method; but for clarity sake this works:

df = pd.read_csv('Inputs\PM_cycle_test.csv', skip_blank_lines=True, index_col=False)
df['Key'] = df['PM']   df['Grade']   df['Cycle'].astype(str)

df_keys = df.filter(items=["Key"]).drop_duplicates().values.tolist()


mstr = []
for idx, val in enumerate(df_keys):
    for _ in enumerate(df):
        df_filter = df.loc[df['Key'].isin(val)].filter(items=["Qty's", "Sizes"])
        tmp = df_filter.values.tolist()
    mstr.append(tmp)
    rprint(f'{df_keys[idx]}-> {mstr[idx]}')


#rprint(df_keys)
rprint(f'{len(mstr)} Items;')