I need guidance on the best approach to loop through a .csv file, convert to a dictionary, and store a list of lists result for each matching key (3 Keys in total), finally storing those into a master List vector to later act as an iterable input to a function.
Using pandas to read in a long .csv as below:
mach,elem,cycle,Qty's,Sizes
x,y,1,12,68
x,y,1,24,72
x,y,1,32,76
...
(x,y,1) = Keys
(12,68) = values for each matching set of Keys
I need to store ['x','y',1] as a Key value in a dictionary, with (12,68) being the value. All matching Keys from cols 0:3 should then store the values as a List of Lists; so from above:
[x,y,1] = ((12, 68),(24,72),(32,76)
[x,y,2] = ...etc
Then, store these into a master List to later act as an input to a function:
master = [[x,y,1], [x,y,2],...]
I have many combinations of Key values; so the end list will be large. I can manually set this up pandas; doing something like (y not included; quick example):
x_3_vals = x_3.filter(items=["Qty's", "Sizes"])
x_3_vals = x_3_vals.values.tolist()
x_2_vals = x_2.filter(items=["Qty's", "Sizes"])
x_2_vals = x_2_vals.values.tolist()
...
mster = [x_1_vals, x_2_vals, x_3_vals]
but I need a way to loop through the long .csv file and store / create the List of lists automatically.
Column headers will not change in the initial file; only values / keys.
CodePudding user response:
Ok, I answered my own question - no need for a dictionary, that's overcomplicating things. For anyone searching for a similar issue:
Read in the .csv, create a "Key" frame; then loop through the original frame using the Key as a filter, saving the two needed columns as a List of lists in the process, then storing them in a master list to be used later.
The Key frame & master list are arranged in synchronous order; so there's no need for a dictionary unless you have to pull part of the list and you're not sure which index the key is in.
There's probably a much faster method; but for clarity sake this works:
df = pd.read_csv('Inputs\PM_cycle_test.csv', skip_blank_lines=True, index_col=False)
df['Key'] = df['PM'] df['Grade'] df['Cycle'].astype(str)
df_keys = df.filter(items=["Key"]).drop_duplicates().values.tolist()
mstr = []
for idx, val in enumerate(df_keys):
for _ in enumerate(df):
df_filter = df.loc[df['Key'].isin(val)].filter(items=["Qty's", "Sizes"])
tmp = df_filter.values.tolist()
mstr.append(tmp)
rprint(f'{df_keys[idx]}-> {mstr[idx]}')
#rprint(df_keys)
rprint(f'{len(mstr)} Items;')