I have the following list:
['BILL FROM:', 'TestCorp', 'USA, NY 02123, Washington St.', 'New York, NY, 02123', 'United States', '[email protected]', 'BILL TO:', 'Yao Min', 'LA, 2123, Los Angeles', 'new York, NY, 9803432', 'United States', '[email protected]', 'INVOICE #', '01', 'INVOICE DATE', '2022-08-05', 'AMOUNT DUE', '$36.79', 'SUBTOTAL', '$36.74', 'TAX (0.13%)', '$0.05', 'TOTAL', '$36.79', 'Item', 'Description', 'Quantity', 'Unit Cost', 'Line Total', 'Pen', 'Pen, black coloe', '1.5', '$1.16', '$1.74', 'Book', 'Calculus, Zorych, pt. 1', '3.5', '$10.00', '$35.00', 'Powered by Shopify', 'www.shopify.com']
I need to generate a dictionary from it that looks like this:
{
    'items': [
        {
            'item': 'Pen',
            'description': 'Pen, black coloe',
            'quantity': '1.5',
            'lineCost': '$1.16',
            'lineTotaal': '$1.74'
        },
        {
            'item': 'Book',
            'description': 'Calculus, Zorych, pt. 1',
            'quantity': '3.5',
            'lineCost': '$10.00',
            'lineTotaal': '$35.00'
        }
    ]
}
I tried looping through the list and appending the needed data to a separate dict, but that gave me duplicate values:
for i in range(len(text['blocks'])):
    for j in range(len(text['blocks'][i]['lines'])):
        for k in range(len(text['blocks'][i]['lines'][j]['spans'])):
            prepared_data.append(text['blocks'][i]['lines'][j]['spans'][k]['text'])
            cleaned_json_keys = text['blocks'][i]['lines'][0]['spans'][0]['text']
            cleaned_json_vals = text['blocks'][i]['lines'][j]['spans'][k]['text']
            cleaned_json[cleaned_json_keys] = cleaned_json_vals
Output:
{'BILL FROM:': 'BILL FROM:', 'TestCorp': 'TestCorp', 'USA, NY 02123, Washington St.': 'USA, NY 02123, Washington St.', 'New York, NY, 02123': 'New York, NY, 02123', 'United States': 'United States', '[email protected]': '[email protected]', 'BILL TO:': 'BILL TO:', 'Yao Min': 'Yao Min', 'LA, 2123, Los Angeles': 'LA, 2123, Los Angeles', 'new York, NY, 9803432': 'new York, NY, 9803432', '[email protected]': '[email protected]', 'INVOICE #': '01', 'INVOICE DATE': '2022-08-05', 'AMOUNT DUE': '$36.79', 'SUBTOTAL': '$36.74', 'TAX (0.13%)': '$0.05', 'TOTAL': '$36.79', 'Item': 'Line Total', 'Pen': '$1.74', 'Book': '$35.00', 'Powered by Shopify': 'Powered by Shopify', 'www.shopify.com': 'www.shopify.com'}
How can I do that? If somebody knows a way, please help. Thank you in advance!
Answer:
I'll help you get started: you need to parse the list to extract the relevant data. The list may vary in length, but provided that the items always sit between 'Line Total' and 'Powered by Shopify', you can locate those two boundaries and slice between them:
>>> my_list.index("Line Total")
28
>>> my_list.index("Powered by Shopify")
39
>>> items = my_list[29:39]
>>> items
['Pen', 'Pen, black coloe', '1.5', '$1.16', '$1.74', 'Book', 'Calculus, Zorych, pt. 1', '3.5', '$10.00', '$35.00']
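Rather than hard-coding 29 and 39, the slice bounds can be computed from the two marker strings. This is a minimal sketch, assuming each marker appears exactly once; the helper name `extract_item_cells` is hypothetical:

```python
def extract_item_cells(my_list, start_marker="Line Total", end_marker="Powered by Shopify"):
    """Return the cells between the table header and the footer."""
    # list.index raises ValueError if a marker is missing
    start = my_list.index(start_marker) + 1
    # Search for the footer only after the header
    end = my_list.index(end_marker, start)
    return my_list[start:end]


my_list = ['BILL FROM:', 'TestCorp', 'USA, NY 02123, Washington St.', 'New York, NY, 02123', 'United States', '[email protected]', 'BILL TO:', 'Yao Min', 'LA, 2123, Los Angeles', 'new York, NY, 9803432', 'United States', '[email protected]', 'INVOICE #', '01', 'INVOICE DATE', '2022-08-05', 'AMOUNT DUE', '$36.79', 'SUBTOTAL', '$36.74', 'TAX (0.13%)', '$0.05', 'TOTAL', '$36.79', 'Item', 'Description', 'Quantity', 'Unit Cost', 'Line Total', 'Pen', 'Pen, black coloe', '1.5', '$1.16', '$1.74', 'Book', 'Calculus, Zorych, pt. 1', '3.5', '$10.00', '$35.00', 'Powered by Shopify', 'www.shopify.com']

items = extract_item_cells(my_list)
# Five columns per row, so the cell count should divide evenly
assert len(items) % 5 == 0
print(items)
```

Because the bounds come from `list.index`, the same code keeps working when the invoice has more or fewer items.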
The length of the resulting list should be a multiple of five here (one cell per column). Before looping over it, I suggest splitting it into sublists of five items each. This generator, borrowed from a Stack Overflow post on splitting a list into equally sized chunks, does the job just fine:
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
Result (wrapped in list() because the function returns a generator):
>>> list(chunks(items, 5))
[['Pen', 'Pen, black coloe', '1.5', '$1.16', '$1.74'], ['Book', 'Calculus, Zorych, pt. 1', '3.5', '$10.00', '$35.00']]
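From there, each chunk becomes one item dictionary with a single zip against the field names. This is a minimal sketch; the field names are copied verbatim from the desired output in the question (including 'lineTotaal'), and 'lineCost' is assumed to hold the Unit Cost column:

```python
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


# Field names copied verbatim from the desired output in the question
FIELD_NAMES = ['item', 'description', 'quantity', 'lineCost', 'lineTotaal']

items = ['Pen', 'Pen, black coloe', '1.5', '$1.16', '$1.74',
         'Book', 'Calculus, Zorych, pt. 1', '3.5', '$10.00', '$35.00']

# One dict per five-cell row, pairing each cell with its column name
result = {'items': [dict(zip(FIELD_NAMES, row))
                    for row in chunks(items, len(FIELD_NAMES))]}
print(result)
```

Using `len(FIELD_NAMES)` as the chunk size keeps the row width and the key list in sync if a column is ever added.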
Answer:
For example:
f = ['BILL FROM:', 'TestCorp', 'USA, NY 02123, Washington St.', 'New York, NY, 02123', 'United States', '[email protected]', 'BILL TO:', 'Yao Min', 'LA, 2123, Los Angeles', 'new York, NY, 9803432', 'United States', '[email protected]', 'INVOICE #', '01', 'INVOICE DATE', '2022-08-05', 'AMOUNT DUE', '$36.79', 'SUBTOTAL', '$36.74', 'TAX (0.13%)', '$0.05', 'TOTAL', '$36.79', 'Item', 'Description', 'Quantity', 'Unit Cost', 'Line Total', 'Pen', 'Pen, black coloe', '1.5', '$1.16', '$1.74', 'Book', 'Calculus, Zorych, pt. 1', '3.5', '$10.00', '$35.00', 'Powered by Shopify', 'www.shopify.com']
# Map the five column headers (f[24:29]) to the first item row (f[29:34])
dicto = dict(zip(f[24:29], f[29:34]))
print(dicto)
Identify which list items are keys and which are values, for example:
f[0] = 'BILL FROM:' -> key
f[1:6] = 'TestCorp', ..., '[email protected]' -> value
Build a table of keys and a table of values, then use dict(zip(KEY_TABLE, VALUE_TABLE)).
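In the invoice summary, labels and values alternate, so both tables can be taken with a step-2 slice. This is a minimal sketch, assuming the positions in `f` are fixed, with indices 12 to 23 holding 'INVOICE #' through the value of 'TOTAL':

```python
f = ['BILL FROM:', 'TestCorp', 'USA, NY 02123, Washington St.', 'New York, NY, 02123', 'United States', '[email protected]', 'BILL TO:', 'Yao Min', 'LA, 2123, Los Angeles', 'new York, NY, 9803432', 'United States', '[email protected]', 'INVOICE #', '01', 'INVOICE DATE', '2022-08-05', 'AMOUNT DUE', '$36.79', 'SUBTOTAL', '$36.74', 'TAX (0.13%)', '$0.05', 'TOTAL', '$36.79', 'Item', 'Description', 'Quantity', 'Unit Cost', 'Line Total', 'Pen', 'Pen, black coloe', '1.5', '$1.16', '$1.74', 'Book', 'Calculus, Zorych, pt. 1', '3.5', '$10.00', '$35.00', 'Powered by Shopify', 'www.shopify.com']

# Labels sit at even offsets from 12, each value right after its label
KEY_TABLE = f[12:24:2]
VALUE_TABLE = f[13:24:2]

summary = dict(zip(KEY_TABLE, VALUE_TABLE))
print(summary)
```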
Edit, in response to your last comment:
f = ['BILL FROM:', 'TestCorp', 'USA, NY 02123, Washington St.', 'New York, NY, 02123', 'United States', '[email protected]', 'BILL TO:', 'Yao Min', 'LA, 2123, Los Angeles', 'new York, NY, 9803432', 'United States', '[email protected]', 'INVOICE #', '01', 'INVOICE DATE', '2022-08-05', 'AMOUNT DUE', '$36.79', 'SUBTOTAL', '$36.74', 'TAX (0.13%)', '$0.05', 'TOTAL', '$36.79', 'Item', 'Description', 'Quantity', 'Unit Cost', 'Line Total', 'Pen', 'Pen, black coloe', '1.5', '$1.16', '$1.74', 'Book', 'Calculus, Zorych, pt. 1', '3.5', '$10.00', '$35.00', 'Powered by Shopify', 'www.shopify.com']
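# Map the five column headers (f[24:29]) to the first item row (f[29:34])
dicto = dict(zip(f[24:29], f[29:34]))
print(dicto)
# Labels spelled exactly as they appear in f, so `elem in key` can match them
key = ['BILL FROM:', 'BILL TO:', 'INVOICE #', 'INVOICE DATE', 'AMOUNT DUE', 'SUBTOTAL', 'TAX (0.13%)', 'TOTAL', 'Item', 'Description', 'Quantity', 'Unit Cost', 'Line Total']
value = []
# Build a new list instead of calling f.remove() while looping over f,
# which silently skips the element after each removal
remaining = [elem for elem in f if elem not in key]
# The BILL FROM block is the first 5 non-label entries
bill_from_size = 5
val = ', '.join(remaining[:bill_from_size])
value.append(val)
print(key)
print(val)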
Then continue the same way for every key you need. Here I joined all the fields that belong to the BILL FROM key; once the value list holds one entry per key, you can build the dictionary with dict(zip(key, value)).
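Instead of tracking a block size for every key, each value can be collected under the most recent label seen. This is a minimal sketch, assuming the hypothetical label list `SECTION_LABELS` below (written for this invoice) and stopping at the item table header 'Item', which the slicing approach above already handles:

```python
f = ['BILL FROM:', 'TestCorp', 'USA, NY 02123, Washington St.', 'New York, NY, 02123', 'United States', '[email protected]', 'BILL TO:', 'Yao Min', 'LA, 2123, Los Angeles', 'new York, NY, 9803432', 'United States', '[email protected]', 'INVOICE #', '01', 'INVOICE DATE', '2022-08-05', 'AMOUNT DUE', '$36.79', 'SUBTOTAL', '$36.74', 'TAX (0.13%)', '$0.05', 'TOTAL', '$36.79', 'Item', 'Description', 'Quantity', 'Unit Cost', 'Line Total', 'Pen', 'Pen, black coloe', '1.5', '$1.16', '$1.74', 'Book', 'Calculus, Zorych, pt. 1', '3.5', '$10.00', '$35.00', 'Powered by Shopify', 'www.shopify.com']

# Hypothetical label list for this invoice layout (exact matches only)
SECTION_LABELS = ['BILL FROM:', 'BILL TO:', 'INVOICE #', 'INVOICE DATE',
                  'AMOUNT DUE', 'SUBTOTAL', 'TAX (0.13%)', 'TOTAL']


def group_by_label(cells, labels, stop_at='Item'):
    """Map each label to the cells that follow it, up to the next label."""
    grouped = {}
    current = None
    for cell in cells:
        if cell == stop_at:
            break  # the item table is parsed separately
        if cell in labels:
            current = cell
            grouped[current] = []
        elif current is not None:
            grouped[current].append(cell)
    return grouped


header = group_by_label(f, SECTION_LABELS)
print(header['BILL FROM:'])
print(header['TOTAL'])
```

Each value is kept as a list so multi-line blocks such as addresses stay intact; join them with ', '.join(...) if a single string is preferred.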