I am using the unleashed_py library to extract Unleashed data.
The sample of the output is as below where there could be several items in the invoice:
[{
'OrderNumber': 'SO-00000742',
'QuoteNumber': None,
'InvoiceDate': '/Date(1658496322067)/',
'InvoiceLines': [{'LineNumber': 1,
'LineType': None},
{'LineNumber': 2,
'LineType': None}],
'Guid': '8f6b89da-1e6e-42288a24-902a-038041e04f06',
'LastModifiedOn': '/Date(1658496322221)/'}]
I need to get a df:
If I run the below script, the invoice lines just get appended with the common fields such as ordernumber, quotenumber, invoicedate, guide, and lastmodifiedon not getting repeated.
order_number = []
quote_number = []
invoice_date = []
invoicelines = []
invoice_line_number = []
invoice_line_type = []
guid = []
last_modified = []
for item in df:
order_number.append(item.get('OrderNumber'))
quote_number.append(item.get('QuoteNumber'))
invoice_date.append(item.get('InvoiceDate'))
guid.append(item.get('Guid'))
last_modified.append(item.get('LastModifiedOn'))
lines = item.get('InvoiceLines')
for item_sub_2 in lines:
invoice_line_number.append('LineNumber')
invoice_line_type.append('LineType')
df_order_number = pd.DataFrame(order_number)
df_quote_number = pd.DataFrame(quote_number)
df_invoice_date = pd.DataFrame(invoice_date)
df_invoice_line_number = pd.DataFrame(invoice_line_number)
df_invoice_line_type = pd.DataFrame(invoice_line_type)
df_guid = pd.DataFrame(guid)
df_last_modified = pd.DataFrame(last_modified)
df_row = pd.concat([
df_order_number,
df_quote_number,
df_invoice_date,
df_invoice_line_number,
df_invoice_line_type,
df_guid,
df_last_modified
], axis = 1)
What am I doing wrong?
CodePudding user response:
You don't need to iterate, just create the dataframe out of the list of dictionaries you have, then explode InvoiceLines
columns then apply pd.Series
and join it with the original dataframe:
data = [{
'OrderNumber': 'SO-00000742',
'QuoteNumber': None,
'InvoiceDate': '/Date(1658496322067)/',
'InvoiceLines': [{'LineNumber': 1,
'LineType': None},
{'LineNumber': 2,
'LineType': None}],
'Guid': '8f6b89da-1e6e-42288a24-902a-038041e04f06',
'LastModifiedOn': '/Date(1658496322221)/'}]
df=pd.DataFrame(data).explode('InvoiceLines')
out=pd.concat([df['InvoiceLines'].apply(pd.Series),
df.drop(columns=['InvoiceLines'])],
axis=1)
OUTPUT:
#out
LineNumber LineType OrderNumber QuoteNumber InvoiceDate \
0 1.0 NaN SO-00000742 None /Date(1658496322067)/
0 2.0 NaN SO-00000742 None /Date(1658496322067)/
Guid LastModifiedOn
0 8f6b89da-1e6e-42288a24-902a-038041e04f06 /Date(1658496322221)/
0 8f6b89da-1e6e-42288a24-902a-038041e04f06 /Date(1658496322221)/
I'm leaving the date conversion and column renames for you cause I believe you can do that yourself.