Home > Software engineering >  Extract data from array - Python
Extract data from array - Python

Time:08-20

I am using the unleashed_py library to extract Unleashed data.

The sample of the output is as below where there could be several items in the invoice:

[{
 'OrderNumber': 'SO-00000742',
  'QuoteNumber': None,
  'InvoiceDate': '/Date(1658496322067)/',
  'InvoiceLines': [{'LineNumber': 1,
    'LineType': None},
   {'LineNumber': 2,
    'LineType': None}],
  'Guid': '8f6b89da-1e6e-42288a24-902a-038041e04f06',
  'LastModifiedOn': '/Date(1658496322221)/'}]

I need to get a df:

enter image description here

If I run the below script, the invoice lines just get appended with the common fields such as ordernumber, quotenumber, invoicedate, guide, and lastmodifiedon not getting repeated.

order_number = []
quote_number = []
invoice_date = []
invoicelines = []
invoice_line_number = []
invoice_line_type = []
guid = []
last_modified = []


for item in df:
    order_number.append(item.get('OrderNumber'))
    quote_number.append(item.get('QuoteNumber'))
    invoice_date.append(item.get('InvoiceDate'))
    guid.append(item.get('Guid'))
    last_modified.append(item.get('LastModifiedOn'))
    lines = item.get('InvoiceLines')
    for item_sub_2 in lines:
        invoice_line_number.append('LineNumber')
        invoice_line_type.append('LineType')

df_order_number = pd.DataFrame(order_number)
df_quote_number = pd.DataFrame(quote_number)
df_invoice_date = pd.DataFrame(invoice_date)
df_invoice_line_number  = pd.DataFrame(invoice_line_number)
df_invoice_line_type = pd.DataFrame(invoice_line_type)
df_guid = pd.DataFrame(guid)
df_last_modified = pd.DataFrame(last_modified)

df_row = pd.concat([
df_order_number,
df_quote_number,
df_invoice_date,
df_invoice_line_number,
df_invoice_line_type,
df_guid,
df_last_modified
], axis = 1)

What am I doing wrong?

CodePudding user response:

You don't need to iterate, just create the dataframe out of the list of dictionaries you have, then explode InvoiceLines columns then apply pd.Series and join it with the original dataframe:

data = [{
    'OrderNumber': 'SO-00000742',
    'QuoteNumber': None,
    'InvoiceDate': '/Date(1658496322067)/',
    'InvoiceLines': [{'LineNumber': 1,
                      'LineType': None},
                     {'LineNumber': 2,
                      'LineType': None}],
    'Guid': '8f6b89da-1e6e-42288a24-902a-038041e04f06',
    'LastModifiedOn': '/Date(1658496322221)/'}]


df=pd.DataFrame(data).explode('InvoiceLines')
out=pd.concat([df['InvoiceLines'].apply(pd.Series),
           df.drop(columns=['InvoiceLines'])],
          axis=1)

OUTPUT:

#out

   LineNumber  LineType  OrderNumber QuoteNumber            InvoiceDate  \
0         1.0       NaN  SO-00000742        None  /Date(1658496322067)/   
0         2.0       NaN  SO-00000742        None  /Date(1658496322067)/   
                                       Guid         LastModifiedOn  
0  8f6b89da-1e6e-42288a24-902a-038041e04f06  /Date(1658496322221)/  
0  8f6b89da-1e6e-42288a24-902a-038041e04f06  /Date(1658496322221)/  

I'm leaving the date conversion and column renames for you cause I believe you can do that yourself.

  • Related