How to create a Spark or pandas DataFrame from string output in Apache Spark on Databricks

Time:05-30

I have assigned a string to the variable "myoutput" as follows:

myoutput = result.content

myoutput is as follows:

Out[10]: 'Company A Invoice\nInvoice For:\nAddress:\n567 Main St.\nRedmond, WA\n555-555-5555\nBilbo Baggins\n123 Hobbit Lane\nRedmond, WA\n555-555-5555\nSubtotal: 300.00\nTax: 30.00\nTip: 100.00\nTotal: 430.00\nSignature: ____Bilbo Baggins__________\nItem\nQuantity\nPrice\nA\n1\n10.99\nB\n2\n14.67\nC\n4\n15.66\nD\n1\n12.00\nE\n4\n10.00\nF\n6\n12.00\nG\n8\n22.00'

I would like to create a Spark DataFrame or a pandas DataFrame from "myoutput".

Any ideas?

CodePudding user response:

 import pandas as pd

 # The string from the question (there it is stored in `myoutput`).
 str_output = 'Company A Invoice\nInvoice For:\nAddress:\n567 Main St.\nRedmond, WA\n555-555-5555\nBilbo Baggins\n123 Hobbit Lane\nRedmond, WA\n555-555-5555\nSubtotal: 300.00\nTax: 30.00\nTip: 100.00\nTotal: 430.00\nSignature: ____Bilbo Baggins__________\nItem\nQuantity\nPrice\nA\n1\n10.99\nB\n2\n14.67\nC\n4\n15.66\nD\n1\n12.00\nE\n4\n10.00\nF\n6\n12.00\nG\n8\n22.00'
 # splitlines() breaks the string at each newline, so every line becomes one row.
 df_data = pd.DataFrame({'ColumnA': str_output.splitlines()})
 df_data
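
The answer above puts every line into a single column. If the goal is a table of the invoice items instead, one possible approach is to regroup the lines that follow the Item/Quantity/Price header into rows of three. The sketch below is not part of the original answer: the helper name `parse_line_items` is hypothetical, the input is a shortened copy of the string from the question, and it assumes the three header names appear on consecutive lines and that everything after them is a table cell.

```python
# Shortened copy of the string from the question; only the tail matters here.
myoutput = (
    "Total: 430.00\nSignature: ____Bilbo Baggins__________\n"
    "Item\nQuantity\nPrice\nA\n1\n10.99\nB\n2\n14.67\nC\n4\n15.66"
)

HEADER = ["Item", "Quantity", "Price"]


def parse_line_items(text):
    """Return one dict per invoice line (hypothetical helper).

    Assumes the header names sit on three consecutive lines and every
    line after them is a cell of the table, read left to right.
    """
    lines = text.splitlines()
    # Locate the first position where the three header lines occur in order.
    # next() raises StopIteration if the header is missing.
    start = next(
        i for i in range(len(lines) - 2) if lines[i:i + 3] == HEADER
    )
    cells = lines[start + 3:]
    rows = []
    # Walk the cells three at a time; an incomplete trailing group is skipped.
    for i in range(0, len(cells) - 2, 3):
        item, quantity, price = cells[i:i + 3]
        rows.append(
            {"Item": item, "Quantity": int(quantity), "Price": float(price)}
        )
    return rows


rows = parse_line_items(myoutput)
```

Passing the result to `pd.DataFrame(rows)` should then give one column per field with numeric types. In a Databricks notebook, where a `spark` session is normally predefined, `spark.createDataFrame(pd.DataFrame(rows))` converts that pandas DataFrame into a Spark DataFrame.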

Reference: How to split a Python string on new line characters
