Handling heterogeneous data

Time:01-15

I am developing a project that interacts with multiple vendors to retrieve their invoices. I have to parse the invoices to extract the useful information, and I also have to store each invoice in full. An invoice can be in JSON, Excel, or CSV format depending on the vendor. Invoices from different vendors also have different column names: some vendors label the address column ADD, while others label it Address. The same is true for the other columns.

I have to build this application on GCP. The programming language can be Python or Java.

Can I use some frameworks that are already in place to implement this project?

CodePudding user response:

Yes, you can use existing frameworks to implement this project. Depending on the programming language you choose, there are several options available.

For instance, if you're developing your project in Python, you can use pandas to read CSV and JSON invoices and openpyxl to read Excel workbooks; PyPDF2 covers PDF, should a vendor ever send one. If you're using Java, Apache POI handles Excel, Gson handles JSON, and PDFBox handles PDF.
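The column-name problem from the question (ADD versus Address) can be handled with an alias table that maps every vendor spelling to one canonical field name. Below is a minimal sketch, not a definitive implementation. It uses only the standard-library csv and json modules so it runs without extra dependencies; pandas.read_csv, pandas.read_json, or pandas.read_excel could take the place of the parsing step. The alias table, the field names, and the functions normalize_record and parse_invoice are hypothetical and would need to match your real vendors.

```python
import csv
import io
import json

# Hypothetical alias table: canonical field name -> vendor spellings (lowercase).
COLUMN_ALIASES = {
    "invoice_id": {"invoice_id", "invoice no", "inv_no"},
    "address": {"add", "addr", "address"},
    "amount": {"amount", "amt", "total"},
}

# Reverse lookup: vendor spelling -> canonical field name.
ALIAS_TO_CANONICAL = {
    alias: canonical
    for canonical, aliases in COLUMN_ALIASES.items()
    for alias in aliases
}


def normalize_record(record):
    """Rename known vendor columns to canonical names; drop unknown columns."""
    normalized = {}
    for key, value in record.items():
        canonical = ALIAS_TO_CANONICAL.get(key.strip().lower())
        if canonical is not None:
            normalized[canonical] = value
    return normalized


def parse_invoice(raw_text, fmt):
    """Parse CSV or JSON invoice text into a list of normalized records."""
    if fmt == "csv":
        rows = list(csv.DictReader(io.StringIO(raw_text)))
    elif fmt == "json":
        data = json.loads(raw_text)
        # Accept either a single invoice object or a list of them.
        rows = data if isinstance(data, list) else [data]
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return [normalize_record(row) for row in rows]


# Two vendors, two formats, two spellings of the same columns.
csv_invoice = "Inv_No,ADD,Total\nA-1,12 Main St,100.50\n"
json_invoice = '{"invoice_id": "B-7", "Address": "9 Oak Ave", "amount": 42}'

print(parse_invoice(csv_invoice, "csv"))
print(parse_invoice(json_invoice, "json"))
```

Both calls produce records keyed by invoice_id, address, and amount. CSV values arrive as strings, so type conversion would be a separate step. Columns not listed in the alias table are dropped from the normalized record, which is one reason to keep the original invoice stored alongside it.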

In addition to these libraries, you can use GCP services: Cloud Storage to keep the original invoice files, and Cloud SQL or BigQuery to store and query the extracted invoice data.

CodePudding user response:

Yes, you can use existing frameworks to implement your project. For parsing invoices, pandas in Python can read JSON, Excel, and CSV files. In Java, Apache POI covers Excel, and you would pair it with a JSON library such as Gson and a CSV parser. To store the invoices, you can use a database such as MySQL (available on GCP through Cloud SQL) or MongoDB hosted on GCP. From Python, a library like PyMySQL or PyMongo connects to the database; from Java, the equivalents are a JDBC driver for MySQL or the MongoDB Java driver.
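Since the question asks for both the extracted information and the entire invoice, one option is a single table that holds the raw payload next to the parsed fields. The sketch below is only an illustration under stated assumptions: it uses the standard-library sqlite3 module as a local stand-in for MySQL so that it runs anywhere, and the table layout and the functions open_store, save_invoice, and load_invoice are hypothetical. With PyMySQL the same pattern applies, but the connection call differs and placeholders are written as %s rather than ?.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the invoices table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS invoices (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            vendor TEXT NOT NULL,
            source_format TEXT NOT NULL,
            raw_payload BLOB NOT NULL,      -- the entire invoice, untouched
            extracted_fields TEXT NOT NULL  -- parsed fields as a JSON string
        )
        """
    )
    return conn


def save_invoice(conn, vendor, source_format, raw_payload, fields):
    """Store the raw invoice bytes together with the fields parsed from it."""
    cursor = conn.execute(
        "INSERT INTO invoices (vendor, source_format, raw_payload, extracted_fields) "
        "VALUES (?, ?, ?, ?)",
        (vendor, source_format, raw_payload, json.dumps(fields)),
    )
    conn.commit()
    return cursor.lastrowid


def load_invoice(conn, invoice_id):
    """Return the stored invoice as a dict, or None if the id is unknown."""
    row = conn.execute(
        "SELECT vendor, source_format, raw_payload, extracted_fields "
        "FROM invoices WHERE id = ?",
        (invoice_id,),
    ).fetchone()
    if row is None:
        return None
    vendor, source_format, raw_payload, extracted = row
    return {
        "vendor": vendor,
        "source_format": source_format,
        "raw_payload": raw_payload,
        "fields": json.loads(extracted),
    }


conn = open_store()
invoice_id = save_invoice(
    conn,
    vendor="acme",  # hypothetical vendor name
    source_format="csv",
    raw_payload=b"ADD,Total\n12 Main St,100\n",
    fields={"address": "12 Main St", "amount": "100"},
)
print(load_invoice(conn, invoice_id))
```

Keeping the raw payload means an invoice can be re-parsed later if the column mapping changes. For large files, a common alternative is to store the file itself in object storage and keep only its location in the table.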
