Parsing data from a text file and storing it in a database


First and foremost, thank you for any and all help you can offer.

The problem: we receive data in .txt format and need to be able to parse that data into some form of database/repository.

The idea is that every day, between _____ and ____ hours, a .txt file containing data is created, for example "Newdata20220629.txt".

However, this data is extremely hard to read and almost impossible to search in its raw form. The .txt file is raw, but its first line contains the column names for each row of data, such as "Name, Date, File number", etc.

The following rows are raw data in the order of those columns. For instance: John Smith, 6/29/2022, 1234123

Any columns without data in the field still have a comma but contain no value, such as:

John Smith,, or ,6/29/2022,
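
So a file might look something like this (values repeated from above purely for illustration):

Name,Date,File number
John Smith,6/29/2022,1234123
John Smith,,
,6/29/2022,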

So essentially, what I'd like to do is create a tool that runs continuously, looks for a file in the format "Newdata(date).txt", parses that text as described above, and stores it in a user-friendly, searchable database. Personally, I am thinking a SQL database may be the easiest way to do this, but I don't have a clue where to start.
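
To make the "runs continuously" part concrete, here is the rough kind of loop I have in mind (the folder path, filename pattern, and polling interval are just placeholders):

import time
from datetime import date
from pathlib import Path

WATCH_DIR = Path(r'C:\data\incoming')  # assumed folder where the daily files land

def wait_for_todays_file(poll_seconds=60):
    # Build today's expected filename, e.g. Newdata20220629.txt
    target = WATCH_DIR / f'Newdata{date.today():%Y%m%d}.txt'
    # Poll until the file shows up, then hand its path to the parsing/loading step
    while not target.exists():
        time.sleep(poll_seconds)
    return target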

CodePudding user response:

This should take a .txt file, write it to a .csv, and then write the .csv to a SQL database/table; just enter your server information and the file paths.

Import these to use

import pandas as pd
import pyodbc

Read txt file and write to csv file

read_txt = pd.read_csv('Newdata20220629.txt', delimiter=',')  # the first line of the .txt becomes the header row
read_txt.to_csv('Newdata20220629.csv', index=False)           # write it back out as a .csv without the index column

Import CSV

# read_csv already returns a DataFrame; use a raw string so the backslashes in the path aren't treated as escapes
df = pd.read_csv(r'C:\Users\ExampleUser\Desktop\Test\Newdata20220629.csv')

Connect to SQL Server

connection = pyodbc.connect('Driver={SQL Server};'
                            r'Server=RON\SQLEXPRESS;'
                            'Database=test_database;'
                            'Trusted_Connection=yes;')
cursor = connection.cursor()

Create Table

cursor.execute('''
        CREATE TABLE Table_Name (
            Name nvarchar(50),
            Date nvarchar(50),
            Product_ID Int
            )
               ''')

Insert DataFrame to Table

# The attribute names (Name, Date, Product_ID) must match the column headers in the CSV
for row in df.itertuples():
    cursor.execute('''
                INSERT INTO Table_Name (Name, Date, Product_ID)
                VALUES (?, ?, ?)
                ''',
                row.Name,
                row.Date,
                row.Product_ID
                )
connection.commit()
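
As an alternative (not part of the snippet above), pandas can also push the DataFrame straight into SQL Server with DataFrame.to_sql through SQLAlchemy, which creates the table and does the inserts for you; the connection string below is a sketch reusing the same assumed server and database names:

from urllib.parse import quote_plus
from sqlalchemy import create_engine

# Wrap the same ODBC connection string for SQLAlchemy's mssql+pyodbc dialect
params = quote_plus('Driver={SQL Server};'
                    r'Server=RON\SQLEXPRESS;'
                    'Database=test_database;'
                    'Trusted_Connection=yes;')
engine = create_engine('mssql+pyodbc:///?odbc_connect=' + params)

# Appends to Table_Name, creating the table if it does not exist
df.to_sql('Table_Name', engine, if_exists='append', index=False)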

CodePudding user response:

I suggest using the SqlBulkCopy class (as described here: https://docs.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlbulkcopy?view=dotnet-plat-ext-6.0) in conjunction with the CsvDataReader class (https://joshclose.github.io/CsvHelper/examples/csvdatareader/), as demonstrated below. You will need to research file system watchers (as Hursey said) to get your app notified when a new file is written to the folder you are monitoring.

Protected Sub UploadCSV(filePath As String)
    ' Create CsvDataReader (IDataReader) to use with SqlBulkCopy
    Using csvData = New CsvDataReader(New FileStream(filePath, FileMode.Open))
        ' Reads first record as a header row.
        ' Name columns based on the values in the header row
        csvData.Settings.HasHeaders = True
        ' Set data types for parsing data
        csvData.Columns.Add("varchar") ' Column 1
        csvData.Columns.Add("varchar") ' Column 2
        csvData.Columns.Add("datetime") ' Column 3
        csvData.Columns.Add("decimal(18,2)") ' Column 4
        ' Create SqlBulkCopy object to import from the CsvDataReader
        Using bulkCopy = New SqlBulkCopy("Data Source=.;Initial Catalog=YourDatabase;User ID=YourUsername;Password=YourPassword")
            ' Table to write to (must already exist).
            bulkCopy.DestinationTableName = "YourSQLTable"
            ' Map CSV column names to SQL columns names
            bulkCopy.ColumnMappings.Add("CSV_Column_Name_1", "SQL_Column_1") 
            bulkCopy.ColumnMappings.Add("CSV_Column_Name_2", "SQL_Column_2")
            bulkCopy.ColumnMappings.Add("CSV_Column_Name_3", "SQL_Column_3")
            bulkCopy.ColumnMappings.Add("CSV_Column_Name_4", "SQL_Column_4")
            ' Do the import
            bulkCopy.WriteToServer(csvData)
        End Using ' dispose SqlBulkCopy
    End Using ' dispose CsvDataReader
End Sub 