First and foremost, thank you for any and all help you can offer.
The problem: we receive data in TXT format and need to be able to parse that data out into some form of database/repository.
The idea is that every day between _____ and ____ hours a .txt file is created containing data, for example "Newdata20220629.txt".
However, this data is extremely hard to read and almost impossible to search in its raw form. The txt file is raw, but its first line contains the column names for each row of data, such as "Name, Date, File number," etc.
The following rows are raw data in the order of those columns, for instance: John Smith, 6/29/2022, 1234123
Any column without data in the field still gets a comma but contains no value, such as:
John Smith,, or ,6/29/2022,
So essentially, what I'd like to do is create a tool that runs continuously, looking for a file matching the pattern "Newdata(date).txt", parsing that text as described above, and then storing it in a user-friendly, searchable database. Personally, I am thinking a SQL database may be the easiest way to do this, but I don't have a clue where to start.
CodePudding user response:
This should take a .txt file, write it out to a .csv, and then write the .csv to a SQL Server database/table; just enter your server information and the file paths.
Import these to use
import pandas as pd
import pyodbc
Read txt file and write to csv file
read_txt = pd.read_csv('Newdata20220629.txt', delimiter=',')
read_txt.to_csv('Newdata20220629.csv', index=False)
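(Side note: the .txt file is already comma-separated, so pandas can read it directly and the intermediate .csv round trip is optional:)
df = pd.read_csv('Newdata20220629.txt')  # comma is the default delimiter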
Import CSV (note the raw string, so the backslashes in the Windows path aren't treated as escape sequences; read_csv already returns a DataFrame)
df = pd.read_csv(r'C:\Users\ExampleUser\Desktop\Test\Newdata20220629.csv')
Connect to SQL Server
connection = pyodbc.connect('Driver={SQL Server};'
                            'Server=RON\\SQLEXPRESS;'  # double the backslash (or use a raw string) for the instance name
                            'Database=test_database;'
                            'Trusted_Connection=yes;')
cursor = connection.cursor()
Create Table
cursor.execute('''
    CREATE TABLE Table_Name (
        Name nvarchar(50),
        Date nvarchar(50),
        Product_ID Int
    )
''')
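One caveat: CREATE TABLE raises an error if the table already exists, which matters for a job that runs every day. A guarded variant (using the same placeholder table and column names as above) that only creates the table when it is missing:
cursor.execute('''
    IF OBJECT_ID(N'Table_Name', N'U') IS NULL
    CREATE TABLE Table_Name (
        Name nvarchar(50),
        Date nvarchar(50),
        Product_ID Int
    )
''')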
Insert DataFrame to Table
for row in df.itertuples():
    # pyodbc uses ? as its parameter placeholder
    cursor.execute('''
        INSERT INTO Table_Name (Name, Date, Product_ID)
        VALUES (?, ?, ?)
        ''',
        row.Name,
        row.Date,
        row.Product_ID)
connection.commit()
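The question also asks for a tool that runs continuously and picks up each day's file. A minimal polling sketch built on top of the code above (the filename pattern, the 60-second interval, and the load_file helper are my assumptions, not part of the original answer):
import os
import time
from datetime import date

def load_file(path):
    # Run the read/insert steps shown above for the given file
    ...

seen = set()
while True:
    # Build today's expected filename, e.g. Newdata20220629.txt
    filename = 'Newdata' + date.today().strftime('%Y%m%d') + '.txt'
    if filename not in seen and os.path.exists(filename):
        load_file(filename)
        seen.add(filename)
    time.sleep(60)  # check again in a minute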
CodePudding user response:
I suggest using the SqlBulkCopy class (as described here: https://docs.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlbulkcopy?view=dotnet-plat-ext-6.0) in conjunction with the CsvDataReader class (https://joshclose.github.io/CsvHelper/examples/csvdatareader/), as demonstrated below. You will also need to research file system watchers (as Hursey said) to get your app notified when a new file is written to the folder you are monitoring.
Protected Sub UploadCSV(filePath As String)
    ' Create CsvDataReader (IDataReader) to use with SqlBulkCopy
    Using csvData = New CsvDataReader(New FileStream(filePath, FileMode.Open))
        ' Reads first record as a header row.
        ' Name columns based on the values in the header row
        csvData.Settings.HasHeaders = True
        ' Set data types for parsing data
        csvData.Columns.Add("varchar") ' Column 1
        csvData.Columns.Add("varchar") ' Column 2
        csvData.Columns.Add("datetime") ' Column 3
        csvData.Columns.Add("decimal(18,2)") ' Column 4
        ' Create SqlBulkCopy object to import from the CsvDataReader
        Using bulkCopy = New SqlBulkCopy("Data Source=.;Initial Catalog=YourDatabase;User ID=YourUsername;Password=YourPassword")
            ' Table to write to (must already exist).
            bulkCopy.DestinationTableName = "YourSQLTable"
            ' Map CSV column names to SQL column names
            bulkCopy.ColumnMappings.Add("CSV_Column_Name_1", "SQL_Column_1")
            bulkCopy.ColumnMappings.Add("CSV_Column_Name_2", "SQL_Column_2")
            bulkCopy.ColumnMappings.Add("CSV_Column_Name_3", "SQL_Column_3")
            bulkCopy.ColumnMappings.Add("CSV_Column_Name_4", "SQL_Column_4")
            ' Do the import
            bulkCopy.WriteToServer(csvData)
        End Using ' dispose SqlBulkCopy
    End Using ' dispose CsvDataReader
End Sub
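The file-watcher piece mentioned above would be .NET's FileSystemWatcher class. For anyone following the Python answer instead, the same event-driven idea is available through the third-party watchdog package; here is a minimal sketch (the handler class name and watched folder are my assumptions):
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class NewDataHandler(FileSystemEventHandler):
    # Called by watchdog whenever a file is created in the watched folder
    def on_created(self, event):
        if not event.is_directory and event.src_path.endswith('.txt'):
            print('New file:', event.src_path)  # parse and load it here

observer = Observer()
observer.schedule(NewDataHandler(), r'C:\Users\ExampleUser\Desktop\Test', recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()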