Convert text file to CSV-CodePudding

I have a text file and I want to convert it to a CSV.

Input:

Tabular Editor 2.15.3 (build 2.15.7767.23827)
--------------------------------
Loading model...
Running Best Practice Analyzer...
=================================
Measure [Internet Current Quarter Gross Profit Performance] violates rule "[DAX Expressions] Use the DIVIDE function for division"
Measure [Total Current Quarter Sales Performance] violates rule "[DAX Expressions] Avoid using the IFERROR function"
Relationship 'Employee'[SalesTerritoryKey] --> 'Sales Territory'[SalesTerritoryKey] violates rule "[DAX Expressions] Inactive relationships that are never activated"
Relationship 'Reseller Sales'[ShipDateKey] --> 'Date'[DateKey] violates rule "[DAX Expressions] Inactive relationships that are never activated"
=================================

Output:

Object Type	Object	Rule Violated
Measure	[Internet Current Quarter Gross Profit Performance]	"[DAX Expressions] Use the DIVIDE function for division"
Measure	[Total Current Quarter Sales Performance]	"[DAX Expressions] Avoid using the IFERROR function"
Relationship	'Employee'[SalesTerritoryKey] --> 'Sales Territory'[SalesTerritoryKey]	Inactive relationships that are never activated"
Relationship	'Reseller Sales'[ShipDateKey] --> 'Date'[DateKey]	Inactive relationships that are never activated"

Any help to achieve the above would be much appreciated.

I have a list of object types which can appear in first column

object_types = ["Calculated Column", "Calculated Table Column", "Calculated Table",
                "Calculated Group", "Calculated Item", "Culture", "Data Column",
                "Hierarchies", "KPIs", "Level", "Measure", "Model", "Model Roles",
                "Named Expressions", "Partitions", "Perspectives",
                "Provider Data Sources", "Relationship", "Structured Data Sources",
                "Table Permissions", "Tables", "Variations"]

We could also replace " violates rule " with a "," to separate column 2 and 3.

Getting stuck on how to separate column 1 and 2. How to find first and last line which begins with an =. This will help us determine the first row of data. Basically the data is in between the == sign lines.

CodePudding user response：

You may use the following approach in order to convert a text file to a CSV file using Python:

import pandas as pd

read_file = pd.read_csv (r'Path where the Text file is stored\File name.txt')
read_file.to_csv (r'Path where the CSV will be saved\File name.csv', index=None)

Step 1:

pip install pandas

Step 2: Capture the path where your text file is stored

eg: C:\Users\Ron\Desktop\Test\Product_List.txt

Step 3: Specify the path where the new CSV file will be saved

C:\Users\Ron\Desktop\Test\New_Products.csv

Step 4: Convert the text file to CSV using Python

import pandas as pd

read_file = pd.read_csv (r'Path where the Text file is stored\File name.txt')
read_file.to_csv (r'Path where the CSV will be saved\File name.csv', index=None)

For our example:

The path where the text file is stored is: C:\Users\Ron\Desktop\Test\Product_List.txt Where the file name is Product_List and the file extension is txt The path where the CSV will be saved is: C:\Users\Ron\Desktop\Test\New_Products.csv Where the new file name to be created is New_Products and the file extension is csv

import pandas as pd

read_file = pd.read_csv (r'C:\Users\Ron\Desktop\Test\Product_List.txt')
read_file.to_csv (r'C:\Users\Ron\Desktop\Test\New_Products.csv', index=None)

CodePudding user response：

You could use a regex to match the various parts of the error messages, using re.findall to get a list of tuples containing the matching parts:

text = '''
Tabular Editor 2.15.3 (build 2.15.7767.23827)
--------------------------------
Loading model...
Running Best Practice Analyzer...
=================================
Measure [Internet Current Quarter Gross Profit Performance] violates rule "[DAX Expressions] Use the DIVIDE function for division"
Measure [Total Current Quarter Sales Performance] violates rule "[DAX Expressions] Avoid using the IFERROR function"
Relationship 'Employee'[SalesTerritoryKey] --> 'Sales Territory'[SalesTerritoryKey] violates rule "[DAX Expressions] Inactive relationships that are never activated"
Relationship 'Reseller Sales'[ShipDateKey] --> 'Date'[DateKey] violates rule "[DAX Expressions] Inactive relationships that are never activated"
=================================
'''
object_types = ["Calculated Column", "Calculated Table Column", "Calculated Table", "Calculated Group", "Calculated Item", "Culture", "Data Column", "Hierarchies", "KPIs", "Level", "Measure", "Model", "Model Roles", "Named Expressions", "Partitions", "Perspectives", "Provider Data Sources", "Relationship", "Structured Data Sources", "Table Permissions", "Tables", "Variations"]
matches = re.findall(r'('   '|'.join(object_types)   ')\s (.*?)\s violates rule\s (.*)\n', text)

Output:

[
 ('Measure', '[Internet Current Quarter Gross Profit Performance]', '"[DAX Expressions] Use the DIVIDE function for division"'),
 ('Measure', '[Total Current Quarter Sales Performance]', '"[DAX Expressions] Avoid using the IFERROR function"'),
 ('Relationship', "'Employee'[SalesTerritoryKey] --> 'Sales Territory'[SalesTerritoryKey]", '"[DAX Expressions] Inactive relationships that are never activated"'),
 ('Relationship', "'Reseller Sales'[ShipDateKey] --> 'Date'[DateKey]", '"[DAX Expressions] Inactive relationships that are never activated"')
]

From there it's easy to write that out to a CSV file using for example the technique in this question.

Note that if you are reading the text file line by line, you should replace the \n at the end of the regex with $. For example:

matches = []
with open(filename) as file:
    for line in file:
        matches.append(re.findall(r'('   '|'.join(object_types)   ')\s (.*?)\s violates rule\s (.*)$', line)