Convert a text file with a particular format into a dataframe

I am new to pandas, so I wanted to know whether I can convert a text file with a particular format into a pandas DataFrame. Here is the format of my text file:

"FACT"|"FSYM"|"POSITION"|"INDIRECT_OPTIONS"|"REPORT"|"SOURCE"|"COMMENTS"|
"ABCX"|"VVG1"|2800000|760000|2022-11-03|"A"|"INCLUDES CAR"|0

I want to load this into pandas with the same columns and values, split on the | sign. That is, my DataFrame columns should be FACT, FSYM, POSITION, and so on.

I am trying the code below, but it does not give me the desired output:

import pandas as pd

def convert_factset_file_to_dataframe(test_case_name, file_name):
    # parentDir is defined elsewhere in my script
    data = pd.read_csv("{}/output/Float_Ingestion_files/{}/{}.txt".format(str(parentDir), test_case_name, file_name), sep=',')

    print(data)

It prints the following. Each whole line ends up in a single column, and the only change is the added index:

    "FACT"|"FSYM"|"POSITION"|"INDIRECT_OPTIONS"|"REPORT"|"SOURCE"|"COMMENTS"|
0    "ABCX"|"VVG1"|2800000|760000|2022-11-03|"A"|"INCLUDES CAR"|0

Is there another way to convert this text file to a DataFrame besides reading it as a CSV, or do I need to change something in my code?

Answer:

You can use the sep argument, as stated in Thomas' comment, and set it to the | character:

data = pd.read_csv(filepath, sep="|")

For more information, see the pandas.read_csv documentation.
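A minimal sketch of that call on the two sample lines from the question. It assumes the text is held in an io.StringIO buffer instead of a file on disk; read_csv accepts either.

```python
import io

import pandas as pd

# The two sample lines from the question, kept in memory instead of a file.
sample = (
    '"FACT"|"FSYM"|"POSITION"|"INDIRECT_OPTIONS"|"REPORT"|"SOURCE"|"COMMENTS"|\n'
    '"ABCX"|"VVG1"|2800000|760000|2022-11-03|"A"|"INCLUDES CAR"|0\n'
)

# sep="|" splits each line on the pipe; the default quotechar='"' strips the quotes.
df = pd.read_csv(io.StringIO(sample), sep="|")

print(df.columns.tolist())
print(df.iloc[0].tolist())
```

Because the header line ends with |, pandas sees an eighth column with an empty name. It labels that column Unnamed: 7, and the trailing 0 from the data row lands there.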

Answer:

To write a DataFrame out to a tab-separated file, pass sep to to_csv:

df.to_csv(file_name, sep='\t')

To use a specific encoding (e.g. 'utf-8'), pass the encoding argument:

df.to_csv(file_name, sep='\t', encoding='utf-8')
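A file written this way has to be read back with the same separator. The round trip below is a minimal sketch. The small DataFrame and the temporary file name out.tsv are hypothetical, chosen only for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical DataFrame built from part of the question's sample row.
df = pd.DataFrame({"FACT": ["ABCX"], "FSYM": ["VVG1"], "POSITION": [2800000]})

with tempfile.TemporaryDirectory() as tmp:
    file_name = os.path.join(tmp, "out.tsv")  # hypothetical file name
    # index=False keeps the RangeIndex out of the file, so it is not read back as a column.
    df.to_csv(file_name, sep="\t", encoding="utf-8", index=False)
    # The reader must use the same separator (and encoding) as the writer.
    df_back = pd.read_csv(file_name, sep="\t", encoding="utf-8")

print(df_back)
```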

Answer:

I think you have a typo; you should call:

data = pd.read_csv(
    "{}/output/Float_Ingestion_files/{}/{}.txt".format(
        str(parentDir), test_case_name, file_name
    ),
    sep="|",  # <<<<<<<<< don't choose the comma here, choose `|`
)

That is, just change the sep argument from the comma to the | sign.
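The same fix can also be written as a function that returns the DataFrame instead of printing it. This is a sketch under some assumptions. The parent_dir parameter is hypothetical and stands in for the global parentDir from the question. os.path.join replaces the format string. parse_dates=["REPORT"] is an optional addition that turns the REPORT column into datetimes.

```python
import os
import tempfile

import pandas as pd


def convert_factset_file_to_dataframe(parent_dir, test_case_name, file_name):
    """Read a pipe-delimited .txt file into a DataFrame.

    parent_dir is a hypothetical parameter replacing the global parentDir.
    """
    path = os.path.join(
        str(parent_dir), "output", "Float_Ingestion_files", test_case_name, f"{file_name}.txt"
    )
    # sep="|" is the actual fix; parse_dates (optional) parses REPORT as dates.
    return pd.read_csv(path, sep="|", parse_dates=["REPORT"])


# Usage: write the question's sample into the expected folder layout, then read it.
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "output", "Float_Ingestion_files", "case1")
    os.makedirs(folder)
    with open(os.path.join(folder, "sample.txt"), "w", encoding="utf-8") as fh:
        fh.write('"FACT"|"FSYM"|"POSITION"|"INDIRECT_OPTIONS"|"REPORT"|"SOURCE"|"COMMENTS"|\n')
        fh.write('"ABCX"|"VVG1"|2800000|760000|2022-11-03|"A"|"INCLUDES CAR"|0\n')
    df = convert_factset_file_to_dataframe(tmp, "case1", "sample")

print(df.dtypes)
```

Returning the DataFrame lets the caller inspect or test it. The folder name case1 and file name sample.txt exist only for this illustration.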