Remove the last two columns from a table in a txt file that's poorly formatted?-CodePudding

To process data from Synchro into a readable format in excel, there are extra columns that should be dropped.

Raw Data from a txt file:

 Lane Group                   WBT   WBR NBL NBT NBR SBL SBT SBR Ø3  Ø7

 Lane Configurations          <1>    0   1   2>  0   1   2   1

Reading this txt file as a csv puts every line into a single string

My goal is to: 1.) read as a csv with correctly delimited rows/columns (using \t as a separator)

2.) Drop any columns and data after the 'SBR' column

Code I'm using:

AMtxt = pd.read_csv('AM.txt', sep='\t ', header = None, error_bad_lines=False, warn_bad_lines = False, quoting=3, skiprows=1, engine='python')

AMtxt.drop(columns = AMtxt.columns[-2:], axis = 1, inplace = True)

AMtxt.to_csv('AM.csv')

When I try to use this, it removes the "Lane Group" row for some of the entries in the read_csv stage.

What CSV should look like:

The CSV looks like for problematic entries (ones where there are data in the columns I'm removing

EDIT: SOLVED

AMtxt = pd.DataFrame(open('AM.txt','r').readlines())
AMtxt = AMtxt[0].str.split('\t', expand = True)

for column in AMtxt:
    AMtxt[column] = AMtxt[column].str.strip()

AMtxt.to_csv('AM.csv')

This method worked for me.

CodePudding user response：

Solution that worked for me:

AMtxt = pd.DataFrame(open('AM.txt','r').readlines())
AMtxt = AMtxt[0].str.split('\t', expand = True)

for column in AMtxt:
    AMtxt[column] = AMtxt[column].str.strip()

AMtxt.drop(columns = AMtxt.columns[-(len(AMtxt.columns) - 14):], axis = 1, inplace = True)
AMtxt.to_csv('AM.csv')