Delimit a text file in Python

Time:05-17

I'm trying to import a text file and delimit the rows that can be delimited, without losing the lines that can't be. Here's an example:

Some text for the title
some more text for a description
some more descriptions.

City,State,Capital
Philadelphia,Pennsylvania,No
Sacramento,California,Yes
New York,New York,No
Austin,Texas,Yes
Miami,Florida,No

The portion with commas would be delimited.

I've tried a few things.

This raises a tokenizing error:

pd.read_csv('file.txt', sep=',')

This works, but the text files don't all start on the same line, and I'd like to keep the information in the skipped lines:

pd.read_csv('file.txt', skiprows=x)

Is there some parameter I could pass to get this working?

Some text for the title
some more text for a description
some more descriptions
City State Capital
Philadelphia Pennsylvania No

CodePudding user response:

You could split the text file and read each part separately, and then use pd.read_csv on the table part. To handle it as one file, as far as I know, you have to read it yourself with readline()/readlines() and a few conditions.

Try it with this:

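import pandas as pd

with open('your_textfile.txt', 'r') as f:
    # Collect the free-text lines that precede the first blank line.
    some_information = []
    row = f.readline()
    while row.strip():  # stops at a blank line or at end of file
        some_information.append(row.strip())
        row = f.readline()

    # The remaining non-empty lines are the comma-separated table.
    data = [x.strip().split(',') for x in f.readlines() if x.strip()]

# The first table row holds the column names.
df = pd.DataFrame(data[1:], columns=data[0])

print(some_information, data, df, sep='\n\n')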

['Some text for the title', 'some more text for a description', 'some more descriptions.']

[['City', 'State', 'Capital'],
 ['Philadelphia', 'Pennsylvania', 'No'], 
 ['Sacramento', 'California', 'Yes'], 
 ['New York', 'New York', 'No'], 
 ['Austin', 'Texas', 'Yes'], 
 ['Miami', 'Florida', 'No']]

           City         State Capital
0  Philadelphia  Pennsylvania      No
1    Sacramento    California     Yes
2      New York      New York      No
3        Austin         Texas     Yes
4         Miami       Florida      No
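
One caveat with splitting on ',' by hand: a quoted field that itself contains a comma would be cut in two. A minimal sketch of the same idea using the standard csv module, which understands quoting, might look like the following. The helper name split_preamble_and_table is hypothetical, the sample text (including the quoted "Washington, D.C." row) is made up for illustration, and the layout is assumed to be free text, then a blank line, then the table.

```python
import csv

import pandas as pd


def split_preamble_and_table(text):
    """Hypothetical helper: return (preamble_lines, DataFrame) from file contents."""
    lines = text.splitlines()
    # Assumption: the preamble is everything before the first blank line.
    blank = next((i for i, line in enumerate(lines) if not line.strip()), len(lines))
    preamble = [line.strip() for line in lines[:blank]]
    # csv.reader keeps quoted fields such as "Washington, D.C." in one piece.
    table_lines = [line for line in lines[blank:] if line.strip()]
    rows = list(csv.reader(table_lines))
    if not rows:
        return preamble, pd.DataFrame()
    return preamble, pd.DataFrame(rows[1:], columns=rows[0])


# Made-up sample in the same shape as the file from the question.
sample = (
    "Some text for the title\n"
    "some more text for a description\n"
    "\n"
    "City,State,Capital\n"
    '"Washington, D.C.",District of Columbia,Yes\n'
    "Austin,Texas,Yes\n"
)
info, table = split_preamble_and_table(sample)
```

Here info holds the two descriptive lines and table has the three columns City, State and Capital, with the quoted city kept as a single value.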

CodePudding user response:

I used this, and it works correctly based on the text file you provided:

df = pd.read_csv(filepath, skiprows=4)
df
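
Since the question mentions that the table does not always start on the same line, a fixed skiprows=4 will not fit every file. One possible sketch is to find the first line that contains the separator and pass that index to skiprows, keeping the skipped lines as well. The helper name read_table_after_preamble and the "first line containing a comma is the header" heuristic are assumptions, and the heuristic breaks if the descriptive text itself contains commas.

```python
import os
import tempfile

import pandas as pd


def read_table_after_preamble(path, sep=","):
    """Hypothetical helper: return (preamble_lines, DataFrame) for a file at path."""
    with open(path, "r") as f:
        lines = f.read().splitlines()
    # Assumption: the first line containing the separator is the header row.
    start = next((i for i, line in enumerate(lines) if sep in line), None)
    if start is None:
        raise ValueError("no line containing the separator was found")
    # Keep the non-empty lines that come before the table.
    preamble = [line.strip() for line in lines[:start] if line.strip()]
    # An integer skiprows skips that many lines from the top of the file.
    df = pd.read_csv(path, sep=sep, skiprows=start)
    return preamble, df


# Small demonstration with a temporary file shaped like the one in the question.
content = (
    "Some text for the title\n"
    "some more text for a description\n"
    "\n"
    "City,State,Capital\n"
    "Philadelphia,Pennsylvania,No\n"
    "New York,New York,No\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(content)
    tmp_path = tmp.name
try:
    preamble, df = read_table_after_preamble(tmp_path)
finally:
    os.remove(tmp_path)
```

This reads the file twice, once to locate the header and once through pd.read_csv, which keeps the parsing itself in pandas at the cost of a second pass.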