How can you use a for loop to convert a text file to DataFrame columns?-CodePudding

I am trying to parse through a bunch of IGC files with flight data to obtain an excel list of all of the data.

I start with:

with open('2022-07-04-XCT-VSH-16.igc','r') as flight:

for line in flight:
    if (line.startswith('B')):
        print(line)

Which returns:

B1644044003365N10517994WA019110191672
B1644054003365N10517994WA019110191792
B1644064003365N10517994WA019110191690
B1644074003366N10517993WA019110191717
B1644084003366N10517993WA019110191708
B1644094003365N10517993WA019110191775
B1644104003365N10517993WA019110191725
B1644114003364N10517993WA019110191872
B1644124003364N10517992WA019110191856
....

the data I want to play with

Then I run:

    df = pd.DataFrame(columns=['timestamp','latitude','longitude','alt_pressure','alt-GPS'])
def flightparser():
    with open('2022-07-04-XCT-VSH-16.igc','r') as flight:
         for line in flight:
            if (line.startswith('B')): 
                    df2 = df.append({
                    'timestamp' : [int(line[1:7])],
                    'latitude'  : [str(line[7:15])],
                    'longitude' : [str(line[15:24])],
                    'alt-pressure'  : [int(line[25:30])],
                    'alt-GPS'   : [int(line[30:35])],},ignore_index=True)
flightparser()
print(df2)

which returns 'None"

They idea is that I can split a chunk of each one of these jumbled up text in order to seperate it into it's own column, with each line of text representing its own row. I've tried changing a lot of the syntax and still can't figure it out. Any ideas?

Full Text File: https://pastebin.com/iEuZtfvN

CodePudding user response：

You are discarding df2 on each iteration. What you probably want is the following:

def flightparser():
    df = pd.DataFrame(columns=['timestamp','latitude','longitude','alt_pressure','alt-GPS'])
    with open('2022-07-04-XCT-VSH-16.igc','r') as flight:
        for line in flight:
            if (line.startswith('B')): 
                df = df.append({
                    'timestamp' : [int(line[1:7])],
                    'latitude'  : [str(line[7:15])],
                    'longitude' : [str(line[15:24])],
                    'alt-pressure'  : [int(line[25:30])],
                    'alt-GPS'   : [int(line[30:35])],
                }, ignore_index=True)
    return df
print(flightparser())

Here you are creating the dataframe within the function scope, updating it on each new row and then returning it with all the data ingested. Doing all actions inside the function also allows you not to worry about the global df variable being changed by something else.

CodePudding user response：

What you're dealing with is fixed with format, and pandas has a specific way to read this type of file which will be far faster and not use a deprecated (append) function. You just have to pre-process the file a bit.

import pandas as pd
# from requests import get
from io import StringIO

# data = get('https://pastebin.com/raw/iEuZtfvN').text
# data = [x for x in data.splitlines() if x.startswith('B')]
with open('2022-07-04-XCT-VSH-16.igc','r') as flight:
    data = [x for x in flight if x.startswith('B')]

df = pd.read_fwf(StringIO('\n'.join(data)),
                 colspecs=[(1,7),(7,15),(15,24),(25,30),(30,35)],
                 names=['timestamp','latitude','longitude','alt_pressure','alt-GPS'])
print(df)

# Output:

      timestamp  latitude  longitude  alt_pressure  alt-GPS
0        164404  4003365N  10517994W          1911     1916
1        164405  4003365N  10517994W          1911     1917
2        164406  4003365N  10517994W          1911     1916
3        164407  4003366N  10517993W          1911     1917
4        164408  4003366N  10517993W          1911     1917
...         ...       ...        ...           ...      ...
6919     183924  4012435N  10515001W          1630     1618
6920     183925  4012435N  10515001W          1629     1616
6921     183926  4012435N  10515001W          1629     1616
6922     183927  4012435N  10515001W          1629     1616
6923     183928  4012435N  10515001W          1629     1615

[6924 rows x 5 columns]