I am trying to parse through a bunch of IGC files with flight data to obtain an excel list of all of the data.
I start with:
with open('2022-07-04-XCT-VSH-16.igc','r') as flight:
for line in flight:
if (line.startswith('B')):
print(line)
Which returns:
B1644044003365N10517994WA019110191672
B1644054003365N10517994WA019110191792
B1644064003365N10517994WA019110191690
B1644074003366N10517993WA019110191717
B1644084003366N10517993WA019110191708
B1644094003365N10517993WA019110191775
B1644104003365N10517993WA019110191725
B1644114003364N10517993WA019110191872
B1644124003364N10517992WA019110191856
....
the data I want to play with
Then I run:
df = pd.DataFrame(columns=['timestamp','latitude','longitude','alt_pressure','alt-GPS'])
def flightparser():
with open('2022-07-04-XCT-VSH-16.igc','r') as flight:
for line in flight:
if (line.startswith('B')):
df2 = df.append({
'timestamp' : [int(line[1:7])],
'latitude' : [str(line[7:15])],
'longitude' : [str(line[15:24])],
'alt-pressure' : [int(line[25:30])],
'alt-GPS' : [int(line[30:35])],},ignore_index=True)
flightparser()
print(df2)
which returns 'None"
They idea is that I can split a chunk of each one of these jumbled up text in order to seperate it into it's own column, with each line of text representing its own row. I've tried changing a lot of the syntax and still can't figure it out. Any ideas?
Full Text File: https://pastebin.com/iEuZtfvN
CodePudding user response:
You are discarding df2
on each iteration. What you probably want is the following:
def flightparser():
df = pd.DataFrame(columns=['timestamp','latitude','longitude','alt_pressure','alt-GPS'])
with open('2022-07-04-XCT-VSH-16.igc','r') as flight:
for line in flight:
if (line.startswith('B')):
df = df.append({
'timestamp' : [int(line[1:7])],
'latitude' : [str(line[7:15])],
'longitude' : [str(line[15:24])],
'alt-pressure' : [int(line[25:30])],
'alt-GPS' : [int(line[30:35])],
}, ignore_index=True)
return df
print(flightparser())
Here you are creating the dataframe within the function scope, updating it on each new row and then returning it with all the data ingested. Doing all actions inside the function also allows you not to worry about the global df
variable being changed by something else.
CodePudding user response:
What you're dealing with is fixed with format, and pandas has a specific way to read this type of file which will be far faster and not use a deprecated (append
) function. You just have to pre-process the file a bit.
import pandas as pd
# from requests import get
from io import StringIO
# data = get('https://pastebin.com/raw/iEuZtfvN').text
# data = [x for x in data.splitlines() if x.startswith('B')]
with open('2022-07-04-XCT-VSH-16.igc','r') as flight:
data = [x for x in flight if x.startswith('B')]
df = pd.read_fwf(StringIO('\n'.join(data)),
colspecs=[(1,7),(7,15),(15,24),(25,30),(30,35)],
names=['timestamp','latitude','longitude','alt_pressure','alt-GPS'])
print(df)
# Output:
timestamp latitude longitude alt_pressure alt-GPS
0 164404 4003365N 10517994W 1911 1916
1 164405 4003365N 10517994W 1911 1917
2 164406 4003365N 10517994W 1911 1916
3 164407 4003366N 10517993W 1911 1917
4 164408 4003366N 10517993W 1911 1917
... ... ... ... ... ...
6919 183924 4012435N 10515001W 1630 1618
6920 183925 4012435N 10515001W 1629 1616
6921 183926 4012435N 10515001W 1629 1616
6922 183927 4012435N 10515001W 1629 1616
6923 183928 4012435N 10515001W 1629 1615
[6924 rows x 5 columns]