https://m-selig.ae.illinois.edu/ads/coord/ag25.dat
I'm trying to scrape data from the UIUC airfoil database website, but each file is formatted differently from the others. I tried using pandas read_table with skiprows to skip the non-data part of each file, but every URL has a different number of rows to skip.
How can I read only the numbers from each URL?
CodePudding user response:
Use pd.read_fwf(), which reads a table of fixed-width formatted lines into a DataFrame.
To handle files that need a different number of rows skipped, fetch the file's text first and count the lines until you reach one that contains only numeric values. Then pass that count to the skiprows parameter.
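import pandas as pd
from io import StringIO
import requests

url = 'https://m-selig.ae.illinois.edu/ads/coord/ag25.dat'
response = requests.get(url).text

# Count the header lines that come before the first all-numeric line
skip = 0
for idx, line in enumerate(response.split('\n'), start=1):
    tokens = line.split()
    # A data line is non-empty and every token is a (possibly negative) decimal
    if tokens and all(t.lstrip('-').replace('.', '', 1).isdecimal() for t in tokens):
        break
    skip = idx

df = pd.read_fwf(StringIO(response), skiprows=skip, header=None)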
Output:
print(df)
            0         1
0    1.000000  0.000283
1    0.994054  0.001020
2    0.982050  0.002599
3    0.968503  0.004411
4    0.954662  0.006281
..        ...       ...
155  0.954562  0.001387
156  0.968423  0.000836
157  0.982034  0.000226
158  0.994050 -0.000374
159  1.000000 -0.000680
[160 rows x 2 columns]
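Since the question is about many files with different header lengths, it may help to wrap the header count in a small reusable function. The sketch below is one possible variant, not part of the original answer: it lets float() decide whether a token is numeric, so signs and scientific notation are accepted. The names is_numeric_row and count_header_rows and the sample text are made up for illustration.

```python
def is_numeric_row(line):
    """Return True if the line is non-empty and every token parses as a float."""
    tokens = line.split()
    if not tokens:
        return False  # blank lines are not data rows
    try:
        for token in tokens:
            float(token)
    except ValueError:
        return False
    return True


def count_header_rows(text):
    """Count the lines that come before the first all-numeric row."""
    for idx, line in enumerate(text.splitlines()):
        if is_numeric_row(line):
            return idx
    raise ValueError("no numeric rows found")


# Hypothetical sample: a title line, a blank line, then coordinate rows
sample = "AG25 Bubble Dancer\n\n  1.000000  0.000283\n  0.994050 -0.000374\n"
skip = count_header_rows(sample)
print(skip)  # prints 2
```

The returned count should be usable as the skiprows argument, e.g. pd.read_fwf(StringIO(text), skiprows=count_header_rows(text), header=None) for each downloaded file. One caveat: a header line that happens to contain only numbers is still treated as data by this check.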