How to scrape data into an Excel file

https://m-selig.ae.illinois.edu/ads/coord/ag25.dat

I'm trying to scrape data from the UIUC airfoil database website, but the files are not all formatted the same way. I tried using pandas read_table with skiprows to skip the non-data part at the top of each file, but every URL needs a different number of rows skipped. How can I read only the numbers from each URL?

Answer:

Use pd.read_fwf(), which reads a table of fixed-width formatted lines into a DataFrame.

To handle files that need a different number of rows skipped, download the file first and count the lines that come before the first line containing only numeric values. Then pass that count to the skiprows parameter.

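```python
import pandas as pd
from io import StringIO
import requests

url = 'https://m-selig.ae.illinois.edu/ads/coord/ag25.dat'
response = requests.get(url)
response.raise_for_status()  # fail loudly on a bad URL instead of parsing an error page
text = response.text

def is_number(token):
    # float() accepts signs and exponents, so '-0.000374' counts as numeric
    try:
        float(token)
        return True
    except ValueError:
        return False

# Count the lines before the first non-blank line made up only of numbers.
# skip starts at 0 so it is defined even when the data begins on line 1.
skip = 0
for idx, line in enumerate(text.splitlines()):
    tokens = line.split()
    if tokens and all(is_number(x) for x in tokens):
        skip = idx
        break

df = pd.read_fwf(StringIO(text), skiprows=skip, header=None)
```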
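If you want to reuse the header detection across many airfoil files, or test it without hitting the network, it can be pulled into a function that works on already-downloaded text. This is a minimal sketch under a few assumptions: parse_airfoil and is_number are hypothetical helper names, each file is assumed to hold exactly two columns (x, y) as in the output shown here, and the sample string uses a made-up header line followed by four coordinate rows copied from that output. Testing tokens with float() means signed values such as -0.000374 are recognized as data, and blank lines are not mistaken for a numeric row.

```python
import pandas as pd
from io import StringIO


def is_number(token):
    """Return True if token parses as a float (handles signs and exponents)."""
    try:
        float(token)
    except ValueError:
        return False
    return True


def parse_airfoil(text):
    """Hypothetical helper: skip every line before the first non-blank line
    made up only of numbers, then read the rest as fixed-width columns."""
    for idx, line in enumerate(text.splitlines()):
        tokens = line.split()
        if tokens and all(is_number(t) for t in tokens):
            skip = idx
            break
    else:
        # No line consisted solely of numbers, so there is no data to read.
        raise ValueError("no numeric coordinate lines found")
    # Assumes two columns; names= labels them instead of 0 and 1.
    return pd.read_fwf(StringIO(text), skiprows=skip, header=None, names=["x", "y"])


# Made-up header line plus four rows copied from the printed output.
sample = (
    "AG25 placeholder header\n"
    "  1.000000  0.000283\n"
    "  0.994054  0.001020\n"
    "  0.994050 -0.000374\n"
    "  1.000000 -0.000680\n"
)
print(parse_airfoil(sample))
```

With the live file, parse_airfoil(requests.get(url).text) should give the same two columns as the loop above, labeled x and y.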

```python
print(df)
```

Output:

```
            0         1
0    1.000000  0.000283
1    0.994054  0.001020
2    0.982050  0.002599
3    0.968503  0.004411
4    0.954662  0.006281
..        ...       ...
155  0.954562  0.001387
156  0.968423  0.000836
157  0.982034  0.000226
158  0.994050 -0.000374
159  1.000000 -0.000680

[160 rows x 2 columns]
```
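The question title asks about Excel, while the steps above stop at a DataFrame. A minimal sketch of that last step, assuming an .xlsx writer engine such as openpyxl is installed (pandas needs one to write .xlsx files): DataFrame.to_excel writes a sheet, and pd.ExcelWriter lets several airfoils share one workbook, one sheet each. The frames dictionary and the file name airfoils.xlsx are hypothetical, and the three rows are copied from the output above to stand in for the full DataFrame.

```python
import pandas as pd

# Hypothetical stand-ins for DataFrames returned by read_fwf: the rows are
# copied from the printed output above, truncated to keep the example short.
frames = {
    "ag25": pd.DataFrame(
        [[1.000000, 0.000283], [0.994054, 0.001020], [0.982050, 0.002599]]
    ),
}

# Assumes an .xlsx engine such as openpyxl is installed (pip install openpyxl).
with pd.ExcelWriter("airfoils.xlsx") as writer:
    for name, df in frames.items():
        # One sheet per airfoil; Excel limits sheet names to 31 characters.
        # header=["x", "y"] writes these labels in place of columns 0 and 1.
        df.to_excel(writer, sheet_name=name[:31], header=["x", "y"], index=False)
```

To export several files, add one entry per downloaded airfoil to frames before the with block.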