How to scrape data into an Excel file



I'm trying to scrape data from the UIUC airfoil database website, but each link is formatted differently from the others. I tried using pandas read_table with skiprows to skip the non-data part of each file, but every URL has a different number of rows to skip. How can I read only the numbers from each URL?

Answer:

Use pd.read_fwf(), which reads a table of fixed-width formatted lines into a DataFrame.
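As a minimal offline illustration (the three coordinate rows below are made-up sample values, not taken from the UIUC file), read_fwf infers the column boundaries from the whitespace, and with header=None it numbers the columns 0 and 1:

```python
import pandas as pd
from io import StringIO

# Hypothetical fixed-width sample: two right-aligned numeric columns, no header row.
sample = (
    "  1.000000  0.000283\n"
    "  0.500000  0.061000\n"
    "  0.000000  0.000000\n"
)

# header=None: the first line is data, so pandas labels the columns 0 and 1.
df = pd.read_fwf(StringIO(sample), header=None)
print(df.shape)  # (3, 2)
print(df)
```

By default read_fwf guesses the column positions from the first rows of the file, so no colspecs argument is needed for neatly aligned coordinate files like these.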

To handle files with different numbers of header rows, first download the file as text, then count the lines that come before the first line containing only numeric values, and pass that count to the skiprows parameter.

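import pandas as pd
from io import StringIO
import requests

url = 'https://m-selig.ae.illinois.edu/ads/coord/ag25.dat'
response = requests.get(url, timeout=30)
response.raise_for_status()
text = response.text

def is_numeric_line(line):
    # True if the line is non-empty and every token parses as a float
    # (unlike str.isdecimal, float() also accepts signs such as '-0.000374').
    try:
        values = [float(x) for x in line.split()]
    except ValueError:
        return False
    return len(values) > 0

# The 0-based index of the first numeric line equals the number of header lines.
skip = next(idx for idx, line in enumerate(text.splitlines()) if is_numeric_line(line))

df = pd.read_fwf(StringIO(text), skiprows=skip, header=None)
print(df)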


            0         1
0    1.000000  0.000283
1    0.994054  0.001020
2    0.982050  0.002599
3    0.968503  0.004411
4    0.954662  0.006281
..        ...       ...
155  0.954562  0.001387
156  0.968423  0.000836
157  0.982034  0.000226
158  0.994050 -0.000374
159  1.000000 -0.000680

[160 rows x 2 columns]
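If counting header lines turns out to be fragile (for example, a file with blank lines or text between blocks of coordinates), an alternative is to drop skiprows altogether and keep only the lines whose tokens all parse as numbers. This is a sketch of that variant, not part of the answer above: read_numeric_rows is a hypothetical helper, the x/y column names are an assumption, and the sample text is made up. Some files on the site may also contain numeric lines that are not coordinates (such as a line of point counts), which this filter would keep, so check the result.

```python
import pandas as pd

def read_numeric_rows(text):
    """Hypothetical helper: return a DataFrame of every line that holds exactly
    two numbers (an x and a y coordinate), ignoring all other lines."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) != 2:
            continue  # blank lines, or lines with a different number of fields
        try:
            rows.append([float(tokens[0]), float(tokens[1])])
        except ValueError:
            continue  # at least one token is not a number, e.g. part of a title
    return pd.DataFrame(rows, columns=['x', 'y'])

# Made-up sample: a two-word title, blank lines, and some negative y values.
sample = """Sample Airfoil

  1.000000  0.000283
  0.500000  0.061000

  0.500000 -0.030000
  1.000000 -0.000680
"""

df = read_numeric_rows(sample)
print(df.shape)  # (4, 2)
```

With a real file, pass the downloaded text (for example requests.get(url).text) in place of sample.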