Data scraping with Python

Hi, I'm trying to scrape all the data points in this URL: https://m-selig.ae.illinois.edu/ads/coord/a18.dat

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://m-selig.ae.illinois.edu/ads/coord/a18.dat"

page = requests.get(url)
x = BeautifulSoup(page.content, 'html.parser')

df = pd.DataFrame(x)
df.to_excel("air_foil.xlsx")

I've tried this code, but x is just one long list that consists of a single element.

Answer:

First, you need to fetch the data:

import requests

r = requests.get("https://m-selig.ae.illinois.edu/ads/coord/a18.dat")
print(r.text)  # the raw .dat file as a single string

You will see what is inside: the whole file comes back as one plain-text string.

Then you need to split it into a list of rows and pass that list to a DataFrame:

import pandas as pd
df = pd.DataFrame([el.split() for el in r.text.splitlines()[1:] if el.strip()]).astype(float)  # skip the title line and blank lines
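The same idea can be wrapped in a small function and checked offline against a hard-coded sample, so you don't need the network to test it. This is a minimal sketch assuming the file has a one-line title followed by whitespace-separated x/y pairs; the function name `parse_selig_dat`, the column names `x`/`y`, and the sample text are hypothetical, not part of the original answer. Using `splitlines()` rather than `split("\r\n")` means it works whichever line endings the server sends.

```python
import pandas as pd


def parse_selig_dat(text: str) -> pd.DataFrame:
    """Parse airfoil coordinates from the text of a .dat file.

    Assumes (hypothetically) that the first line is the airfoil name and
    every following non-blank line holds one "x y" pair separated by whitespace.
    """
    lines = text.splitlines()  # handles both "\n" and "\r\n" line endings
    # Skip the title line and any blank lines, then split each row on whitespace
    rows = [line.split() for line in lines[1:] if line.strip()]
    # Values arrive as strings, so convert them to floats
    return pd.DataFrame(rows, columns=["x", "y"]).astype(float)


# Hypothetical sample in the assumed layout, with Windows line endings
sample = "A18 (sample)\r\n  1.000000  0.000000\r\n  0.500000  0.050000\r\n\r\n"
df = parse_selig_dat(sample)
print(df)
```

With the real file you would call `parse_selig_dat(r.text)` on the response fetched above.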

Answer:

If you are going to use pandas anyway, you don't need BeautifulSoup at all: just read the file directly with pd.read_csv(url) or pd.read_table(url), e.g.:

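import pandas as pd

url = "https://m-selig.ae.illinois.edu/ads/coord/a18.dat"

# Skip the airfoil name on the first line; columns are separated by runs of whitespace
df = pd.read_csv(url, header=None, skiprows=1, sep=r'\s+')
print(df)
print(df.dtypes)

# read_table works the same way (its default separator is a tab, so pass sep explicitly)
df = pd.read_table(url, header=None, skiprows=1, sep=r'\s+')
print(df)
print(df.dtypes)
df.to_excel('test.xlsx', index=False, header=False)  # needs openpyxl or XlsxWriter installed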
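To check which separator suits the file without hitting the network, you can feed pandas a sample through `io.StringIO`. This sketch assumes the coordinates are indented and whitespace-separated, as in the hypothetical sample below; `sep=r"\s+"` treats any run of spaces or tabs as a single delimiter, so the number of spaces between columns doesn't matter.

```python
import io

import pandas as pd

# Hypothetical sample in the assumed layout: a title line, then indented x/y pairs
sample = """A18 (sample)
  1.000000  0.000000
  0.500000  0.050000
  0.000000  0.000000
"""

# sep=r"\s+" splits on any run of whitespace; leading spaces on each
# line do not produce an extra empty column
df = pd.read_csv(io.StringIO(sample), header=None, skiprows=1, sep=r"\s+")
print(df)
print(df.dtypes)
```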

Answer:

Please refer to this tutorial; hopefully it will help you:

https://www.datacamp.com/tutorial/web-scraping-using-python