Hi, I'm trying to scrape all the data points at this URL: https://m-selig.ae.illinois.edu/ads/coord/a18.dat
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = "https://m-selig.ae.illinois.edu/ads/coord/a18.dat"
page = requests.get(url)
x = BeautifulSoup(page.content, 'html.parser')
df = pd.DataFrame(x)
df.to_excel("air_foil.xlsx")
I've tried this code, but x is just a long list that consists of one element.
CodePudding user response:
First of all, you need to fetch the data:
import requests
r = requests.get("https://m-selig.ae.illinois.edu/ads/coord/a18.dat")
print(r.text)
You will see that the response is one plain-text string, not HTML.
Then you need to split it into a list of rows and put that into a DataFrame:
import pandas as pd
df = pd.DataFrame([el.split() for el in r.text.splitlines()[1:]])  # [1:] skips the first line; splitlines() handles both \n and \r\n
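Note that splitting the text this way leaves every value as a string. A minimal sketch of the same idea that also drops blank lines and converts to floats is below. The helper name parse_coords, the column names x and y, and the sample text are all made up for illustration (the sample is not the real file contents), and it assumes every data line holds exactly two numbers.

```python
import pandas as pd

def parse_coords(text):
    """Turn whitespace-separated coordinate text into a numeric DataFrame."""
    # Skip the first line, as in the answer above
    lines = text.splitlines()[1:]
    # Ignore blank lines, then split each remaining line on whitespace
    rows = [line.split() for line in lines if line.strip()]
    # Assumption: two values per row; "x" and "y" are illustrative names
    df = pd.DataFrame(rows, columns=["x", "y"])
    # The split values are strings, so convert them to floats
    return df.astype(float)

# Illustrative sample only, not the real a18.dat contents
sample = "SAMPLE AIRFOIL\r\n  1.0  0.0\r\n  0.5  0.05\r\n\r\n  0.0  0.0\r\n"
coords = parse_coords(sample)
print(coords)
```

With the real response you would call parse_coords(r.text) and then save the result with to_excel as in the question.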
CodePudding user response:
If you are going to use pandas, you can just use pd.read_table(url) or pd.read_csv(url), e.g.
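import pandas as pd
url = "https://m-selig.ae.illinois.edu/ads/coord/a18.dat"
# skiprows=1 drops the first line; sep=r'\s+' splits on any run of whitespace,
# so leading or repeated spaces do not produce empty columns
df = pd.read_csv(url, header=None, skiprows=1, sep=r'\s+', engine='python')
print(df)
print(df.dtypes)
# read_table accepts the same arguments and gives the same result here
df = pd.read_table(url, header=None, skiprows=1, sep=r'\s+', engine='python')
print(df)
print(df.dtypes)
df.to_excel('test.xlsx', index=False, header=False)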
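If you want to check the parsing options without hitting the network, you can feed read_csv an in-memory string instead of the URL. This is only a sketch: the sample text and the column names x and y are invented for illustration and are not taken from the real file.

```python
import io
import pandas as pd

# Illustrative sample only: a first line to skip, then whitespace-separated columns
sample = (
    "SAMPLE AIRFOIL\n"
    "  1.000000  0.000000\n"
    "  0.500000  0.050000\n"
    "  0.000000  0.000000\n"
)

# io.StringIO makes the string behave like a file;
# sep=r"\s+" treats any run of whitespace as one separator
df_sample = pd.read_csv(
    io.StringIO(sample),
    skiprows=1,
    sep=r"\s+",
    names=["x", "y"],  # assumed column names
)
print(df_sample)
print(df_sample.dtypes)
```

Once the options look right on the sample, the same keyword arguments can be passed with the URL in place of the StringIO object.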
CodePudding user response:
Please refer to this web scraping tutorial. Hopefully it will help you:
https://www.datacamp.com/tutorial/web-scraping-using-python