I have a dataset with multiple links, and I'm trying to get the text of every link using the code below, but I get the error message "InvalidSchema: No connection adapters were found for "'https://en.wikipedia.org/wiki/Wagner_Group'"".
Dataset:
links
'https://en.wikipedia.org/wiki/Wagner_Group'
'https://en.wikipedia.org/wiki/Vladimir_Putin'
'https://en.wikipedia.org/wiki/Islam_in_Russia'
The code I'm using to web-scrape is:
import requests
from bs4 import BeautifulSoup

def get_data(url):
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    text = ""
    # Accumulate the text of every <p> element instead of keeping only the last one
    for paragraph in soup.find_all('p'):
        text += paragraph.text
    return text

# Works fine
url = 'https://en.wikipedia.org/wiki/M142_HIMARS'
get_data(url)

# Doesn't work
df['links'].apply(get_data)
Error: InvalidSchema: No connection adapters were found for "'https://en.wikipedia.org/wiki/Wagner_Group'"
Thank you in advance
It works fine when I apply it to a single URL, but it doesn't work when I apply it to a DataFrame column.
Answer:
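df['links'].apply(get_data)
fails here because of the data, not because apply is incompatible with requests and bs4. The error message shows the URL as "'https://en.wikipedia.org/wiki/Wagner_Group'", with an extra pair of single quotes inside the string, so requests does not see a value that starts with http:// or https:// and cannot find a connection adapter for it.
One way that works is to loop over a list of clean URL strings, as follows: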
Example:
import requests
from bs4 import BeautifulSoup
import pandas as pd

links = [
    'https://en.wikipedia.org/wiki/Wagner_Group',
    'https://en.wikipedia.org/wiki/Vladimir_Putin',
    'https://en.wikipedia.org/wiki/Islam_in_Russia']

data = []
for url in links:
    req = requests.get(url)
    soup = BeautifulSoup(req.text, 'lxml')  # the 'lxml' parser must be installed
    # The attribute inside div[...] is missing here; supply the one for the article container
    for pra in soup.select('div[] > table~p'):
        paragraph = pra.get_text(strip=True)
        # One row per paragraph, across all pages
        data.append({
            'paragraph': paragraph
        })
#print(data)
df = pd.DataFrame(data)
print(df)
Output:
paragraph
0 TheWagner Group(Russian:Группа Вагнера,romaniz...
1 The group came to global prominence during the...
2 Because it often operates in support of Russia...
3 The Wagner Group first appeared in Ukraine in ...
4 The Wagner Group itself was first active in 20...
.. ...
440 A record 18,000 Russian Muslim pilgrims from a...
441 For centuries, theTatarsconstituted the only M...
442 A survey published in 2019 by thePew Research ...
443 Percentage of Muslims in Russia by region:
444 According to the 2010 Russian census, Moscow h...
[445 rows x 1 columns]
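The error message in the question shows the URL wrapped in an extra pair of single quotes, which suggests the quotes are stored as part of each string in the links column. The following is a minimal sketch of cleaning such values before requesting them. The helper names clean_url and has_http_scheme are hypothetical, not part of requests or pandas, and the sketch assumes the only problem is surrounding quotes or whitespace.

```python
from urllib.parse import urlparse


def clean_url(raw):
    """Return raw without surrounding whitespace or quote characters."""
    # Strip whitespace first, then any leading/trailing ' or " characters.
    return raw.strip().strip("'\"")


def has_http_scheme(url):
    """Return True if the URL parses with an http or https scheme."""
    return urlparse(url).scheme in ("http", "https")


# Values as they appear in the question's dataset: the quotes are part of the string.
raw_links = [
    "'https://en.wikipedia.org/wiki/Wagner_Group'",
    "'https://en.wikipedia.org/wiki/Vladimir_Putin'",
    "'https://en.wikipedia.org/wiki/Islam_in_Russia'",
]

for raw in raw_links:
    url = clean_url(raw)
    # Show whether the scheme is recognised before and after cleaning.
    print(has_http_scheme(raw), has_http_scheme(url), url)
```

With the DataFrame from the question, the same helper could be applied before scraping, for example df['links'].map(clean_url).apply(get_data), assuming every value in the column is a string.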