I'm trying to filter data with Pandas using a list of values which are a couple of str book_tittle
and int book_price
:
import pandas as pd
import requests
from bs4 import BeautifulSoup
# settings_#############################################################################
isbn = {'9782756002484', '9782756025117', '9782756072449'}
url = 'https://www.abebooks.fr/servlet/SearchResults?sts=t&cm_sp=SearchF-_-NullResults-_-Results&isbn={}'
book_title = ["Mondes", "X-Wing"]
book_price = [100, 10]
#######################################################################################
### creation de lien à partir des codes ISBN#
def url_isbn(isbn):
merged = []
for link in isbn:
link_isbn = url.format(link)
merged.append(link_isbn)
return merged
### scraping each url from url_isbn
def get_data():
data = []
for i in url_isbn(isbn):
r = requests.get(i)
soup = BeautifulSoup(r.text, 'html.parser')
item = soup.find_all('div', {'class': 'result-data col-xs-9 cf'})
for x in item:
title = x.find('h2', {'class': 'title'}).text.replace('\n', '')
price = x.find('p', {'class': 'item-price'}).text.replace('EUR ', '').replace(',', '.')
url = 'https://www.abebooks.fr' x.find('a', {'itemprop': 'url'})['href']
products = title, int(float(price)), url
data.append(products)
return data
###creating the dataframe
df = pd.DataFrame(get_data(), columns=["Titre", "Prix", "URL"])
###Filter data into the dataframe
for filtered in df:
df_final_to_email = filtered[(df['Titre'].str.contains(book_title) & (df.Prix < book_price))]
print(df_final_to_email)
I'm getting an error : TypeError : unhashable type : 'list'
I assume I cannot use a list for filtering in cause of the mix of data type, I tested with Tuple and dict, I get the same kind of error
I also try with df.query
but it gives empty data frame
The filter will allow me to filter all the books which has "Mondes" in the title for a price < 100 but also all the books which contains "X-Wing" below a price of < 10, I'll also add more item to find with a price related.
Titre | Prix |
---|---|
Mondes Infernaux | 95,10 |
Star Wars, Mondes Infernaux | 75,50 |
X-Wing Rogue | 9,50 |
X-Wing Rogue Squadron | 7,50 |
Nothing about the filtering, but do you know how I could figure the following :products = title, int(float(price)), url
? I had to use float
as I'm not able to convert like int(price)
as int, I'm a bit annoyed of having rounded down numbers in the dataframe. (if any moderator can tell me if I have to do another post for this specific need ? thank you)
Thank you for your kind help
CodePudding user response:
If you need all matching rows in the dataframe there's no need to use a for
loop.
Maybe try something like this:
def find_book(str, price):
return df[ (df['Titre'].str.contains(str)) & (df['Prix']<price) ]
# find all books containing the substring 'Wing' in the title with price <7
find_book('Wing', 7)
CodePudding user response:
The error resides in your filtering code:
df_final_to_email = filtered[(df['Titre'].str.contains(book_title) & (df.Prix < book_price))]
book_title
is a list. .str.contains
does not work with list. It works with a single string or a regex pattern.
If your intention is to find books with "Mondes" in the title and price under 100 or "X-Wing" in the title and price under 10, you can use the following filtering code:
###Filter data into the dataframe
cond = pd.Series([False] * len(df), index=df.index)
for title, price in zip(book_title, book_price):
cond |= df["Titre"].str.contains(title) & df["Prix"].lt(price)
print(df[cond])