Working on a little experiment to just pull some text off some pages. Using an IMDB page for the movie
In my mind, what I think I need to do is say "if class in [primary_photo, ellipsis, character] then pass to the next td tag, else get the text from a href", but translating that into code that works has been a challenge. Any help/guidance is appreciated.
CodePudding user response:
To get the actors' names, you can try the code below:
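# Assumption: the attribute test was lost from the original XPath ('//td[@]'); class="primary_photo" is a guess based on the question.
names = [td.text for td in driver.find_elements('xpath', '//td[@class="primary_photo"]/following-sibling::td[1]')]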
CodePudding user response:
If you can find a parent element with a name, you can then search for tags within that scope.
For example:
parentElement = driver.find_element(By.ID, "parent-element")
childLinks = parentElement.find_elements(By.TAG_NAME, "a")
In your case, something like this would work:
castTable = driver.find_element(By.CLASS_NAME, 'cast_list')
castLinks = castTable.find_elements(By.TAG_NAME, 'a')
for i in castLinks:
    print(i.get_attribute('innerText'))
This won't help in your case, but I'll leave this here for others looking for answers on how to locate an element without an identifier:
You can also select elements with no identifiers by searching for the text they contain
link = driver.find_element(By.XPATH, "//a[contains(text(), 'Daniel Craig')]")
CodePudding user response:
You can try to filter out 'a' tags for which href attribute contains "cl_t" and does not contain "character".
This may not be an actual solution, but can be used as a workaround.
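A rough sketch of that filter, assuming the href values have already been collected (for example with get_attribute('href') on each link). The helper name is_actor_link and the sample URLs are made up for illustration and are not taken from the real page:

```python
def is_actor_link(href):
    """Keep links whose href contains 'cl_t' but not 'character'."""
    if not href:  # get_attribute('href') can return None for <a> tags without href
        return False
    return "cl_t" in href and "character" not in href


# Hypothetical hrefs standing in for what the cast table's <a> tags might hold.
sample_hrefs = [
    "/name/nm0000001/?ref_=ttfc_fc_cl_t1",
    "/title/tt0000001/characters/nm0000001?ref_=ttfc_fc_cl_t1",
    "/name/nm0000002/?ref_=ttfc_fc_cl_i2",
    None,
]

actor_hrefs = [h for h in sample_hrefs if is_actor_link(h)]
print(actor_hrefs)
```

With Selenium, the same check could be applied while iterating over the links, e.g. keeping a.text only when is_actor_link(a.get_attribute('href')) is true.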
CodePudding user response:
To get only a list of actor names, try combining a list comprehension with a CSS selector:
table.cast_list tr td:nth-of-type(2)
The selector matches your <table> and looks in each of its <tr> for the second <td>.
Example:
It may not need selenium; requests will also work. Either way, check the BeautifulSoup part.
import requests
from bs4 import BeautifulSoup
# Name the parser explicitly so BeautifulSoup does not guess one and emit a warning
soup = BeautifulSoup(requests.get('https://www.imdb.com/title/tt2379713/fullcredits?ref_=tt_cl_sm').text, 'html.parser')
print([a.get_text(strip=True) for a in soup.select('table.cast_list tr td:nth-of-type(2)')])
Output:
['Daniel Craig', 'Christoph Waltz', 'Léa Seydoux', 'Ralph Fiennes', 'Monica Bellucci', 'Ben Whishaw', 'Naomie Harris', 'Dave Bautista', 'Andrew Scott', 'Rory Kinnear', 'Jesper Christensen', 'Alessandro Cremona', 'Stephanie Sigman', 'Tenoch Huerta', 'Adriana Paz', 'Domenico Fortunato', 'Marco Zingaro', 'Stefano Elfi DiClaudia', 'Ian Bonar', 'Tam Williams', 'Richard Banham', 'Pip Carter', 'Simon Lenagan', 'Alessandro Bressanello', 'Marc Zinga', 'Brigitte Millar', 'Adel Bencherif', 'Gediminas Adomaitis', 'Peppe Lanzetta', 'Francesco Arca', 'Matteo Taranto', 'Emilio Aniba', 'Benito Sagredo', 'Dai Tabuchi', 'George Lasha', 'Sargon Yelda', 'Andy Cheung', 'Erick Hayden', 'Oleg Mirochnikov', 'Antonio Salines', 'Miloud Mourad Benamara', 'Gido Schimanski', 'Nigel Barber', 'Patrice Naiambana', 'Stephane Cornicard', 'Gary Fannin', 'Sadao Ueda', 'Phillip Law', 'Wai Wong', 'Joseph Balderrama', 'Eiji Mihara', 'Junichi Kajioka', 'Victor Schefé', 'Harald Windisch', 'Tristan Matthiae', 'Detlef Bothe', 'Bodo Friesecke', 'Wilhelm Iben', 'Noemi Krausz', 'Noah Saavedra', 'Francis Attakpah', 'Michael Glantschnig', 'Marlon Boess', 'Marie Fee Wohlmuth', 'Lili Epply', 'Konstantin Gerlach', 'Lara Parmiani', 'Umit Ulgen', 'Amra Mallassi', 'Ziad Abaza', 'Walid Mumuni', 'Derek Horsham', 'Nari Blair-Mangat', 'Michael White', 'Adam McGrady', 'Nader Dernaika', 'Pezh Maan', 'Nad Abdoolakhan', 'Adil Akram', 'Alister Albert', 'Lasco Atkins', 'Omar Ayala', 'David Olawale Ayinde', 'Mohan Banerji', 'Steve Barnett', 'Mark Baxter', 'Paul Blackwell', 'Gerardo Bosco', 'Tom Bourlet', 'Lorenzo Brambilla', 'Matthew Brandon', 'Harry Brewis', 'Dante Briggins', 'Jill Buchanan', 'Oliver Cantú Lozano', 'Calvin Chen', 'Mahmud Chowdhury', 'Eric Coco', 'Maurisa Selene Coleman', 'Bern Collaço', 'Fabio Colonna', 'Christopher DeGress', 'Alan Del Castillo', 'Judi Dench', 'Leigh Dent', 'Filip Dordievski', 'Steve Doyle', 'Daniel Eghan', 'Leila Elbahy', 'Marc Esse', 'Karl Farrer', 'Lucy Figueroa', 'Neve Gachev', 'Gloria 
Garcia', 'David Georgiou', 'Tim Hammersley', 'Sam Hanover', 'Bunmi Hazzan', 'David Howkins', 'Daniel Jones', 'Justified', 'Samantha Kelly', 'Attila G. Kerekes', 'Kaveh Khatiri', 'Denis Khoroshko', 'Darryl Lane', 'Jorge Leon', 'Rogers Leona', 'Volenté Lloyd', 'Tyrone Love', 'Shaun Lucas', 'Johnny Lynch', 'Sid Man', 'Joanne Manchester', 'Gary Mancini', 'Sergio Mariano', 'Garry Marriott', 'Christopher Michael J. Marsh', 'Nicholas Marshall', 'Alex Martin', 'Martyn Mayger', 'Pete Meads', 'Bradley Wj Miller', 'Keith Milner', 'Haaris Mirza', 'Sandeep Mohan', 'Matija Matovic Mondi', 'Martín Montellano', 'Stefania Montesolaro', 'Arnold Montey', 'James M.L. Muller', 'Benjayx Murphy', 'Taylor Murphy', 'Mahel Nahim', 'Kumud Pant', 'Ashish Patel', 'Richard Pearce', 'Mac Pietowski', 'Mike Ray', 'Graham j Reeves', 'Michael Riedacher', 'Angel Rossell', 'Vuksan Rovcanin', 'Maurice Sardison', 'Jason Saunders', 'Linus Scheithauer', 'Lady Conny Sharples', 'Stuart Shepherd-Garner', 'Sam Shoubber', 'Weiwei Si', 'Ernesto Siller', 'Clem So', 'Daran Somers', 'Adrian South', 'Karol Steele', 'Daniel Stisen', 'Ellen Claire Sutherland', 'Phil Tillott', 'Winson Ting', 'Chuen Tsou', 'Romeo Visca', 'Tony Paul West', 'Paul Weston', 'Daniel Westwood', 'Chris Wilson', 'Gregg Wilson', 'Michael G. Wilson', 'Danielle Yen', 'Miroslav Zaruba', 'Ruolan Zhang', 'Dominic Zwemmer', 'Julio César Álvarez']