I am new to web scraping and am having difficulty scraping the data I need. I want to scrape data based on tags, with conditions. First, I check whether an element is an 'h3' tag (i.e. it is a question). I then want to add a condition: scrape it only if a 'p' tag or any other tag occurs after the 'h3' tag, otherwise skip it. I am having difficulty implementing such a condition.
# This is what I am doing right now (req is presumably the response from requests.get)
from bs4 import BeautifulSoup
soup = BeautifulSoup(req.content, "html.parser")
title = soup.find_all(['h3', 'p'])
print('List:', *title, sep='\n\n')
CodePudding user response:
Your question is a bit generic and confusing with that 'any other tag', since there will surely be other tags after an 'h3'. Nonetheless, here is a solution that separates the questions whose next sibling is a <p> tag from the ones followed by some other tag. You can use it as an example and modify the function to suit your needs:
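import requests
from bs4 import BeautifulSoup as bs

def has_p_sibling(el):
    # True only if a next sibling tag exists and it is a <p>
    sibling = el.find_next_sibling()
    return sibling is not None and sibling.name == 'p'

r = requests.get('https://www.geeksforgeeks.org/cpp-interview-questions/')
soup = bs(r.text, 'html.parser')
questions = soup.select('h3')
for q in questions:
    # find_next_sibling() skips whitespace text nodes; it returns None for a last child
    sibling = q.find_next_sibling()
    sibling_name = sibling.name if sibling is not None else None
    if has_p_sibling(q):
        print('GOOD Q', sibling_name, q.text)
    else:
        print('BAD Q', sibling_name, q.text)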
Result printed in terminal:
GOOD Q p Q-1. What is C++? What are the advantages of C++?
BAD Q div Q- 2. What are the different data types present in C++?
GOOD Q p Q-3. Define ‘std’?
GOOD Q p Q-4. What are references in C++?
GOOD Q p Q-5. What do you mean by Call by Value and Call by Reference?
GOOD Q p Q-6. Define token in C++
BAD Q figure Q-7. What is the difference between C and C++?
BAD Q figure Q-8. What is the difference between struct and class?
BAD Q figure Q-9. What is the difference between reference and pointer?
BAD Q figure Q-10. What is the difference between function overloading and operator overloading?
BAD Q figure Q-11. What is the difference between an array and a list?
BAD Q figure Q-12: What is the difference between a while loop and a do-while loop?
BAD Q figure Q-13. Discuss the difference between prefix and postfix?
BAD Q figure Q-14. What is the difference between new and malloc()?
BAD Q figure Q-15. What is the difference between virtual functions and pure virtual functions?
GOOD Q p Q-16. What are classes and objects in C++?
GOOD Q p Q-17. What is Function Overriding?
BAD Q ul Q-18. What are the various OOPs concepts in C++?
GOOD Q p Q-19. Explain inheritance
GOOD Q p Q-20. When should we use multiple inheritance?
GOOD Q p Q-21. What is virtual inheritance?
GOOD Q p Q-22. What is polymorphism in C++?
GOOD Q p Q-23. What are the different types of polymorphism in C++?
BAD Q figure Q-24. Compare compile-time polymorphism and Runtime polymorphism
GOOD Q p Q-25. Explain the constructor in C++.
GOOD Q p Q-26. What are destructors in C++?
GOOD Q p Q-27. What is a virtual destructor?
GOOD Q p Q-28. Is destructor overloading possible? If yes then explain and if no then why?
GOOD Q p Q-29. Which operations are permitted on pointers?
GOOD Q p Q-30. What is the purpose of the “delete” operator?
BAD Q figure Q-31. How delete [] is different from delete?
GOOD Q p Q-32. What do you know about friend class and friend function?
GOOD Q p Q-33. What is an Overflow Error?
GOOD Q p Q-34. What does the Scope Resolution operator do?
GOOD Q p Q-35. What are the C++ access modifiers?
GOOD Q p Q-36. Can you compile a program without the main function?
GOOD Q p Q-37. What is STL?
GOOD Q p Q-38. Define inline function. Can we have a recursive inline function in C++?
GOOD Q p Q-39. What is an abstract class and when do you use it?
GOOD Q p Q-40. What are the static data members and static member functions?
GOOD Q p Q-41. What is the main use of the keyword “Volatile”?
GOOD Q p Q-42. Define storage class in C++ and name some
GOOD Q p Q-43. What is a mutable storage class specifier? How can they be used?
GOOD Q p Q-44. Define the Block scope variable.
GOOD Q p Q-45. What is the function of the keyword “Auto”?
GOOD Q p Q-46. Define namespace in C++.
GOOD Q p Q-47. When is void() return type used?
BAD Q figure Q-48. What is the difference between shallow copy and deep copy?
GOOD Q p Q-49. Can we call a virtual function from a constructor?
GOOD Q p Q-50. What are void pointers?
GOOD Q p Q-1. What is ‘this‘ pointer in C++?
BAD Q button Improve your Coding Skills with Practice
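One thing to keep in mind when checking what follows an 'h3': `.next_sibling` returns the next node of any kind, which can be a whitespace string between tags, while `find_next_sibling()` skips ahead to the next tag. Below is a minimal sketch on a small made-up HTML fragment (not the real page, whose markup may differ); the helper name `questions_followed_by_p` is hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration only; the real page's markup may differ.
HTML = """
<div>
<h3>Q-1. First question?</h3>
<p>Answer to the first question.</p>
<h3>Q-2. Second question?</h3>
<figure>image</figure>
<h3>Q-3. Third question?</h3>
<p>Answer to the third question.</p>
</div>
"""

def questions_followed_by_p(soup):
    """Return (question, answer) pairs for each <h3> whose next sibling tag is a <p>."""
    pairs = []
    for h3 in soup.find_all('h3'):
        sibling = h3.find_next_sibling()  # next tag, skipping text nodes
        if sibling is not None and sibling.name == 'p':
            pairs.append((h3.get_text(strip=True), sibling.get_text(strip=True)))
    return pairs

soup = BeautifulSoup(HTML, 'html.parser')
first_h3 = soup.find('h3')

# Here .next_sibling is the newline text node between the tags, not the <p>
print(repr(first_h3.next_sibling))

for question, answer in questions_followed_by_p(soup):
    print(question, '->', answer)
```

Only the questions directly followed by a `<p>` are kept; the one followed by a `<figure>` is skipped, which seems to be the condition the question asks for.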
UPDATE: If you want to get all elements from one question to the next, you can do:
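import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.geeksforgeeks.org/cpp-interview-questions/')
soup = bs(r.text, 'html.parser')
questions = soup.select('h3')
for q in questions:
    # All tags after q in document order, including the tags nested inside q
    following = q.find_all_next()
    for i, s in enumerate(following):
        if i == 0:
            # On this page the first tag is nested inside the h3 and holds the question text
            print(' QUESTION', s.get_text(strip=True))
        else:
            if s.name != 'h3':
                # Skip very short texts and texts identical to the previous tag's text
                if len(s.text) > 5 and s.text != following[i-1].text:
                    print('----ANSWER', s.get_text(strip=True))
            else:
                # The next h3 starts a new question
                break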
And the result in terminal would be:
QUESTION Q-1. What is C++? What are the advantages of C++?
----ANSWER C++ is an object-oriented programming language that was introduced to overcome the jurisdictions where C was lacking. By object-oriented we mean that it works with the concept ofpolymorphism,inheritance,abstraction,encapsulation,object, and class.
----ANSWER polymorphism
----ANSWER inheritance
----ANSWER abstraction
----ANSWER encapsulation
----ANSWER object, and class
----ANSWER Advantages of C++:
----ANSWER Advantages of C++
----ANSWER C++ is an OOPs language that means the data is considered as objects.C++ is a multi-paradigm language; In simple terms, it means that we can program the logic, structure, and procedure of the program.Memory management is a key feature in C++ as it enables dynamic memory allocationIt is a Mid-Level programming language which means it can develop games, desktop applications, drivers, and kernels
----ANSWER C++ is an OOPs language that means the data is considered as objects.
----ANSWER C++ is a multi-paradigm language; In simple terms, it means that we can program the logic, structure, and procedure of the program.
----ANSWER Memory management is a key feature in C++ as it enables dynamic memory allocation
----ANSWER It is a Mid-Level programming language which means it can develop games, desktop applications, drivers, and kernels
----ANSWER To read more, refer to the article –What are the advantages of C++?
----ANSWER What are the advantages of C++?
QUESTION Q- 2. What are the different data types present in C++?
----ANSWER Different types of data types in C++
----ANSWER Different types of data types in C++
----ANSWER For more information, refer toC++ data types
----ANSWER C++ data types
QUESTION Q-3. Define ‘std’?
----ANSWER ‘std’is also known as Standard or it can be interpreted [...]
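The repeated lines in the output above come from `find_all_next()` returning nested tags as well as their parents. A possible alternative, assuming the answers are direct siblings of each 'h3' (an assumption about the page layout that may not hold for the real page), is to walk `next_siblings` and stop at the next 'h3'. The HTML fragment and the helper name `answers_by_question` are made up for illustration.

```python
from bs4 import BeautifulSoup, Tag

# Made-up HTML for illustration only; the real page's markup may differ.
HTML = """
<div>
<h3>Q-1. First question?</h3>
<p>It works with <a href="#">polymorphism</a> and more.</p>
<ul><li>Point one</li><li>Point two</li></ul>
<h3>Q-2. Second question?</h3>
<figure>image</figure>
<p>See the linked article.</p>
</div>
"""

def answers_by_question(soup):
    """Map each <h3> text to the texts of the sibling tags up to the next <h3>."""
    result = {}
    for h3 in soup.find_all('h3'):
        answers = []
        for node in h3.next_siblings:
            if not isinstance(node, Tag):
                continue  # skip whitespace and other text nodes
            if node.name == 'h3':
                break  # the next question starts here
            # One entry per top-level sibling, so nested tags are not repeated
            text = node.get_text(' ', strip=True)
            if text:
                answers.append(text)
        result[h3.get_text(strip=True)] = answers
    return result

grouped = answers_by_question(BeautifulSoup(HTML, 'html.parser'))
for question, answers in grouped.items():
    print('QUESTION', question)
    for answer in answers:
        print('----ANSWER', answer)
```

Because only top-level siblings are visited, the link text inside the paragraph and the individual list items are not printed a second time on their own.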
Please review the BeautifulSoup documentation and try to understand its foundational logic. The docs can be found at https://beautiful-soup-4.readthedocs.io/en/latest/index.html