Issues with displaying certain data once WebScraped

Time:09-28

    for project in projects:
        soup.findAll('a')
        name = project.text
        print(name)

I am trying to extract just the package names from https://libraries.io/search?order=desc&page=1&platforms=Maven&sort=rank. However, when I run the code above, I get an unnecessary amount of information that looks something like this:

junit:junit

JUnit is a unit testing framework for Java, created by Erich Gamma and Kent Beck.


  Latest release 4.13.2 -
  Updated
  Feb 13, 2021
   - 8.34K stars

The only output I want is "junit:junit". Any tips on how to achieve this? I have to do this for over 490,000 packages.

User response:

Try this for the first page, then repeat it for each following page by changing the page value in the URL:

    import requests
    from bs4 import BeautifulSoup

    url = "https://libraries.io/search?order=desc&page=1&platforms=Maven&sort=rank"
    # Select only the link inside each project's <h5> heading, which holds the package name.
    names = [
        a.getText() for a
        in BeautifulSoup(requests.get(url).text, "lxml").select("div.project > h5 > a")
    ]
    print("\n".join(names))

Output:

junit:junit
org.springframework:spring-context
org.springframework:spring-test
org.scala-lang:scala-library
org.springframework:spring-core
com.google.guava:guava
org.jetbrains.kotlin:kotlin-stdlib-jdk8
org.jetbrains.kotlin:kotlin-stdlib
com.h2database:h2
org.projectlombok:lombok
com.google.code.gson:gson
org.mockito:mockito-core
org.scala-lang:scala-reflect
org.springframework.boot:spring-boot-starter-test
org.springframework.boot:spring-boot-starter-web
org.springframework:spring-orm
org.springframework:spring-beans
org.springframework:spring-jdbc
org.springframework.boot:spring-boot-starter-actuator
org.springframework.boot:spring-boot-devtools
org.springframework:spring-web
org.junit.jupiter:junit-jupiter-engine
org.springframework:spring-tx
org.springframework:spring-aop
org.springframework:spring-context-support
com.fasterxml.jackson.core:jackson-databind
org.junit.jupiter:junit-jupiter-api
org.springframework:spring-webmvc
org.slf4j:slf4j-api
org.testng:testn
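
To repeat this over several pages, a loop along the following lines might work. It is only a sketch: the names extract_names and scrape and the last_page and delay parameters are made up for illustration. It assumes that the page value in the query string simply counts upward, and that the div.project > h5 > a selector from the answer still matches the site's markup. It uses Python's built-in "html.parser" instead of "lxml" so that no extra parser has to be installed.

```python
import time

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://libraries.io/search"


def extract_names(html):
    """Return the package names found in one search-results page."""
    soup = BeautifulSoup(html, "html.parser")
    # Same selector as in the answer: the link inside each project's <h5> heading.
    return [a.get_text(strip=True) for a in soup.select("div.project > h5 > a")]


def scrape(first_page=1, last_page=3, delay=1.0):
    """Yield package names from first_page through last_page (inclusive)."""
    for page in range(first_page, last_page + 1):
        params = {"order": "desc", "page": page, "platforms": "Maven", "sort": "rank"}
        response = requests.get(BASE_URL, params=params, timeout=30)
        response.raise_for_status()
        names = extract_names(response.text)
        if not names:  # assumption: an empty page means there are no more results
            break
        yield from names
        time.sleep(delay)  # pause between requests to avoid hammering the server


if __name__ == "__main__":
    for name in scrape():
        print(name)
```

Keeping the parsing in its own function makes it possible to check the selector against a saved copy of a page before sending a large number of requests. The one-second delay is an arbitrary choice; for several hundred thousand packages the site may well limit how quickly or how far you can page, so expect to adjust it.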