How do I separate text after using BeautifulSoup in order to plot?

Time:12-03

I am trying to write a program that scrapes data from OpenInsider and plots it. OpenInsider shows which company insiders are buying or selling their company's stock. I want to show, in an easy-to-read format, the company, the insider type, and how much of the stock was purchased. Here is my code so far:

from bs4 import BeautifulSoup
import requests

page = requests.get("http://openinsider.com/top-insider-purchases-of-the-month")

# print(page.status_code)  # checks whether the page was downloaded successfully

soup = BeautifulSoup(page.content,'html.parser')
table = soup.find(class_="tinytable")
data = table.get_text()
#results = data.prettify
print(data, '\n')

Here is an example of some of the results:

X Filing Date Trade Date Ticker Company NameInsider NameTitle Trade Type   Price Qty Owned ΔOwn Value 1d 1w 1m 6m

2022-12-01 16:10:122022-11-30 AKUSAkouos, Inc.Kearny Acquisition Corp10%P - Purchase$12.50 29,992,668100-100% $374,908,350 2022-11-30 20:57:192022-11-29 HHCHoward Hughes CorpPershing Square Capital Management, L.P.Dir, 10%P - Purchase$70.00 1,560,20515,180,369 11% $109,214,243 2022-12-02 17:29:182022-12-02 IOVAIovance Biotherapeutics, Inc.Rothbaum Wayne P.DirP - Purchase$6.50 10,000,00018,067,333 124% $65,000,000

However, in my output each row starts on a new line, beginning with its filing date (e.g. 2022-12-01).

Is there a better way to use BeautifulSoup here? Or is there an easy way to sort through this data and retrieve the specific information I am looking for? Thank you in advance; I have been stuck on this for a while.
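The run-together text (e.g. `16:10:122022-11-30`) happens because `get_text()` joins the text of every descendant with an empty separator by default. BeautifulSoup's `get_text()` accepts a `separator` string and a `strip` flag. The minimal sketch below uses a small hard-coded table, a hypothetical and simplified stand-in for one OpenInsider row rather than the site's real markup, to show the difference:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for one row of OpenInsider's "tinytable";
# the real page has more columns and extra markup (links, attributes).
html = """
<table class="tinytable">
  <tr><td>2022-12-01 16:10:12</td><td>2022-11-30</td><td>AKUS</td><td>Akouos, Inc.</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find(class_="tinytable")

# Default separator is "": adjacent cells are glued together.
glued = table.get_text(strip=True)

# With a separator, each (stripped, non-empty) text fragment stays apart.
separated = table.get_text(" | ", strip=True)

print(glued)      # 2022-12-01 16:10:122022-11-30AKUSAkouos, Inc.
print(separated)  # 2022-12-01 16:10:12 | 2022-11-30 | AKUS | Akouos, Inc.
```

A separator makes the flat string readable, but reading each `<td>` individually with `row.find_all("td")` is usually more robust than splitting a long string afterwards, since company names can themselves contain commas and spaces.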

Answer:

The real credit goes to @JulianHarkless; they deserve credit for this answer and just didn't parse the end properly. When they come back to update their question, I will take this down.

from bs4 import BeautifulSoup
import requests

page = requests.get("http://openinsider.com/top-insider-purchases-of-the-month")
soup = BeautifulSoup(page.content, 'html.parser')

# Find the table with the insider purchase data
table = soup.find(class_="tinytable")

# Find all rows of the table
rows = table.find_all('tr')

# Loop through each row
for row in rows:
    data = row.find_all("td")
    # Skip rows that are too short to hold a trade type
    # (for example, a header row made of <th> cells has no <td> cells).
    if len(data) <= 7:
        continue
    # Extract the company name, insider name, and trade type from the row
    company = data[4].get_text(strip=True)
    insider = data[5].get_text(strip=True)
    trade_type = data[7].get_text(strip=True)
    # Print the extracted data
    print(f'Company: {company}, Insider: {insider}, Trade Type: {trade_type}')

Answer:

Do what Julian said, then store the values in a dict, load them into a pandas DataFrame, and visualize them with plotly.express.
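A minimal sketch of that pipeline, assuming pandas is installed (and plotly for the chart). The three records are copied from the sample output in the question; in practice a scraper would build this list with one dict per table row, for example from the row loop in the first answer. The helper names `to_number` and `make_chart` are hypothetical, and plotly is imported inside `make_chart` so the DataFrame steps run even without it.

```python
import pandas as pd


def to_number(text):
    """Hypothetical helper: turn strings like '29,992,668' or '$374,908,350' into an int."""
    return int(text.replace("$", "").replace(",", "").replace("+", "").strip())


# Records taken from the sample output in the question (one dict per row).
records = [
    {"Ticker": "AKUS", "Company": "Akouos, Inc.", "Title": "10%",
     "Qty": "29,992,668", "Value": "$374,908,350"},
    {"Ticker": "HHC", "Company": "Howard Hughes Corp", "Title": "Dir, 10%",
     "Qty": "1,560,205", "Value": "$109,214,243"},
    {"Ticker": "IOVA", "Company": "Iovance Biotherapeutics, Inc.", "Title": "Dir",
     "Qty": "10,000,000", "Value": "$65,000,000"},
]

# Load the dicts into a DataFrame and convert the numeric columns.
df = pd.DataFrame(records)
df["Qty"] = df["Qty"].map(to_number)
df["Value"] = df["Value"].map(to_number)
df = df.sort_values("Value", ascending=False)


def make_chart(frame):
    """Bar chart of purchase value per ticker, colored by insider title."""
    import plotly.express as px  # imported lazily so the DataFrame part works without plotly
    return px.bar(frame, x="Ticker", y="Value", color="Title",
                  hover_data=["Company", "Qty"], title="Top insider purchases")


print(df)
```

Calling `make_chart(df).show()` then renders the figure in a notebook or browser. Coloring by `Title` is one way to surface the insider type alongside the purchase size.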

Answer:

To extract the specific information you are looking for from the data using BeautifulSoup, you can use the find_all() method to find all the rows of the table, and then iterate over each row to extract the relevant data. Here is an example of how you can do this:

from bs4 import BeautifulSoup
import requests

page = requests.get("http://openinsider.com/top-insider-purchases-of-the-month")
soup = BeautifulSoup(page.content, 'html.parser')

# Find the table with the insider purchase data
table = soup.find(class_="tinytable")

# Find all rows of the table
rows = table.find_all('tr')

# Loop through each row
for row in rows:
    # Extract the company name, insider name, and trade type from the row
    data = row.find_all("td")
    company = data[4].text if len(data) > 4 else "No company name"
    insider = data[5].text if len(data) > 5 else "No insider"
    trade_type = data[7].text if len(data) > 7 else "No trade type"
    # Print the extracted data
    print(f'Company: {company}, Insider: {insider}, Trade Type: {trade_type}')

This code will loop through each row of the table and extract the company name, insider name, and trade type from the row. You can modify this code to extract any other information you are interested in from the table.
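As a sketch of such a modification, the loop below reads the header row first and builds one dict per data row keyed by column name, so it can also pull the insider's title and the quantity (the other fields the question asks about) without hard-coding indices. It runs against a small hypothetical HTML snippet modeled on the column names and values in the question's output; the real page's markup may differ (for example, header text may contain extra whitespace), so the names may need adjusting.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down version of the OpenInsider table.
html = """
<table class="tinytable">
  <tr><th>Ticker</th><th>Company Name</th><th>Insider Name</th><th>Title</th><th>Trade Type</th><th>Qty</th></tr>
  <tr><td>AKUS</td><td>Akouos, Inc.</td><td>Kearny Acquisition Corp</td><td>10%</td><td>P - Purchase</td><td>29,992,668</td></tr>
  <tr><td>IOVA</td><td>Iovance Biotherapeutics, Inc.</td><td>Rothbaum Wayne P.</td><td>Dir</td><td>P - Purchase</td><td>10,000,000</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find(class_="tinytable")
rows = table.find_all("tr")

# Column names come from the <th> cells of the first row.
headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]

records = []
for row in rows[1:]:
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if len(cells) != len(headers):
        continue  # skip rows that don't match the header layout
    records.append(dict(zip(headers, cells)))

# Company, insider type (Title) and quantity purchased, looked up by name.
for rec in records:
    print(f"{rec['Company Name']} | {rec['Title']} | {rec['Qty']}")
```

Keying by header name means the extraction keeps working if a column is added or moved, as long as the header text stays the same.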
