I am scraping the financial summary from https://www.investing.com/equities/nvidia-corp-financial-summary.
To get the ratio descriptions:
for element in soup.find_all('span', attrs={'class': 'float_lang_base_1'}):
    print(element)
This prints:
<span >Gross margin</span>
<span >Operating margin</span>
<span >Net Profit margin</span>
<span >Return on Investment</span>
<span >Quick Ratio</span>
<span >Current Ratio</span>
<span >LT Debt to Equity</span>
<span >Total Debt to Equity</span>
<span >Cash Flow/Share</span>
<span >Revenue/Share</span>
<span >Operating Cash Flow</span>
To get the value for each of the ratios above:
for element in soup.find_all('span', attrs={'class': 'float_lang_base_2 text_align_lang_base_2 dirLtr bold'}):
    print(element.get_text())
which results in:
60.45%
31.47%
26.03%
22.86%
2.95
3.62
-
49.02%
-
-
16.77%
Now I need to match the two, so that each description and its value form a key-value pair that can be turned into a DataFrame:
Gross margin: 60.45%
Operating margin: 31.47%
Net Profit margin: 26.03%
...
CodePudding user response:
You can find the parent div tag that holds both the description and the value, iterate over those divs, pick out each span by its class, and add the pair to dict1:
dict1 = {}
for element in soup.find_all('div', attrs={'class': 'infoLine'}):
    # each infoLine row contains one description span and one value span
    name = element.find("span", class_="float_lang_base_1").get_text()
    value = element.find("span", class_="float_lang_base_2").get_text()
    dict1[name] = value
Here you can use pandas to create a DataFrame df and transform dict1 into tabular data:
import pandas as pd
df=pd.DataFrame(dict1.items(),columns=['A','B'])
df
Output:
A B
0 Gross margin 60.45%
1 Operating margin 31.47%
.....
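The row-wise pairing in this answer can be tried offline. The following is a minimal sketch, assuming markup shaped like the spans shown in the question; SAMPLE_HTML and pair_rows are hypothetical names, and the real page may be structured differently. It also skips rows where one of the two spans is missing, since find() returns None in that case and calling get_text() on it would fail.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the page structure; the real page may differ.
SAMPLE_HTML = """
<div class="infoLine">
  <span class="float_lang_base_1">Gross margin</span>
  <span class="float_lang_base_2 text_align_lang_base_2 dirLtr bold">60.45%</span>
</div>
<div class="infoLine">
  <span class="float_lang_base_1">Quick Ratio</span>
  <span class="float_lang_base_2 text_align_lang_base_2 dirLtr bold">2.95</span>
</div>
<div class="infoLine">
  <span class="float_lang_base_1">Label without value</span>
</div>
"""


def pair_rows(html):
    """Build {description: value} from every complete infoLine row."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = {}
    for row in soup.find_all("div", class_="infoLine"):
        name = row.find("span", class_="float_lang_base_1")
        value = row.find("span", class_="float_lang_base_2")
        if name is None or value is None:
            continue  # incomplete row: skip it instead of raising AttributeError
        pairs[name.get_text(strip=True)] = value.get_text(strip=True)
    return pairs


ratios = pair_rows(SAMPLE_HTML)
print(ratios)
```

Because each name and value come from the same row, a missing span cannot shift the remaining pairs out of alignment.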
CodePudding user response:
You can combine the values from two different lists into a single dictionary:
Mykeys = ["a", "b", "c"]
Myvalues = [1, 3, 5]
print("Mykey list: " + str(Mykeys))
print("Myvalue list: " + str(Myvalues))
# zip() pairs the items by position; dict() turns the pairs into key-value entries
res = dict(zip(Mykeys, Myvalues))
print("New dictionary will be : " + str(res))
CodePudding user response:
As mentioned in the other answers, you could zip() your lists and transform the result into a dict().
There is also an alternative approach to selecting and extracting the information from the elements:
dict(list(row.stripped_strings)[::len(list(row.stripped_strings))-1] for row in soup.select('.infoLine'))
This uses select() (find_all() works too) to get the elements with class infoLine, which is the container tag of the <span>s. .stripped_strings yields the texts of each row as a generator, so we convert it to a list, slice out only the first and the last element, and pass the resulting pairs to dict() to build the final result.
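How the slice in the one-liner above picks the first and last string can be seen on plain lists. This is a small sketch; first_and_last is a hypothetical helper, and the middle string "TTM" is made up for illustration. A step of len - 1 visits only index 0 and index len - 1. A row with a single string would give a step of 0, which Python rejects, so the helper checks for that explicitly.

```python
def first_and_last(strings):
    """Return [first, last] of a list using the step-slice trick."""
    if len(strings) < 2:
        # a step of 0 would make the slice raise "slice step cannot be zero"
        raise ValueError("need at least two strings to form a key/value pair")
    return strings[::len(strings) - 1]


two = ["Gross margin", "60.45%"]
three = ["Gross margin", "TTM", "60.45%"]  # "TTM" is a made-up middle string

print(first_and_last(two))
print(first_and_last(three))

# dict() accepts an iterable of two-item sequences
print(dict([first_and_last(two), first_and_last(["Quick Ratio", "2.95"])]))
```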
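Be aware: when zipping lists, or working with separate lists at all, you have to ensure that they have the same length. zip() does not raise an error on a mismatch; it silently stops at the shortest list, so trailing items are dropped, and if an element is missing in the middle, descriptions and values end up paired incorrectly.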
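A short sketch of what a length mismatch does in practice, using two of the values from the question. The helper pair_checked is a hypothetical name, not part of any library. On Python 3.10 and later, zip(keys, vals, strict=True) performs a similar check.

```python
labels = ["Gross margin", "Operating margin", "Net Profit margin"]
values = ["60.45%", "31.47%"]  # one value short, e.g. a span the selector missed

# zip() stops at the shortest input and does not raise
silently_truncated = dict(zip(labels, values))
print(silently_truncated)


def pair_checked(keys, vals):
    """Pair keys and values, raising if the two lists differ in length."""
    if len(keys) != len(vals):
        raise ValueError(f"length mismatch: {len(keys)} keys vs {len(vals)} values")
    return dict(zip(keys, vals))


try:
    pair_checked(labels, values)
except ValueError as err:
    print(err)
```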
Example
import requests
from bs4 import BeautifulSoup
url='https://www.investing.com/equities/nvidia-corp-financial-summary'
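# name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}).text, 'html.parser')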
dict(list(row.stripped_strings)[::len(list(row.stripped_strings))-1] for row in soup.select('.infoLine'))
Output
{'Gross margin': '60.45%',
'Operating margin': '31.47%',
'Net Profit margin': '26.03%',
'Return on Investment': '22.86%',
'Quick Ratio': '2.95',
'Current Ratio': '3.62',
'LT Debt to Equity': '-',
'Total Debt to Equity': '49.02%',
'Cash Flow/Share': '-',
'Revenue/Share': '-',
'Operating Cash Flow': '16.77%'}