I am trying to read an HTML table from an email in the Outlook application using BeautifulSoup. The table contains two main columns: Ticker and price. Now I am trying to add a third column named Pkey to the existing dataframe.
I am able to add it, and it works fine as long as the email has the full list of tickers (7 in total). Sometimes we don't receive the full list, say we receive prices for only 3 of the 7 tickers; in that case I need column 3 to hold the Pkeys for just those 3 tickers.
How is that possible?
We have the following code:
import pandas as pd
import win32com.client
from sqlalchemy.engine import create_engine
import re
from datetime import datetime, timedelta
import requests
import sys
from bs4 import BeautifulSoup
from pprint import pprint
EMAIL_ACCOUNT = 'robinhood.gmail.com'
EMAIL_SUBJ_SEARCH_STRING = 'Morgan Stanley Systematic Strategies Daily Levels'
out_app = win32com.client.gencache.EnsureDispatch("Outlook.Application")
out_namespace = out_app.GetNamespace("MAPI")
root_folder = out_namespace.GetDefaultFolder(6)
out_iter_folder = root_folder.Folders['Email_Snapper']
item_count = out_iter_folder.Items.Count
Flag = False
cnt = 1
if item_count > 0:
    # Walk the folder from the newest item to the oldest
    for i in range(item_count, 0, -1):
        message = out_iter_folder.Items[i]
        # Only process the first (most recent) matching email
        if EMAIL_SUBJ_SEARCH_STRING in message.Subject and cnt <= 1:
            cnt = cnt + 1
            Body_content = message.HTMLBody
            Body_content = BeautifulSoup(Body_content, "lxml")
            html_tables = Body_content.find_all('table')[0]
            #Body_content = Body_content[:Body_content.find("Disclaimer")].strip()
            df = pd.read_html(str(html_tables), header=0)[0]
            Pkey = [71763307, 76366654, 137292386, 151971418, 151971419, 152547427, 152547246]
            # This assignment requires len(Pkey) == len(df), i.e. all 7 tickers present
            df['Pkey'] = Pkey
            print(df)
Output: the output looks fine as long as we get the full list of tickers from the bank.
But sometimes we only get prices for a handful of tickers rather than the full list. In that case it raises an error.
The error message I get is:
ValueError: Length of values does not match length of index
CodePudding user response:
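Try using pd.Series([755454, 556554, 2545454, 54644, 878798]) instead of a plain list (note the capital S). A Series is aligned to the dataframe's index when it is assigned, so a length mismatch no longer raises the ValueError: positions without a value become NaN and extra values are dropped. Keep in mind that the values are still matched by row position, not by ticker.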
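If each Pkey belongs to one specific ticker, looking it up by ticker name may be safer than relying on row order. Below is a minimal sketch of that idea, not a definitive fix. The ticker names ("AAA" to "GGG"), the PKEY_BY_TICKER dictionary, and the add_pkey helper are all hypothetical, since the question does not show the real tickers or which Pkey goes with which one. The column names Ticker and price are taken from the question.

```python
import pandas as pd

# Hypothetical tickers: the real names are not shown in the question,
# and the pairing of ticker to Pkey here is only an assumption.
PKEY_BY_TICKER = {
    "AAA": 71763307,
    "BBB": 76366654,
    "CCC": 137292386,
    "DDD": 151971418,
    "EEE": 151971419,
    "FFF": 152547427,
    "GGG": 152547246,
}


def add_pkey(df, ticker_col="Ticker"):
    """Return a copy of df with a Pkey column looked up by ticker."""
    out = df.copy()
    # Series.map looks each ticker up in the dict; tickers that are not
    # in the dict become missing values instead of raising an error.
    # The nullable Int64 dtype keeps the keys as integers even then.
    out["Pkey"] = out[ticker_col].map(PKEY_BY_TICKER).astype("Int64")
    return out


# A partial email: only 3 of the 7 tickers arrived, in arbitrary order.
partial = pd.DataFrame(
    {"Ticker": ["EEE", "AAA", "CCC"], "price": [10.5, 20.0, 30.25]}
)
result = add_pkey(partial)
print(result)
```

In the original script this would replace the line df['Pkey'] = Pkey with df = add_pkey(df), assuming the ticker column in the parsed table is really called Ticker. Because the lookup is by name, it works for any number of rows and any row order.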