Populate new additional column in dataframe based on a certain match

Time:06-20

I am trying to read an HTML table from an Outlook email using BeautifulSoup. The table contains two main columns: Ticker and Price. Now I am trying to add a third column named Pkey to the existing dataframe.

I am able to add it, though, and it works fine as long as the email has the full list of tickers (7 in total). Sometimes we don't receive the full list; say that out of 7 we receive prices for only 3 tickers. In that case, I need column 3 to hold the Pkeys for just those 3 tickers.

How is that possible?

We have the following code:

import pandas as pd
import win32com.client
from sqlalchemy.engine import create_engine
import re
from datetime import datetime, timedelta
import requests
import sys
from bs4 import BeautifulSoup
from pprint import pprint


EMAIL_ACCOUNT = 'robinhood.gmail.com'
EMAIL_SUBJ_SEARCH_STRING = 'Morgan Stanley Systematic Strategies Daily Levels'


out_app = win32com.client.gencache.EnsureDispatch("Outlook.Application")
out_namespace = out_app.GetNamespace("MAPI")


root_folder = out_namespace.GetDefaultFolder(6)

out_iter_folder = root_folder.Folders['Email_Snapper']

item_count = out_iter_folder.Items.Count

Flag = False
cnt = 1
if item_count > 0:
    for i in range(item_count, 0, -1):
        message = out_iter_folder.Items[i]
        if EMAIL_SUBJ_SEARCH_STRING in message.Subject and cnt <= 1:
            cnt = cnt + 1  # only process the most recent matching email
            Body_content = message.HTMLBody
            Body_content = BeautifulSoup(Body_content,"lxml")
            html_tables = Body_content.find_all('table')[0]
            #Body_content = Body_content[:Body_content.find("Disclaimer")].strip()
            df = pd.read_html(str(html_tables),header=0)[0]
            Pkey = [71763307, 76366654, 137292386, 151971418, 151971419, 152547427, 152547246]
            df['Pkey'] = Pkey
            
            print(df) 

Output: the output looks OK as long as we get the full list of tickers from the bank.


But sometimes we only get prices for a handful of tickers rather than the full list. In that case, the code raises an error.


The error message I get is:

ValueError: Length of values does not match length of index
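
For context, pandas raises this error whenever a plain list is assigned as a column and its length differs from the number of rows. Here is a minimal sketch that reproduces it without Outlook. The ticker names and prices are made up for illustration. Only the Pkey values come from the code above.

```python
import pandas as pd

# Stand-in for a parsed email table with only 3 rows (hypothetical tickers and prices).
df = pd.DataFrame({
    "Ticker": ["TICKER_A", "TICKER_B", "TICKER_C"],
    "Price": [1.0, 2.0, 3.0],
})

# The hard-coded list always has 7 entries, regardless of how many rows arrived.
pkeys = [71763307, 76366654, 137292386, 151971418, 151971419, 152547427, 152547246]

try:
    df["Pkey"] = pkeys  # 7 values for a 3-row dataframe
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

So the list is positional: it only lines up when every ticker is present and in the expected order.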

CodePudding user response:

Try using pd.Series([755454, 556554, 2545454, 54644, 878798])
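
Note that a Series is aligned on the row index, so this avoids the error but still hands out Pkeys by row position rather than by ticker. If each Pkey belongs to a specific ticker, one possible approach is to keep a ticker-to-Pkey dictionary and look each row up with map. This is only a sketch. The ticker names below are hypothetical, the pairing of tickers to Pkeys is not given in the question, and it assumes the parsed column is really called "Ticker".

```python
import pandas as pd

# Hypothetical ticker names; the real ticker-to-Pkey pairing is not in the question.
PKEY_BY_TICKER = {
    "TICKER_A": 71763307,
    "TICKER_B": 76366654,
    "TICKER_C": 137292386,
    "TICKER_D": 151971418,
    "TICKER_E": 151971419,
    "TICKER_F": 152547427,
    "TICKER_G": 152547246,
}

# Stand-in for the table parsed from an email carrying only 3 of the 7 tickers.
df = pd.DataFrame({
    "Ticker": ["TICKER_B", "TICKER_E", "TICKER_G"],
    "Price": [101.5, 99.2, 100.0],
})

# Look up each row's Pkey by its ticker; a ticker missing from the dict becomes NaN.
df["Pkey"] = df["Ticker"].map(PKEY_BY_TICKER)

print(df)
```

In the original script, this would replace the df['Pkey'] = Pkey line, with the dictionary keys changed to the tickers as they actually appear in the email.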
