I am trying to scrape data from:-
You could also get a separate column for each tablet-count option if you remove

    pDet[pvrStrength] = ', '.join([
        pvOp.get_text(' ').strip() for pvOp in pvRow.select(opSel)
    ])

and replace it with this loop:

    for pvoi, pvOp in enumerate(pvRow.select(opSel)):
        pvoTxt = pvOp.get_text(' ').strip()
        # option text is expected to look like "<count> tablets - <price>"
        tabletCt = pvoTxt.split(' - ')[0]
        pvoPrice = pvoTxt.split(' - ')[-1]
        if not tabletCt.endswith(' tablets'):
            # not a tablet count: label by position and keep the full text
            tabletCt = f'[option {pvoi + 1}]'
            pvoPrice = pvoTxt
        pDet[f'{pvrStrength} - {tabletCt}'] = pvoPrice

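
To see what the loop does without hitting the site, here is a minimal standalone sketch of the same split-and-fallback logic. The helper name `option_column` and the sample strings are made up for illustration (they stand in for `pvOp.get_text(' ').strip()`), and it assumes option text looks like `30 tablets - $219.99`:

```python
def option_column(strength, position, option_text):
    """Return (column_name, price) for one option of a strength row.

    Assumes option text looks like '30 tablets - $219.99'. Anything else
    gets a positional '[option N]' label and keeps the full text as value.
    """
    tablet_ct = option_text.split(' - ')[0]
    price = option_text.split(' - ')[-1]
    if not tablet_ct.endswith(' tablets'):
        tablet_ct = f'[option {position + 1}]'
        price = option_text
    return f'{strength} - {tablet_ct}', price


# Hypothetical option texts, keyed by strength (not scraped live)
rows = {
    '2mg': ['30 tablets - $219.99', '90 tablets - $526.99'],
    '1mg/ml Solution': ['150 ml - $239.99'],
}

details = {}
for strength, options in rows.items():
    for position, text in enumerate(options):
        column, price = option_column(strength, position, text)
        details[column] = price

for column, price in details.items():
    print(f'{column}: {price}')
```

The non-tablet option (the solution) ends up under a `[option 1]` column with its full text as the value, which is why that row looks different in the table.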
index | Abilify (Aripiprazole) | Generic Equivalent - Abilify (Aripiprazole) | Generic Equivalent - Accolate (Zafirlukast) | Abilify ODT (Aripiprazole) | Generic Equivalent - Abilify ODT (Aripiprazole) |
---|---|---|---|---|---|
product_endpt | abilify-tablet | abilify-tablet | accolate | abilify-mt | abilify-mt |
brand_or_generic | Brand | Generic | Generic | Brand | Generic |
rx_requirement | Prescription Required | NaN | NaN | Prescription Required | NaN |
2mg - 30 tablets | $219.99 | NaN | NaN | NaN | NaN |
2mg - 90 tablets | $526.99 | NaN | NaN | NaN | NaN |
5mg - 28 tablets | $160.99 | NaN | NaN | NaN | NaN |
5mg - 84 tablets | $459.99 | NaN | NaN | NaN | NaN |
10mg - 28 tablets | $116.99 | NaN | NaN | NaN | NaN |
10mg - 84 tablets | $162.99 | NaN | NaN | NaN | NaN |
15mg - 28 tablets | $159.99 | NaN | NaN | NaN | NaN |
15mg - 84 tablets | $198.99 | NaN | NaN | NaN | NaN |
20mg - 90 tablets | $745.99 | $67.99 | NaN | NaN | NaN |
30mg - 28 tablets | $104.99 | NaN | NaN | NaN | NaN |
30mg - 84 tablets | $289.99 | $75.99 | NaN | NaN | NaN |
1mg/ml Solution - [option 1] | 150 ml - $239.99 | NaN | NaN | NaN | NaN |
2mg - 100 tablets | NaN | $98.99 | NaN | NaN | NaN |
5mg - 100 tablets | NaN | $43.99 | NaN | NaN | NaN |
10mg - 90 tablets | NaN | $38.59 | NaN | NaN | NaN |
15mg - 90 tablets | NaN | $56.59 | NaN | NaN | NaN |
10mg - 60 tablets | NaN | NaN | $109.00 | NaN | NaN |
20mg - 60 tablets | NaN | NaN | $109.00 | NaN | NaN |
10mg ODT - 84 tablets | NaN | NaN | NaN | $499.99 | NaN |
15mg ODT - 84 tablets | NaN | NaN | NaN | $499.99 | NaN |
5mg ODT - 90 tablets | NaN | NaN | NaN | NaN | $59.00 |
20mg ODT - 90 tablets | NaN | NaN | NaN | NaN | $89.00 |
30mg ODT - 150 tablets | NaN | NaN | NaN | NaN | $129.99 |
source_url | https://www.canadapharmacy.com/products/abilify-tablet | https://www.canadapharmacy.com/products/abilify-tablet | https://www.canadapharmacy.com/products/accolate | https://www.canadapharmacy.com/products/abilify-mt | https://www.canadapharmacy.com/products/abilify-mt |
(I transposed the table since there were so many columns and so few rows. The table markdown can be copied from the output of `print(pricesDf.T.to_markdown())`.)
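
The `NaN` cells come from products that don't share the same strength/count columns. A rough sketch of how that happens, assuming the per-product dicts are collected in a list and passed to `pd.DataFrame` (the list name `products`, the `'index'` key, and the trimmed-down sample values are assumptions for illustration, not the actual scraper output):

```python
import pandas as pd

# Hypothetical per-product dicts, shaped like pDet after the loop
products = [
    {'index': 'Abilify (Aripiprazole)',
     'brand_or_generic': 'Brand',
     '2mg - 30 tablets': '$219.99'},
    {'index': 'Generic Equivalent - Abilify (Aripiprazole)',
     'brand_or_generic': 'Generic',
     '2mg - 100 tablets': '$98.99'},
]

# Keys missing from a dict become NaN in that product's row
prices_df = pd.DataFrame(products).set_index('index')

# Transpose so products are columns and strength/count options are rows
transposed = prices_df.T
print(transposed)
```

Calling `to_markdown()` on the transposed frame should then give a table like the one above (pandas needs the `tabulate` package installed for that method).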