I'm not able to get the "type" field using BeautifulSoup.
My current code prints a blank for the "type" variable:
from bs4 import BeautifulSoup
import requests
url = 'https://ash.confex.com/ash/2021/webprogram/Session20851.html'
res = requests.get(url)
soup = BeautifulSoup(res.content, 'html.parser')
content = soup.find_all('div', class_='paper')
for property in content:
    title = property.find('div', class_='cricon').text
    type = property.find("div", {"id": "info"})
CodePudding user response:
As you can see here, this is what the "property" variable looks like during each iteration over content:
<div >
<div >9:30 AM</div>
<div ><a href="Paper146905.html">7</a></div>
<div >
<div ><a href="Paper146905.html">Sustained Improvements in Patient-Reported Quality of Life up to 24 Months Post-Treatment with LentiGlobin for Sickle Cell Disease (bb1111) Gene Therapy</a></div>
<span >
<p ><b>Mark C. Walters, MD</b><sup>1</sup>, John F. Tisdale, MD<sup>2</sup><sup>*</sup>, Markus Y. Mapara, MD, PhD<sup>3</sup>, Lakshmanan Krishnamurti, MD<sup>4</sup>, Janet L. Kwiatkowski, MD, MSCE<sup>5,6</sup>, Banu Aygun, MD<sup>7</sup>, Kimberly A. Kasow, DO<sup>8</sup><sup>*</sup>, Stacey Rifkin-Zenenberg, DO<sup>9</sup>, Jennifer Jaroscak, MD<sup>10</sup>, Diana Garbinsky, MS<sup>11</sup><sup>*</sup>, Costel Chirila, PhD<sup>11</sup><sup>*</sup>, Meghan E. Gallagher, MSc<sup>12</sup><sup>*</sup>, Xinyan Zhang, PhD<sup>12</sup><sup>*</sup>, Pei-Ran Ho, MD<sup>12</sup><sup>*</sup>, Alexis A. Thompson, MD, MPH<sup>13,14</sup> and Julie Kanter, MD<sup>15</sup></p><p ><sup>1</sup>Division of Hematology, UCSF Benioff Children's Hospital Oakland, Oakland, CA<br/><sup>2</sup>Cellular and Molecular Therapeutics Branch NHLBI/NIDDK, National Institutes of Health, Bethesda, MD<br/><sup>3</sup>Division of Hematology/Oncology, Columbia Center for Translational Immunology, Columbia University Medical Center, New York, NY<br/><sup>4</sup>Aflac Cancer and Blood Disorders Center, Department of Pediatrics, Emory Healthcare, Atlanta, GA<br/><sup>5</sup>Division of Hematology, Children's Hospital of Philadephia, Philadelphia, PA<br/><sup>6</sup>Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA<br/><sup>7</sup>Cohen Children’s Medical Center, Queens, NY<br/><sup>8</sup>University of North Carolina, Chapel Hill<br/><sup>9</sup>Hackensack University Medical Center, Hackensack, NJ<br/><sup>10</sup>University Medical Center, Medical University of South Carolina Health, Charleston, SC<br/><sup>11</sup>RTI Health Solutions, Research Triangle Park, NC<br/><sup>12</sup>bluebird bio, Inc., Cambridge, MA<br/><sup>13</sup>Feinberg School of Medicine, Northwestern University, Chicago, IL<br/><sup>14</sup>Ann & Robert H. Lurie Children’s Hospital of Chicago, Chicago, IL<br/><sup>15</sup>University of Alabama Birmingham, Birmingham, AL</p>
</span>
<div ></div>
<div >
</div>
</div>
</div>
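In other words, you are iterating over each paper, but the div with the ID "info" belongs to the page header, not to any individual paper div. Searching inside "property" therefore finds nothing, so look the type up once on the whole soup instead.
This should work for you: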
from bs4 import BeautifulSoup
import requests
url = 'https://ash.confex.com/ash/2021/webprogram/Session20851.html'
res = requests.get(url)
soup = BeautifulSoup(res.content, 'html.parser')
content = soup.find_all('div', class_='paper')
info = soup.find_all('div', class_='datetime')
type = soup.find("span", string="Type:").next_sibling
for property in content:
    title = property.find('div', class_='cricon').text
    print(title, type, sep="\n", end="\n\n")
OUTPUT
Sustained Improvements in Patient-Reported Quality of Life up to 24 Months Post-Treatment with LentiGlobin for Sickle Cell Disease (bb1111) Gene Therapy
Oral
Activation of Pyruvate Kinase-R with Etavopivat (FT-4202) Is Well Tolerated, Improves Anemia, and Decreases Intravascular Hemolysis in Patients with Sickle Cell Disease Treated for up to 12 Weeks
Oral
Etavopivat, an Allosteric Activator of Pyruvate Kinase-R, Improves Sickle RBC Functional Health and Survival and Reduces Systemic Markers of Inflammation and Hypercoagulability in Patients with Sickle Cell Disease: An Analysis of Exploratory Studies in a Phase 1 Study
Oral
Mitapivat (AG-348) Demonstrates Safety, Tolerability, and Improvements in Anemia, Hemolysis, Oxygen Affinity, and Hemoglobin S Polymerization Kinetics in Adults with Sickle Cell Disease: A Phase 1 Dose Escalation Study
Oral
Hydroxyurea Reduces the Transfusion Burden in Children with Sickle Cell Anemia: The Reach Experience
Oral
Initial Safety and Efficacy Results from the Phase II, Multicenter, Open-Label Solace-Kids Trial of Crizanlizumab in Adolescents with Sickle Cell Disease (SCD)
Oral
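The difference between searching one paper and searching the whole page can be reproduced offline. This is only a minimal sketch: SAMPLE_HTML is hypothetical, simplified markup standing in for the real session page (the class and id names are taken from the code above, the rest is assumed), and the names papers, inside_paper, label and session_type are introduced here for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page is assumed to be laid out similarly.
SAMPLE_HTML = """
<div id="info"><span class="label">Type:</span> Oral</div>
<div class="paper"><div class="cricon"><a href="Paper1.html">First title</a></div></div>
<div class="paper"><div class="cricon"><a href="Paper2.html">Second title</a></div></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
papers = soup.find_all("div", class_="paper")

# find() only searches descendants of the tag it is called on, so a header
# element outside the paper div is not found and the result is None.
inside_paper = papers[0].find("div", {"id": "info"})

# Searching the whole document finds the label; the text node that follows
# it is its next sibling. Guard against a missing label before using it.
label = soup.find("span", string="Type:")
session_type = label.next_sibling.strip() if label and label.next_sibling else None

titles = [p.find("div", class_="cricon").get_text(strip=True) for p in papers]
for title in titles:
    print(title, session_type, sep="\n", end="\n\n")
```

Using names such as session_type instead of type and property also avoids shadowing Python built-ins, which is a style choice rather than something the answer requires.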
CodePudding user response:
@Void S, you can also do that with a conditional (if/else) expression, as follows:
from bs4 import BeautifulSoup
import requests
url = 'https://ash.confex.com/ash/2021/webprogram/Session20851.html'
res = requests.get(url)
soup = BeautifulSoup(res.content, 'html.parser')
content = soup.find_all('div', class_='paper')
for property in content:
    title = property.find('div', class_='cricon').text
    # Use the text of the div with id "info" if this paper contains one, otherwise fall back to "oral"
    type = property.find("div", {"id": "info"}).text if property.find("div", {"id": "info"}) else "oral"
    print('title:' + str(title), 'type:' + str(type), sep='\n', end='\n\n')
Output:
title:Sustained Improvements in Patient-Reported Quality of Life up to 24 Months Post-Treatment with LentiGlobin for Sickle Cell Disease (bb1111) Gene Therapy
type:oral
title:Activation of Pyruvate Kinase-R with Etavopivat (FT-4202) Is Well Tolerated, Improves Anemia, and Decreases Intravascular Hemolysis in Patients with Sickle Cell Disease Treated for up to 12 Weeks
type:oral
title:Etavopivat, an Allosteric Activator of Pyruvate Kinase-R, Improves Sickle RBC Functional Health and Survival and Reduces Systemic Markers of Inflammation and Hypercoagulability in Patients with Sickle Cell Disease: An Analysis of Exploratory Studies in a Phase 1 Study
type:oral
title:Mitapivat (AG-348) Demonstrates Safety, Tolerability, and Improvements in Anemia, Hemolysis, Oxygen Affinity, and Hemoglobin S Polymerization Kinetics in Adults with Sickle Cell Disease: A Phase 1 Dose Escalation Study
type:oral
title:Hydroxyurea Reduces the Transfusion Burden in Children with Sickle Cell Anemia: The Reach Experience
type:oral
title:Initial Safety and Efficacy Results from the Phase II, Multicenter, Open-Label Solace-Kids Trial of Crizanlizumab in Adolescents with Sickle Cell Disease (SCD)
type:oral
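Note that the output above shows a lowercase "oral" for every paper, which suggests the fallback branch is the one being taken each time, so the value is a hardcoded default rather than something read from the page. A small helper makes that fallback explicit and calls find() only once per lookup. This is a sketch under assumptions: text_or_default is a hypothetical helper, not part of BeautifulSoup, and SAMPLE_HTML is made-up markup in which one paper contains an "info" div and the other does not.

```python
from bs4 import BeautifulSoup


def text_or_default(parent, default, *args, **kwargs):
    """Return the stripped text of parent.find(*args, **kwargs), or default if nothing matches."""
    node = parent.find(*args, **kwargs)
    return node.get_text(strip=True) if node is not None else default


# Hypothetical markup: only the first paper contains a div with id "info".
SAMPLE_HTML = """
<div class="paper"><div class="cricon">With info</div><div id="info">Poster</div></div>
<div class="paper"><div class="cricon">Without info</div></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = []
for paper in soup.find_all("div", class_="paper"):
    title = text_or_default(paper, "", "div", class_="cricon")
    kind = text_or_default(paper, "oral", "div", {"id": "info"})
    rows.append((title, kind))
    print("title:" + title, "type:" + kind, sep="\n", end="\n\n")
```

If the default is ever printed for every row, that is a hint the element lives outside the tag being searched.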