Using Python's library (Beautiful Soup), how do I extract many elements from a website that have not g

Time:05-24

This is the website https://interestingengineering.com/real-life-use-every-element-periodic-table

I am trying to extract the "Where It's Used" data, that is, every paragraph that explains where an element is used. I have managed to get one. However, I do not know how to get all of them, because their ids follow no pattern other than being paragraph ids with ascending numbers.

I have also tried to see whether there is a numerical pattern, such as each element being structured every 5 paragraphs or something like that, but no luck.

I have checked the documentation to come up with a solution. However, I am new, and my skill at reading documentation is at an early stage.

This is my code that gets the one description:

import requests
from bs4 import BeautifulSoup

URL= "https://interestingengineering.com/real-life-use-every-element-periodic-table"

response = requests.get(URL)
website_html = response.text

soup = BeautifulSoup(website_html,"html.parser")

all_descriptions = soup.find_all(id="p-13")

descriptions = [i.text for i in all_descriptions]  # avoid naming the variable "list", which would shadow the built-in

CodePudding user response:

Another option could be:

  • Loop over all "p" tags and check whether the "Where It's Used:" text is inside the p tag; if so, add it to a separate list.

Code:

import requests
from bs4 import BeautifulSoup

# Url of the page to get the data: 
url = 'https://interestingengineering.com/real-life-use-every-element-periodic-table'

# List to collect the results from the request.
results = []

# Make request: 
soup = BeautifulSoup(requests.get(url).content, 'html.parser') 

# Loop over all "p" tags and add to the "results" list only those that match the criteria: 
for p_tag in soup.find_all('p'): 
  text = p_tag.get_text("", strip=True)
  if "Where It's Used:" in text: 
    results.append(text.replace("Where It's Used:", ""))

# Show results: 
print(results)

Results:

['Hydrogen makes up about 90 percent of atoms in the entire universe. The chemical is used heavily as both a gas and liquid fuel. Hydrogen was used as a main fuel for the Space Shuttle program by NASA, as well as currently being used heavily by the petroleum and manufacturing industries.', 'Helium gas is commonly known to be lighter than air, which leads to its use in weather and party balloons. It is also used as an inert shield for arc welding and to pressurize liquid fuel tanks in rockets. Due to its wide recreational usage, natural sources of Helium are at risk of being completely depleted in the next decade, sparking fears for the scientific community.', 'Lithium is known most commonly to be used in batteries. It is also used in aluminum alloys, to make cookware more durable, and most surprisingly, in psychiatric medicines as a mood stabilizer.', 'This element is most commonly used as an alloying agent for copper. When combined, the resultant metal, beryllium copper, is used for springs and a variety of electrical applications. Due to its lightweight metal properties, it is used structurally in the aerospace industry.', 'Boron is used in pyrotechnics. When burned, it gives off a green color in the flame. More common uses are in boric acid and borax. You can find boron in antiseptics, washing chemicals, ceramic glazes, and eye drops.', 'Carbon is unique among the elements in its ability to form strongly bonded chains, sealed off by hydrogen atoms. These hydrocarbons are mostly used as fuels and as a feedstock for the production of polymers, fibres, paints, solvents, and plastics, etc. Impure carbon in the form of charcoal (from wood) and coke (from coal) is used in metal smelting.', "78 percent of Earth's entire atmosphere is made up of nitrogen. The element is significant to the chemical industry as it is a key nutrient in fertilizers and a key component in nitric acid, nylon,\xa0and explosive materials. 
The Haber process is a well-known method of reacting nitrogen with hydrogen to create ammonia.", 'Many living things,\xa0 including humans, use oxygen for respiration. Pure oxygen is used to treat breathing problems and make spacecraft livable. Oxygen in industry is mainly used in the manufacturing of steel and other metal alloys. Large quantities are also used in the manufacture of chemicals such as nitric acid and hydrogen peroxide.', 'Fluorine is a common additive to drinking water and is used as a cleaning agent in toothpaste. In pop culture, hydrofluoric acid was used as a dissolving agent in the popular TV show "Breaking Bad." The chemical can dissolve glass and is used mainly as an etching compound.', 'Neon is the fourth most abundant element in the entire universe. By far the most prominent use of the element today is used in advertising signs. When enticed with electricity, the glass commonly glows, leading to its use in the respective sign industry as well as high-voltage indicators and lasers.Liquid neon is an important cryogenic refrigerant.', 'Sodium is used in streetlights to produce yellow light as well as being a component in many compounds like table salt, soda ash, borax, and baking soda.', 'Magnesium finds many of its uses in medicine as Epsom salts, milk of magnesia, chloride, and citrate. Magnesium is also essential to both animal and plant life. Because it isless dense than aluminum, it is often alloyed with aluminum for use in plane and car constructions.It is also added to molten iron and steel to remove sulfur.', 'Aluminum is a soft and malleable metal that has uses in cans and fouls, utensils, airplane and automotive parts, and other structural applications.', 'Silicon is used extensively in the semiconductor industry in solid-state electronics. 
For such applications, the silicon has to be doped with boron, gallium, phosphorus or arsenic.', 'White phosphorus is used in flares and incendiary devices, while red phosphorus is in the material stuck on the side of matchboxes. However,\xa0the largest use of phosphorus compounds is for fertilizers. Phosphorus is also important in the production of steel.', 'Sulfur is used in gunpowder and other pyrotechnics, rubber manufacturing, and as an insecticide, fungicide, and fumigant. It can also be used to treat skin diseases, however, its prime use is in compound separation.', 'Chlorine is used in water treatment and as an antiseptic. During the production of papers, plastics, solvents, and textiles, large amounts of chlorine are also used.', 'Argon is used in incandescent and fluorescent bulbs as a protective layer around the filament to keep oxygen from corroding it. It is also used as a protective shield in arc welding and semiconductor crystals.', 'Potassium is mainly used in compounds. It is combined with chlorine to produce potassium chloride which is used in fertilizers, pharmaceuticals,and saline drips. Potassium hydroxide is also used in soaps and cleaners, whilePotassium carbonate is used in the manufacturing of glass.', 'Calcium is used to prepare thorium and uranium as a reducing agent. It is also used as an alloying agent in aluminum, copper, lead, and magnesium.', 'Used heavily in mercury vapor lamps, Scandium is a\xa0key element in stadium lights. Its radioactive isotope is also used as a tracing agent.', 'Titanium is an incredibly strong metal used in alloys with aluminum, iron, and other metals. This strong metal is used in the aerospace industry as well as engines partly because of its ability to maintain its strength in thermal extremes.', 'This element is used in jet engines and aircraft components. 
All of its uses require it to be combined with another metal or element, such as Vanadium-gallium tape used in magnets.\xa0About 80 percent of the vanadium produced is used as a steel additive to produce a very tough alloy.', 'Chromium is used in stainless steel as well as in the chrome plating process. Various chromium compounds are known for their vivid colors.', "Manganese dioxide makes up about .14percent of Earth's crust. It is used in glass to remove the green color present in iron compounds. It is too brittle to be used on its own and is mainly used as an alloy.", "Iron's prime use is in making steel. When steel is combined with chromium, it produces stainless steel which is resistant to corrosion.", 'Cobalt is used mostly as a cancer treatment and in radiotherapy.\xa0Cobalt metal is sometimes used in electroplating because of its attractive appearance, hardness, and resistance to corrosion.', 'Nickel is used in stainless steel and other anti-corrosion metal alloys. Other prominent uses include piping and tubing production as well as in the desalination process.', 'Copper is one of the best conductors of electricity, which leads to its use in electronics and motors. The metal is also very thermally conductive, so it is used in radiators, A/C units, and heating systems.', 'Zinc is used as an alloying agent in brass, nickel, silver, and aluminum. Paints, rubbers, cosmetics, batteries, textiles, and inks also have\xa0a significant need for the element.', 'Since gallium has a low melting point, it is often used in medical thermometers as a substitute\xa0for mercury. When combined with arsenic, it is used in semiconductors, lasers, and solar panels. It can also be used in mirror manufacturing.', 'Germanium finds its uses in the semiconductor industry. When it is doped with other elements, it makes highly efficient transistors. 
Continuing on with its electronic uses, it is also implemented in fluorescent lamps.', 'This element is used as a doping agent in transistors, primarily with Gallium. Many arsenic compounds are used as insecticides and poisons.', 'Primary uses for selenium are in the glass industry. Its properties allow it to decolorize class and make red glass as well. It is used in solar and photocells. In film photography, it is also used as a photographic toner.', 'Bromine is used as a flame-retarder in plastics and electronics. It can also be used to purify and disinfect water, leading to its uses in swimming pools and hot tubs.', "About .0001percent of Earth's atmosphere is krypton, which makes obtaining it relatively difficult. The element is used for flashes in high-speed photography as well as a conductive gas in fluorescent lights. Krypton fluoride is used in some lasers.", 'Rubidium is used in vacuum tubes to remove trace\xa0gases. It also is heavily used in photocells and specialized glasses. It can be ionized easily, so it is often utilized as a propellant in spacecraft.', 'Strontium is used in pyrotechnics to produce brilliant reds. It can also be used in ferrite magnet production and zinc refining.', 'An oxide of yttrium is used to make red phosphorus television tubes. Along with this, it is used to increase the strength in aluminum and magnesium alloys.', 'Zirconium is used as an anti-corrosion compound in pumps and valves. It does not absorb neutrons, so it is also widely used in nuclear reactors.', 'Niobium is used in stainless steel alloys. Alloys produced with Niobium are very strong and are often used in pipelines and jet engines.', 'Molybdenum is used to make alloys used in missile and aircraft parts as well as the nuclear power industry and in heating elements. 
It can be used to refine petroleum, but its main use is as an alloying agent to refine steel.\xa0Molybdenum disulfide is used as a lubricant additive.', 'Technetium is a synthesized element that can be used as a radioactive tracer.', 'Ruthenium is used as a catalyst to harden metals. It is also used in electrical contacts and to color glass.', 'Rhodium is used to manufacture electrical contacts. This use extends into catalytic converters, but its primary use is as an alloying agent. Alloys of rhodium can be used in furnaces, electrodes, and spark plugs.', 'Palladium is an important element of the catalytic\xa0conversion process. It is also used in jewelry and dental fillings.', 'Silver is used in jewelry and tableware. It is the best reflector of visible light, although it does tarnish. It is used in soldering and brazing compounds as well as batteries. Silver paints are used for making printed circuits.Silver also has antibacterial properties and recently silver nanoparticles have been used in clothing to prevent bacteria from growing and creating unpleasant odors.', 'Cadmium is poisonous, so it has few practical uses. It can be used to prevent corrosion or to absorb neutrons in nuclear reactors. One of its more commercial uses is in rechargeable nickel-cadmium batteries.', 'Indium is primarily used as a doping agent for germanium in the transistor manufacturing process. It is also used to make highly reflective mirrors and low-melting-point alloys.', 'Tin has the ability to be polished to a high degree and is not corrodible. It is mainly used to coat other metals or as an alloy in solder and pewter. Niobium-tin magnets are known for their superconducting abilities.', 'Antimony is mainly used in batteries, cable sheathing, and other metal products. It can be used to make flame-proof materials and paints. Ancient Egyptians\xa0used the element as black eye make-up.', 'Tellurium allows better machinability of copper and stainless steel. 
It is used as a basic component of cast iron and blasting caps.', 'Iodine salts are used in photographic film and as an antiseptic for wounds. The radioactive isotope iodine-131 is used to treat thyroid cancer. It is often added in small amounts to table salt, in order to avoid iodine deficiency.', 'Xenon is used in photographic flashes and arc lamps for movie sets. When pressurized in an arc lamp, it can produce UV light. It is also used for radiation detection and in X-ray counters.', "Caesium is used in vacuum tubes to remove trace gasses. It's most common use is as a compound in drilling fluid. One of its most important uses is in the ‘caesium clock’ (atomic clock) and as a catalyst to the process of hydrogenation.", 'Barium is used to produce a green glow in pyrotechnics. It is also used to remove gases from vacuum tubes. Compounds of barium are used as a contrast medium in X-rays.', 'Lanthanum is used along with rare earth elements to make arc lights. It also makes up about 20percent of mischmetal, an alloy used in the flint of cigarette lighters.A lanthanum-nickel alloy is used to store hydrogen gas for use in hydrogen-powered vehicles and Lanthanum is also used in nickel metal hydride batteries.', 'Cerium is also used as a component of mischmetal to produce flint for lighters. It can be used as a catalyst to refine oil. Cerium oxide is also used as a component of walls in self-cleaning ovens.', 'Praseodymium is used to make yellow glass goggles for welders. Praseodymium is also used in flint lighter products. Its main use, however, is to color glass and enamels.', "The most important use for neodymium is in an alloy with iron and boron to make very strong permanent magnets.\xa0Neodymium is used to make flint for lighters\xa0and is a component of specialized welder's goggles. Neodymium glass is used to make lasers, while Neodymium oxide and nitrate are used as catalysts in polymerisation reactions.", 'Promethium is used mainly for research in radiation. 
It can be used in nuclear batteries and as a light source for signals. Researchers believe that it could soon be used in portable x-ray machines.', 'Samarium is used as a catalyst for dehydration and dehydrogenation of ethanol fuels. It can also be used to absorb infrared light rays and in the treatment of cancer.', 'Europium is a good absorber of neutrons, so it is often used in nuclear reactors. One of its compounds is also used in the production of red phosphorus in television sets.', 'Gadolinium is often used in applications where microwaves are present. It can also be used in green phosphor television tubes. The element is magnetic, which has led to its use in MRI machines.', 'Terbium is used as a stabilizer of high-temperature fuel cells. Its alloys are also used in electronic devices and as magnetic field indicators.', 'When combined with rare earth elements, Dysprosium is used as a laser material. It can also be used in nuclear reactor rods.', 'Holmium is used in the production of magnets as a flux concentrator. It is also used as a yellow or red color in cubic zirconia manufacturing.', 'Erbium is a good neutron absorber, leading to its use in nuclear control rods. It can also be used to reduce the hardness of metals along with applications in amps and lasers.', 'Thulium is the least naturally-occurring element on earth.When irradiated, thulium produces an isotope that emits x-rays and can be used to make a lightweight, portable x-ray machine. Thulium is also used in some surgical lasers.', 'Ytterbium is believed to be useable in grain refinement within steel.It can also be used as an industrial catalyst.', 'Lutetium is very rare and high in price. When refined, it can be used in the petroleum cracking process. There are few other commercial applications.', 'Hafnium is a good neutron absorber, so it is used in nuclear control rods in nuclear submarines. Due to its high melting point, it is also used in plasma welding torches. 
Hafnium oxide is used in microchips.', 'Tantalum is used in the electronics industry for capacitors and resistors. It can be used to increase strength in metal alloys as well as increase corrosion resistance. The metal is also used in surgical instruments because it causes no immune response.', 'Tungsten has the highest melting point of all metals, leading to its use as filaments in incandescent bulbs. It is also used in steel to impart strength.', 'Rhenium is a common catalyst in the production of high-octane gasoline. It is also used in alloys for jet engines and as filaments for mass spectrographs.', 'Osmium is mainly used to make hard metal alloys. You can find it in ballpoint pen tips, record needles, electrical contacts, and other metal components where friction needs to be mitigated.', 'Iridium is mostly used as a hardening agent for platinum. This element is also used as an alloy in fountain pen tips and compass bearings, andfor the contacts in spark plugs.', 'Platinum is known for its corrosion-resistant properties and has long been used for jewelry. Its main use is in catalytic converters for automobiles.', 'Gold is one of the most coveted metals in the world, because it can be easily shaped and sculpted, conducts electricity well, and does not tarnish. Aside from its use in coinage and jewelry, it\xa0is used in gears for watches, artificial limb joints, and electrical connectors. Gold nanoparticles are used as industrial catalysts.', 'Mercury is used to making thermometers, barometers, electrical switches, and other instruments. It is often used in streetlights and fluorescent lamps andin the chemical industry as a catalyst.', 'Thallium is used to form low-melting-point glass. 
It was once used as rat poison, but it is now banned from household use.', 'Many previously common uses of Lead have now been banned, due to its toxic effects.It is still widely used for car batteries, pigments, ammunition, cable sheathing, lead crystal glass, radiation protection, and in some solders.', 'Bismuth is usually used in fire detectors and fire extinguishment systems due to its low melting point. This has also led to its use in electrical fuses.', 'Polonium is used as an atomic heat source for short term use. It is also seen in anti-static brushes and in film.', 'Astatine is used as a radioactive tracer and in cancer treatment.', 'Radon is used to treat cancer. It was often produced in hospitals by pumping radon from radium and then sealing it into tubes.', 'The most stable isotope of francium, francium-223, has a half-life of 22 minutes. Due to its short lifespan, this element has no commercial uses.', 'Radium is used as a neutron source and is also used to produce radon. One gram of radium-226 will make .0001 mL of radon each day. The element is 1,000,000 times more active than uranium.', 'Actinium is used in medicine for radio-immunotherapy. It is only found in uranium ore, which makes it very expensive. One ton of uranium produces the equivalent of 1/10th of a gram of actinium.', 'Thorium is used to coat filaments in incandescent bulbs. It can be used as nuclear fuel in Thorium reactors, although this is a very new technology.Thorium is also an alloying agent in magnesium, and Thorium oxide is used as an industrial catalyst.', 'There are currently no commercial uses for protactinium due to its relative rarity.', 'Uranium is used as nuclear fuel for nuclear power reactors and produces the material needed for nuclear weapons. It is also used as a colorant for glass. 
It is also the major material from which other synthetic transuranium elements are made.', 'Neptunium does not have any known commercial uses.', 'Plutonium is used as a nuclear fuel and in nuclear weapons.', 'Americium is used in smoke detectors and as a portable source of gamma rays.', 'Curium is mainly used for research, but\xa0in the future, it could produce more nuclear energy per gram than plutonium.', 'Berkelium has no commercial uses due to its rarity.', 'Californium is a strong neutron emitter. It is used in metal detectors for silver and gold. It also can be used to identify oil layers underground and detect metal fatigue in aerospace applications.', 'No uses outside research.', 'No uses outside research.', 'No uses outside research.', 'No uses outside research.', 'No uses outside research.', 'No uses outside research.', 'No uses outside research.', 'No uses outside research.', 'No uses outside research.', 'No uses outside research.', 'No uses outside research.', 'No uses outside research.', 'No uses outside research.', 'No uses outside research.', 'No uses outside research.', 'No uses outside research.', 'No uses outside research.', 'No uses outside research.', 'No uses outside research.', 'No uses outside research.']
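
Since the question notes that the paragraph ids have the form "p-" followed by a number, a variant of the loop above could restrict the search to those ids with a regular expression. The following is only a minimal offline sketch: SAMPLE_HTML is hypothetical markup that imitates the article, not the real page, and the label is assumed to sit at the start of each paragraph.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup imitating the article's structure (not the real page).
SAMPLE_HTML = """
<p id="p-12"><strong>Description:</strong> Hydrogen is the lightest element.</p>
<p id="p-13"><strong>Where It's Used:</strong> Used as a fuel.</p>
<p class="caption">Where It's Used: not a numbered paragraph.</p>
<p id="p-18"><strong>Where It's Used:</strong> Used in balloons.</p>
"""

LABEL = "Where It's Used:"
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

results = []
# Only consider <p> tags whose id looks like "p-<number>".
for p_tag in soup.find_all("p", id=re.compile(r"^p-\d+$")):
    text = p_tag.get_text(" ", strip=True)
    # Keep the paragraph only if it starts with the label, then drop the label.
    if text.startswith(LABEL):
        results.append(text[len(LABEL):].strip())

print(results)
```

The id filter is optional; it only narrows the candidates before the text check does the real selection.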

CodePudding user response:

Simply find the elements by their text with CSS selectors:

soup.select('p:-soup-contains("Where It\'s Used:")')
Example:
import requests
from bs4 import BeautifulSoup

URL= "https://interestingengineering.com/real-life-use-every-element-periodic-table"

response = requests.get(URL)
website_html = response.text

soup = BeautifulSoup(website_html, "html.parser")

soup.select('p:-soup-contains("Where It\'s Used:")')

or select the <strong> and get the text of its next sibling:

[x.next_sibling.text for x in soup.select('strong:-soup-contains("Where It\'s Used:")')]

->
['Hydrogen makes up about 90 percent of atoms in the entire universe. The chemical is used heavily as both a gas and liquid fuel. Hydrogen was used as a main fuel for the Space Shuttle program by NASA, as well as currently being used heavily by the petroleum and manufacturing industries.',
 'Helium gas is commonly known to be lighter than air, which leads to its use in weather and party balloons. It is also used as an inert shield for arc welding and to pressurize liquid fuel tanks in rockets. Due to its wide recreational usage, natural sources of Helium are at risk of being completely depleted in the next decade, sparking fears for the scientific community.',
 'Lithium is known most commonly to be used in batteries. It is also used in aluminum alloys, to make cookware more durable, and most surprisingly, in psychiatric medicines as a mood stabilizer.',
 'This element is most commonly used as an alloying agent for copper. When combined, the resultant metal, beryllium copper, is used for springs and a variety of electrical applications. Due to its lightweight metal properties, it is used structurally in the aerospace industry.',...]
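
To try the selector approach without a network request, it can be run against a small inline document. This is a sketch under assumptions: SAMPLE_HTML is hypothetical markup that imitates the article, and the `:-soup-contains` pseudo-class needs a reasonably recent soupsieve (2.1 or later, as far as I know). The sibling is converted with `str()` because it is a plain text node rather than a tag.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the article's structure (not the real page).
SAMPLE_HTML = """
<h2>1. Hydrogen</h2>
<p id="p-10"><strong>Symbol:</strong> H</p>
<p id="p-13"><strong>Where It's Used:</strong> Used as a fuel.</p>
<h2>2. Helium</h2>
<p id="p-15"><strong>Symbol:</strong> He</p>
<p id="p-18"><strong>Where It's Used:</strong> Used in balloons.</p>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Select the <strong> labels by their text, then read the text node that follows each one.
labels = soup.select('strong:-soup-contains("Where It\'s Used:")')
uses = [str(label.next_sibling).strip() for label in labels if label.next_sibling]

print(uses)
```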

To get all the elements with their attributes as a list of dicts, you could go with:

...
soup = BeautifulSoup(website_html, "html.parser")
data = []

for e in soup.select('h2'):
    d = []
    for p in e.find_next_siblings():
        if p.name in ['div','h2']:
            break
        elif p.strong and len(tuple(p.stripped_strings)) == 2:
            # Split at the first colon only, so a colon inside the value cannot produce a third item
            d.append(tuple(p.text.replace('\xa0','').split(':', 1)))
    if d:
        data.append(dict(d))
data
Output:
[{'Symbol': ' H',
  'Atomic Weight': '1.008',
  'Description': ' Hydrogen is anexplosive gas and also the lightest element.',
  "Where It's Used": 'Hydrogen makes up about 90 percent of atoms in the entire universe. The chemical is used heavily as both a gas and liquid fuel. Hydrogen was used as a main fuel for the Space Shuttle program by NASA, as well as currently being used heavily by the petroleum and manufacturing industries.'},
 {'Symbol': ' He',
  'Atomic Weight': '4.002602(2)',
  'Description': ' Helium is an inert gas and the second-lightest element.',
  "Where It's Used": 'Helium gas is commonly known to be lighter than air, which leads to its use in weather and party balloons. It is also used as an inert shield for arc welding and to pressurize liquid fuel tanks in rockets. Due to its wide recreational usage, natural sources of Helium are at risk of being completely depleted in the next decade, sparking fears for the scientific community.'},...]
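
One detail worth checking when building the dicts is how the "Label: value" text is split. A plain `split(':')` yields three parts if the value itself contains a colon, and `dict()` then raises a ValueError. The sketch below uses only the standard library; the strings in `lines` are invented for illustration and do not come from the article, and `to_pair` is a hypothetical helper.

```python
# Hypothetical "Label: value" strings, shaped like the text of the article's <p> tags.
lines = [
    "Symbol: H",
    "Atomic Weight: 1.008",
    "Where It's Used: Two isotopes matter here: deuterium and tritium.",
]

def to_pair(text):
    """Split 'Label: value' at the first colon only and trim both parts."""
    label, _, value = text.partition(":")
    return label.strip(), value.strip()

record = dict(to_pair(text) for text in lines)

print(record["Where It's Used"])
```

Trimming both parts also avoids the leading spaces visible in values such as ' H' in the output above.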