Getting the error "':encoded' pseudo-class is not implemented at this time" with Beautiful Soup

Time:04-27

I am trying to extract the table element inside the content:encoded tag while parsing an XML file with Python's Beautiful Soup.

I get the following error:

An error occurred  ':encoded' pseudo-class is not implemented at this time

See my code below:

import bs4

content_html_list = []


def main():
    try:
        # First, read the XML file content into a string variable called "res_text"
        soup = bs4.BeautifulSoup(res_text, features="xml")

        content_html_tag_list = soup.select('content:encoded')
        for content_htmls in content_html_tag_list:
            content_html = content_htmls.text
            content_html_list.append(content_html)

        print(f"content_html_list 0 is, {len(content_html_list)}")

        print(f"content_html_list = , {content_html_list}")

    except Exception as e:
        print(f'An error occurred  {str(e)}')


main()

See the XML below:

<item>
            <content:encoded>
                <![CDATA[ <TABLE BORDER=0 WIDTH='100%'><TR><TD><table><tr><td>Funding Opportunity ID: </td><td>335905</td></tr><tr><td>Opportunity Number: </td><td>HHS-2022-ACF-OPRE-YE-0106</td></tr><tr><td>Opportunity Title:</td><td>Child Care Policy Research Partnership Grants</td></tr><tr><td>Opportunity Category:</td><td>Discretionary</td></tr><tr><td>Opportunity Category Explanation:</td><td></td></tr><tr><td valign='top'>Funding Instrument Type: </td><td>Cooperative Agreement</td></tr><tr><td valign='top'>Category of Funding Activity: </td><td>Income Security and Social Services</td></tr><tr><td valign='top'>Category Explanation: </td><td></td></tr><tr><td valign='top'>CFDA Number(s): </td><td>93.575</td></tr><tr><td valign='top'>Eligible Applicants:</td><td>State governments<br>County governments<br>City or township governments<br>Special district governments<br>Independent school districts<br>Public and State controlled institutions of higher education<br>Native American tribal governments (Federally recognized)<br>Public housing authorities/Indian housing authorities<br>Native American tribal organizations (other than Federally recognized tribal governments)<br>Nonprofits having a 501(c)(3) status with the IRS, other than institutions of higher education<br>Nonprofits that do not have a 501(c)(3) status with the IRS, other than institutions of higher education<br>Private institutions of higher education<br>For profit organizations other than small businesses<br>Small businesses<br>Others (see text field entitled "Additional Information on Eligibility" for clarification)</td></tr><tr><td valign='top'>Additional Information on Eligibility:</td><td>The applicant eligibility is unrestricted. Applications from individuals (including sole proprietorships) and foreign entities are not eligible and will be disqualified from competitive review and from funding under this funding opportunity announcement. 
Faith-based and community organizations that meet the eligibility requirements are eligible to receive awards under this funding opportunity. Faith-based organizations may apply for this award on the same basis as any other organization, as set forth at and, subject to the protections and requirements of 45 CFR Part 87 and 42 U.S.C. &#167; 2000bb&#160;et seq., ACF will not, in the selection of recipients, discriminate against an organization on the basis of the organization&apos;s religious character, affiliation, or exercise.</td></tr><tr><td valign='top'>Agency Code:</td><td>HHS-ACF-OPRE</td></tr><tr><td valign='top'>Agency Name:</td><td>Department of Health and Human Services<br>Administration for Children and Families - OPRE</td></tr><tr><td>Posted Date:</td><td>Mar 01, 2022</td></tr><tr><td>Close Date:</td><td>Jun 10, 2022 Electronically submitted applications must be submitted no later than 11:59 pm Eastern Standard Time on the listed application due date.</td></tr><tr><td>Last Updated Date:</td><td>Mar 01, 2022</td></tr><tr><td>Award Ceiling:</td><td>$400,000</td></tr><tr><td>Award Floor:</td><td>$100,000</td></tr><tr><td>Estimated Total Program Funding:</td><td>$3,200,000</td></tr><tr><td>Expected Number of Awards:</td><td>8</td></tr><tr><td>Description:</td><td>The Administration for Children and Families (ACF) plans to solicit applications for Child Care Policy Research Partnership (CCPRP) Grants. These four-year cooperative agreements will support partnerships between Child Care and Development Fund (CCDF) Lead Agencies in states, territories, or tribes and institutions with demonstrated research capacity to develop rigorous investigations of child care subsidy policies and practices. Sponsored projects will inform local and federal understanding about the efficacy of child care subsidy policies and practices to increase low-income families&#8217; access to quality child care. 
To ensure that the funded work is timely and relevant to the current child care context, projects are expected to be collaborative from start to finish. The CCDF Lead Agency and their research partners must work together throughout all phases of the project and are encouraged to engage other interested parties, as appropriate. This iteration of the CCPRP Grants Program will prioritize research projects exploring (1) evidence-informed approaches to measuring quality across different provider types and (2) approaches to building the supply of high-quality child care through targeted investments in the early childhood workforce. Sponsored projects will be expected to participate in a consortium that will meet and communicate regularly to identify opportunities for coordination, such as common data elements and research methods, and to develop collective expertise and resources for the field. The consortium&#8217;s collaboration will support research capacity and learning within individual projects and across recipients. For further information about prior awards made for CCPRP Grants, see https://www.acf.hhs.gov/opre/project/child-care-policy-research-partnerships-1995-2023.</td></tr><tr><td>Version:</td><td>1</td></tr></table></TD></TR></TABLE> ]]>
            </content:encoded>
            <dc:date>2022-04-20T17:15:42Z</dc:date>
        </item>

Please advise on the best way to extract the text.

Answer:

Escape the colon so that the CSS selector engine does not read it as the start of a pseudo-class, and use the 'lxml' parser:

soup.select(r'content\:encoded')
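Why escaping helps: Beautiful Soup passes the string given to select() to the soupsieve CSS engine, and in CSS an unescaped colon introduces a pseudo-class (like :first-child), so content:encoded is read as the tag content followed by an unknown pseudo-class encoded. The sketch below uses a tiny hypothetical document and the standard-library "html.parser", which keeps the colon as part of the tag name. It assumes a soupsieve version that provides soupsieve.escape(); the exact exception class raised for the unescaped selector has differed between soupsieve versions, so it is caught broadly.

```python
from bs4 import BeautifulSoup
import soupsieve

# Hypothetical stand-in document, much shorter than the real feed item.
doc = "<item><content:encoded>hello</content:encoded></item>"
soup = BeautifulSoup(doc, "html.parser")

# Unescaped: the CSS parser treats ":encoded" as a pseudo-class and rejects it.
error = None
try:
    soup.select("content:encoded")
except Exception as exc:  # exception type varies across soupsieve versions
    error = exc

# soupsieve.escape() backslash-escapes the colon, giving the same string as r"content\:encoded".
escaped = soupsieve.escape("content:encoded")
found = soup.select(escaped)

print(type(error).__name__, escaped, len(found), found[0].get_text())
```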

Example:

from bs4 import BeautifulSoup as bs

s = '''
<item>
    <content:encoded>
        <![CDATA[ <TABLE BORDER=0 WIDTH='100%'><TR><TD><table><tr><td>Funding Opportunity ID: </td><td>335905</td></tr><tr><td>Opportunity Number: </td><td>HHS-2022-ACF-OPRE-YE-0106</td></tr><tr><td>Opportunity Title:</td><td>Child Care
        Policy Research Partnership Grants</td></tr><tr><td>Opportunity Category:</td><td>Discretionary</td></tr><tr><td>Opportunity Category Explanation:</td><td></td></tr><tr><td valign='top'>Funding Instrument Type: </td><td>Cooperative
        Agreement</td></tr><tr><td valign='top'>Category of Funding Activity: </td><td>Income Security and Social Services</td></tr><tr><td valign='top'>Category Explanation: </td><td></td></tr><tr><td valign='top'>CFDA Number(s):
        </td><td>93.575</td></tr><tr><td valign='top'>Eligible Applicants:</td><td>State governments<br>County governments<br>City or township governments<br>Special district governments<br>Independent school districts<br>Public and State
        controlled institutions of higher education<br>Native American tribal governments (Federally recognized)<br>Public housing authorities/Indian housing authorities<br>Native American tribal organizations (other than Federally
        recognized tribal governments)<br>Nonprofits having a 501(c)(3) status with the IRS, other than institutions of higher education<br>Nonprofits that do not have a 501(c)(3) status with the IRS, other than institutions of higher
        education<br>Private institutions of higher education<br>For profit organizations other than small businesses<br>Small businesses<br>Others (see text field entitled "Additional Information on Eligibility" for
        clarification)</td></tr><tr><td valign='top'>Additional Information on Eligibility:</td><td>The applicant eligibility is unrestricted. Applications from individuals (including sole proprietorships) and foreign entities are not
        eligible and will be disqualified from competitive review and from funding under this funding opportunity announcement. Faith-based and community organizations that meet the eligibility requirements are eligible to receive awards
        under this funding opportunity. Faith-based organizations may apply for this award on the same basis as any other organization, as set forth at and, subject to the protections and requirements of 45 CFR Part 87 and 42 U.S.C. &#167;
        2000bb&#160;et seq., ACF will not, in the selection of recipients, discriminate against an organization on the basis of the organization&apos;s religious character, affiliation, or exercise.</td></tr><tr><td valign='top'>Agency
        Code:</td><td>HHS-ACF-OPRE</td></tr><tr><td valign='top'>Agency Name:</td><td>Department of Health and Human Services<br>Administration for Children and Families - OPRE</td></tr><tr><td>Posted Date:</td><td>Mar 01,
        2022</td></tr><tr><td>Close Date:</td><td>Jun 10, 2022 Electronically submitted applications must be submitted no later than 11:59 pm Eastern Standard Time on the listed application due date.</td></tr><tr><td>Last Updated
        Date:</td><td>Mar 01, 2022</td></tr><tr><td>Award Ceiling:</td><td>$400,000</td></tr><tr><td>Award Floor:</td><td>$100,000</td></tr><tr><td>Estimated Total Program Funding:</td><td>$3,200,000</td></tr><tr><td>Expected Number of
        Awards:</td><td>8</td></tr><tr><td>Description:</td><td>The Administration for Children and Families (ACF) plans to solicit applications for Child Care Policy Research Partnership (CCPRP) Grants. These four-year cooperative
        agreements will support partnerships between Child Care and Development Fund (CCDF) Lead Agencies in states, territories, or tribes and institutions with demonstrated research capacity to develop rigorous investigations of child
        care subsidy policies and practices. Sponsored projects will inform local and federal understanding about the efficacy of child care subsidy policies and practices to increase low-income families&#8217; access to quality child care.
        To ensure that the funded work is timely and relevant to the current child care context, projects are expected to be collaborative from start to finish. The CCDF Lead Agency and their research partners must work together throughout
        all phases of the project and are encouraged to engage other interested parties, as appropriate. This iteration of the CCPRP Grants Program will prioritize research projects exploring (1) evidence-informed approaches to measuring
        quality across different provider types and (2) approaches to building the supply of high-quality child care through targeted investments in the early childhood workforce. Sponsored projects will be expected to participate in a
        consortium that will meet and communicate regularly to identify opportunities for coordination, such as common data elements and research methods, and to develop collective expertise and resources for the field. The
        consortium&#8217;s collaboration will support research capacity and learning within individual projects and across recipients. For further information about prior awards made for CCPRP Grants, see
        https://www.acf.hhs.gov/opre/project/child-care-policy-research-partnerships-1995-2023.</td></tr><tr><td>Version:</td><td>1</td></tr></table></TD></TR></TABLE> ]]>
    </content:encoded>
    <dc:date>2022-04-20T17:15:42Z</dc:date>
</item>
'''

soup = bs(s, 'lxml')
# A raw string keeps the backslash that escapes the colon for the CSS selector engine
soup.select(r'content\:encoded')
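The escaped selector depends on the parser keeping the literal tag name content:encoded. With features="xml", as in the question, and a feed that declares its namespaces (as real RSS feeds normally do), Beautiful Soup typically stores the tag as encoded with the prefix content, so a plain escaped selector may not match. The following is a minimal sketch under these assumptions: lxml is installed, and the shortened <rss>/<channel> wrapper and two-row table are hypothetical stand-ins for the real feed. find_all() accepts the prefixed name, select() can use CSS namespace syntax instead, and because the CDATA payload is itself HTML, it is parsed a second time to read the label/value rows of the inner table.

```python
from bs4 import BeautifulSoup

# Standard namespace URI of the RSS content module.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# Hypothetical, shortened feed: real feeds declare the prefixes on an ancestor element.
feed = """<rss xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<item>
  <content:encoded><![CDATA[ <TABLE BORDER=0 WIDTH='100%'><TR><TD><table>
    <tr><td>Funding Opportunity ID: </td><td>335905</td></tr>
    <tr><td>Opportunity Number: </td><td>HHS-2022-ACF-OPRE-YE-0106</td></tr>
  </table></TD></TR></TABLE> ]]></content:encoded>
  <dc:date>2022-04-20T17:15:42Z</dc:date>
</item>
</channel>
</rss>"""

# features="xml" uses lxml's XML parser, which keeps the CDATA payload as text.
soup = BeautifulSoup(feed, features="xml")

# CSS alternative: prefix|tag, with the prefix mapped to its namespace URI.
via_css = soup.select("content|encoded", namespaces={"content": CONTENT_NS})

records = []
# find_all() also matches the prefixed name "content:encoded", no escaping needed.
for encoded in soup.find_all("content:encoded"):
    # The payload is HTML, so parse it again with an HTML parser.
    inner = BeautifulSoup(encoded.get_text(), "html.parser")
    record = {}
    # "table table tr" selects only the rows of the nested (inner) table.
    for row in inner.select("table table tr"):
        cells = row.find_all("td")
        if len(cells) == 2:
            record[cells[0].get_text(strip=True)] = cells[1].get_text(strip=True)
    records.append(record)

print(records)
```

Parsing the payload separately lets the XML parser handle the namespaces and CDATA, while the HTML parser copes with the unquoted BORDER=0 attribute and the mixed-case tags inside it.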