How to collect data generated from a form using python requests?

Time:01-14

I'm doing some web scraping with form data, and I've run into a situation that I can't handle.

I need to get a table that is generated by submitting a form with some options (search type, state, and municipality).

The website is this: https://aplicacoes.mds.gov.br/cadsuas/pesquisarConsultaExterna.html

To do this, I tried to write a small script:

import pandas as pd
import requests
tipo = 'Rede Socioassistencial'
uf = 'PR'
municipio = 'Campo Largo'

url = 'https://aplicacoes.mds.gov.br/cadsuas/pesquisarConsultaExterna.html'
payload = {
    'consultaExternaHelper.tipoBusca': tipo,
    'consultaExternaHelper.endereco.municipio.uf.sigla': uf,
    'consultaExternaHelper.endereco.municipio.id': municipio,
}

response = requests.post(url, params=payload)
df = pd.read_html(response.text)

However, I have no experience with this kind of task, and the result is far from what I expected. Instead of the results table, I only get the tables that make up the page and the search form itself:

[                                                   0          1          2  \
0                                         Bem vindo!        NaN        NaN   
1  O CadSUAS é o sistema de cadastro do SUAS, que...        NaN        NaN   
2                                          PESQUISAR  PESQUISAR  PESQUISAR   

           3          4          5  
0        NaN        NaN        NaN  
1        NaN        NaN        NaN  
2  PESQUISAR  PESQUISAR  PESQUISAR  ,                                                    0  \
0  Tipo de Busca: Rede Socioassistencial Órgãos G...   
1                                              * UF:   
2                                                CPF   
3                                              Tipo:   
4                                                NaN   

                                                   1  \
0  Tipo de Busca: Rede Socioassistencial Órgãos G...   
1  Selecionar  AC  AL  AM  AP  BA  CE  DF  ES  GO...   
2                                              Nome:   
3                            Selecionar  CRAS  CREAS   
4                                                NaN   

                                                   2  \
0  Tipo de Busca: Rede Socioassistencial Órgãos G...   
1                                         Município:   
2                                                NaN   
3                                       Possui CEAS:   
4                                                NaN   

                                                   3   4   5   6   7  
0  Tipo de Busca: Rede Socioassistencial Órgãos G... NaN NaN NaN NaN  
1  Selecionar  ABATIA  ADRIANOPOLIS  AGUDOS DO SU... NaN NaN NaN NaN  
2                                                NaN NaN NaN NaN NaN  
3                          Todas  Com CEAS  Sem CEAS NaN NaN NaN NaN  
4                                                NaN NaN NaN NaN NaN  ,                                                    0
0  ACESSAR AREA RESTRITA - Sr. Gestor, clique aqu...
1  Versão 3.14.4 © 2008 Ministério do Desenvolvim...]

As I mentioned, I'm still learning, so I'm probably missing some detail or using an option that isn't the most suitable.

Thanks in advance for any suggestions.

Answer:

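To extract the table, use the code below. What I fixed:

  • Pass the payload to requests.post as form data (data=payload), not as URL parameters (params=payload)
  • Send the values the form actually submits (the search-type code "ent", the municipality ID "963" for Campo Largo, etc.) instead of the labels shown on the page
  • Extract only the table#entidadeList element from the HTML (I used Beautiful Soup for this)
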
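from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://aplicacoes.mds.gov.br/cadsuas/pesquisarConsultaExterna.html'

# Field names with the values the form submits (codes/IDs, not the visible labels)
payload = {
    "consultaExternaHelper.tipoBusca": "ent",
    "consultaExternaHelper.endereco.municipio.uf.sigla": "PR",
    "consultaExternaHelper.endereco.municipio.id": "963",
    "consultaExternaHelper.cpfcnpj": "",
    "consultaExternaHelper.nomeEntidade": "",
    "consultaExternaHelper.tipoEntidade.id": "05",
    "consultaExternaHelper.possuiCeas": "0"
}

# data= sends the fields form-encoded in the POST body (params= would put them in the URL)
response = requests.post(url, data=payload)
response.raise_for_status()

# Keep only the results table; the page also contains layout and form tables
soup = BeautifulSoup(response.content, "html.parser")
table = soup.find("table", id="entidadeList")
if table is None:
    raise ValueError("results table 'entidadeList' not found in the response")

# Recent pandas versions deprecate passing literal HTML to read_html, so wrap it in StringIO
df = pd.read_html(StringIO(str(table)))[0]

print(df)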

Output:

   Cnpj                              Nome  Nº Identificador  UF    Município
0   NaN   CRAS FERRARIA - LINDAMIR TORRES       41042001534  PR  CAMPO LARGO
1   NaN               CRAS JARDIM MELIANE       41042003954  PR  CAMPO LARGO
2   NaN     CRAS RIVABEM - LOLA ANDREASSA       41042035547  PR  CAMPO LARGO
3   NaN  CRAS POPULAR NOVA - DURVAL WEBER       41042039531  PR  CAMPO LARGO
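The difference between params= and data= can be checked without contacting the site: `requests.Request(...).prepare()` builds a request without sending it, so you can inspect where the fields end up. A minimal sketch, reusing the URL and two of the payload fields from the answer:

```python
import requests

url = "https://aplicacoes.mds.gov.br/cadsuas/pesquisarConsultaExterna.html"
payload = {
    "consultaExternaHelper.endereco.municipio.uf.sigla": "PR",
    "consultaExternaHelper.endereco.municipio.id": "963",
}

# params= encodes the fields into the URL's query string; the body stays empty
as_params = requests.Request("POST", url, params=payload).prepare()

# data= form-encodes the fields into the request body
as_data = requests.Request("POST", url, data=payload).prepare()

print(as_params.url)   # URL now carries ?consultaExternaHelper...=PR&...
print(as_params.body)  # None
print(as_data.url)     # the plain URL, no query string
print(as_data.body)    # the same fields, form-encoded in the body
print(as_data.headers["Content-Type"])  # application/x-www-form-urlencoded
```

With params= the POST arrives with an empty body, while data= sends the fields the way an HTML form with method="post" submits them. That is likely why the request in the question got the blank search page back.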
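To look up the submitted codes for another state or municipality, one option is to read them from the form's `<select>` elements, mapping each visible label to the value that gets sent. A minimal sketch: the HTML string below is a hypothetical, simplified stand-in for the real page (not copied from it), and it assumes the `<select>` names match the payload keys used above. On the live site you would parse the fetched page's HTML instead.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified form markup (illustration only, not the real page)
html = """
<form>
  <select name="consultaExternaHelper.endereco.municipio.uf.sigla">
    <option value="">Selecionar</option>
    <option value="PR">PR</option>
  </select>
  <select name="consultaExternaHelper.endereco.municipio.id">
    <option value="">Selecionar</option>
    <option value="963">CAMPO LARGO</option>
  </select>
</form>
"""


def select_options(html: str) -> dict[str, dict[str, str]]:
    """Map each <select> name to {visible label: submitted value}."""
    soup = BeautifulSoup(html, "html.parser")
    options = {}
    for select in soup.find_all("select"):
        options[select.get("name")] = {
            opt.get_text(strip=True): opt.get("value", "")
            for opt in select.find_all("option")
        }
    return options


fields = select_options(html)
print(fields["consultaExternaHelper.endereco.municipio.id"]["CAMPO LARGO"])  # 963
```

The value found this way is what goes into the payload, e.g. "963" for consultaExternaHelper.endereco.municipio.id.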