Open, save and extract text from PDF links in a Python DataFrame

Time:10-01

I would like to iterate through PDF links stored in a pandas DataFrame. The goal is to open each link, save the PDF, extract its text, and then store that text in a new column on the row of the corresponding link.

The DataFrame looks like this:

    URL
0   https://westafricatradehub.com/wp-content/uploads/2021/07/RFA-WATIH-1295_Senegal-RMNCAH-Activity_English-Version.pdf
1   https://westafricatradehub.com/wp-content/uploads/2021/07/RFA-WATIH-1295_Activité-RMNCAH-Sénégal_Version-Française.pdf
2   https://westafricatradehub.com/wp-content/uploads/2021/07/Attachment-2_Full-Application-Template_Senegal-RMNCAH-Activity_English-Version.docx
3   https://westafricatradehub.com/wp-content/uploads/2021/07/Pièce-Jointe-2_Modèle-de-Demande-Complet_Activité-RMNCAH-Sénégal_Version-Française.docx
4   https://westafricatradehub.com/wp-content/uploads/2021/07/Attachment-3_Trade-Hub-Performance-Indicators-Table.xlsx
5   https://westafricatradehub.com/wp-content/uploads/2021/07/Attachment-10_Project-Budget-Template-RMNCAH.xlsx
6   https://westafricatradehub.com/wp-content/uploads/2021/08/Senegal-Health-RFA-Webinar-QA.pdf
7   https://westafricatradehub.com/wp-content/uploads/2021/02/APS-WATIH-1021_Catalytic-Business-Concepts-Round-2.pdf
8   https://westafricatradehub.com/wp-content/uploads/2021/02/APS-WATIH-1021_Concepts-d           
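For reference, here is a minimal sketch of the workflow described above (download each PDF, save it, extract its text, write the text into a new column). It is one possible approach, not a definitive answer. It assumes the third-party packages `requests` and `pypdf` are installed, that the links live in the `URL` column shown above, and that rows pointing at non-PDF files (.docx, .xlsx) should simply get `None`. All function names, the `text` column name, and the `pdfs` output folder are made up for illustration.

```python
import os
from urllib.parse import unquote, urlparse


def is_pdf_link(url):
    """Return True when the URL path ends in .pdf (case-insensitive)."""
    return urlparse(url).path.lower().endswith(".pdf")


def local_filename(url):
    """Derive a local file name from the last segment of the URL path."""
    return os.path.basename(unquote(urlparse(url).path))


def download_pdf(url, out_dir="pdfs", timeout=30):
    """Download one PDF into out_dir and return the path it was saved to."""
    import requests  # third-party, imported here so the helpers above work without it

    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, local_filename(url))
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    with open(path, "wb") as fh:
        fh.write(response.content)
    return path


def pdf_text(path):
    """Extract the text of every page of a saved PDF, joined by newlines."""
    from pypdf import PdfReader  # third-party

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def url_to_text(url, out_dir="pdfs"):
    """Return extracted text for a PDF link, or None for other links and failures."""
    if not isinstance(url, str) or not is_pdf_link(url):
        return None
    try:
        return pdf_text(download_pdf(url, out_dir))
    except Exception as exc:  # network or parsing problem: report it, keep other rows
        print(f"Skipping {url}: {exc}")
        return None


def add_text_column(df, url_col="URL", text_col="text", out_dir="pdfs"):
    """Fill df[text_col] with the text extracted from each link in df[url_col]."""
    df[text_col] = df[url_col].apply(lambda u: url_to_text(u, out_dir))
    return df
```

With the DataFrame from the question in `df`, calling `add_text_column(df)` would save each PDF under `pdfs/` and fill a `text` column. The .docx and .xlsx rows are left as `None`, and a link that fails to download or parse is reported and skipped rather than stopping the loop. Scanned PDFs without a text layer would come back as empty strings, since this sketch does no OCR.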