I'm scraping data from a website that has multiple main categories, each containing multiple subcategories. The scraping part works, but I'm unsure how to store the data so that it is displayed properly once it's converted to a DataFrame object.
Here's a breakdown of the data that I have:
List of main categories -> List of subcategories -> List of links corresponding to that subcategory
categories = ['Cat1', 'Cat2', ...]
subcat = ['Subcat1', 'Subcat2', ...] etc
This is what the data looks like once it has been scraped. My question is, how can I build a DataFrame so that it ends up looking like this:
Category1          Category2
Subcat1  Link1     Subcat1  Link1
Subcat2  Link2     Subcat2  Link2
I have thought of storing the data in a list of dictionaries, and within each dictionary a list of subcategories, but it's not displaying properly.
CodePudding user response:
I think the best way to accomplish this is to use a hierarchical index (a pandas MultiIndex) rather than a flat one. Please refer to https://pandas.pydata.org/docs/user_guide/advanced.html#hierarchical-indexing-multiindex
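As a rough sketch of what that could look like: the nested dict below (category -> subcategory -> list of links) is only an assumed shape for the scraped data, and the names scraped, frames, wide and long_df are made up for illustration. It builds one small frame per category and concatenates them side by side, so the category names become the outer column level.

```python
import pandas as pd

# Hypothetical scraped data: category -> subcategory -> list of links
scraped = {
    "Cat1": {"Subcat1": ["Link1"], "Subcat2": ["Link2"]},
    "Cat2": {"Subcat1": ["Link1"], "Subcat2": ["Link2", "Link3"]},
}

# One two-column frame per category, one row per (subcategory, link) pair
frames = {
    cat: pd.DataFrame(
        [(sub, link) for sub, links in subcats.items() for link in links],
        columns=["Subcategory", "Link"],
    )
    for cat, subcats in scraped.items()
}

# Passing a dict to concat with axis=1 uses its keys as the outer column
# level, giving MultiIndex columns such as ("Cat1", "Link").
# Categories with fewer rows are padded with NaN.
wide = pd.concat(frames, axis=1)
print(wide)

# Alternative: one row per link, with a MultiIndex on the rows instead
long_df = pd.DataFrame(
    [
        (cat, sub, link)
        for cat, subcats in scraped.items()
        for sub, links in subcats.items()
        for link in links
    ],
    columns=["Category", "Subcategory", "Link"],
).set_index(["Category", "Subcategory"])
print(long_df)
```

The wide frame matches the side-by-side layout in the question. The long one is usually easier to filter and group, for example long_df.loc["Cat2"] returns every subcategory and link under that category.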