Automatically converting a 1-level list to a nested list

The following page contains a table: http://pitzavod.ru/products/upakovka/

When I read it with pd.read_html, I do get a list, but 1) it is not nested, so when I convert it to a DataFrame it is not readable, and 2) it contains integers running from 0 to the number of rows in the table on the website. The list I get looks like this:

[                                                   0                1                                      2
 0                                         Показатели  Марка целлюлозы                       Методы испытаний
 1                                                ОСН              NaN                                    NaN
 2  Механическая прочность при размоле в мельнице ...   10 000 740 520  ГОСТ13525.1 ГОСТ 13525.3 ГОСТ 13525.8
 3                       Степень делигнификации, п.е.          28 - 45                             ГОСТ 10070
 4  Сорность - число соринок в условной массе 500г...             6500                           ГОСТ 14363.3
 5                              Влажность, % не более               20                             ГОСТ 16932]

Is there an easy way to clean up this pandas output, or do I need to parse the website properly myself? Thank you.

Answer:

That's because read_html always returns a list, even when the page contains only one table.

From the pandas.read_html documentation: "Read HTML tables into a list of DataFrame objects."
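To see the difference between the list and the DataFrame it contains without hitting the network, here is a minimal sketch in which the list is built by hand. It is an assumption standing in for the real pd.read_html result (values abbreviated from the question); the name tables is illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_html(url): a list with one DataFrame per
# <table> element on the page. Built by hand so the sketch runs offline.
tables = [
    pd.DataFrame([
        ["Показатели", "Марка целлюлозы", "Методы испытаний"],
        ["Влажность, % не более", "20", "ГОСТ 16932"],
    ]),
]

# The outer container is a plain list, which is why the printed output
# in the question is wrapped in [ ... ].
print(type(tables).__name__)   # list

# Inspect every table before choosing one; here there is only one.
for i, t in enumerate(tables):
    print(i, t.shape)

# Indexing the list (not slicing it) returns the DataFrame itself.
df = tables[0]
print(type(df).__name__)       # DataFrame
```

Slicing with tables[0:1] would instead return another one-element list, which is why plain indexing is what you want here.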

You need to index the list with [0] to get the DataFrame itself:

import pandas as pd
df = pd.read_html("http://pitzavod.ru/products/upakovka/")[0]

Output (showing only the last two columns):

                 1                                      2
0  Марка целлюлозы                       Методы испытаний
1              ОСН                       Методы испытаний
2   10 000 740 520  ГОСТ13525.1 ГОСТ 13525.3 ГОСТ 13525.8
3          28 - 45                             ГОСТ 10070
4             6500                           ГОСТ 14363.3
5               20                             ГОСТ 16932
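As for the second point in the question, the integers 0, 1, 2 across the top are pandas' default column labels, used when no header row is taken from the table. Below is a minimal sketch of one way to fix that after reading, assuming the first row holds the column titles (as it does in the question's output). The helper promote_first_row_to_header is hypothetical, and the data is a hand-built excerpt rather than a live download.

```python
import pandas as pd

# Hand-built excerpt of the table from the question (values abbreviated);
# in practice this would be pd.read_html(url)[0].
raw = pd.DataFrame([
    ["Показатели", "Марка целлюлозы", "Методы испытаний"],
    ["Степень делигнификации, п.е.", "28 - 45", "ГОСТ 10070"],
    ["Влажность, % не более", "20", "ГОСТ 16932"],
])


def promote_first_row_to_header(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: use row 0 as column labels and drop it from the body."""
    out = df.iloc[1:].copy()
    out.columns = df.iloc[0].tolist()   # replace the default 0, 1, 2 labels
    return out.reset_index(drop=True)   # renumber the remaining rows from 0


clean = promote_first_row_to_header(raw)
print(clean)
```

Alternatively, read_html accepts a header argument (for example pd.read_html(url, header=0)) that tells it which row to use as column labels at parse time; whether row 0 is the right choice depends on how the page's table is laid out.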