Reading an Excel file with united cells in Python-CodePudding

I have an excel table of the following type (the problem described below is driven by the presence of the united cells).

I am using read_excel from pandas to read it.

What I want: I would like to use the values in the first column as an index, and to have the values in the third column combined in one cell, e.g. like here.

What I get from directly applying read_excel can be seen here.

If needed: please see the code used to read the file below (I am reading it from google drive in google colab):

path = '/content/drive/MyDrive/ExampleFile.xlsx'
pd.read_excel(path, header = 0, index_col = 0)

Could you please help? Please let me know if anything in the question is unclear.

CodePudding user response：

here is one way to accomplish it. I created the xls similar to yours, the first column had a heading of sno

# fill the null values with values from previous rows
df=df.ffill()

# combine the rows where class is the same and create a new column
df=df.assign(comb=df.groupby(['class'])['type'].transform(lambda x: ','.join(x)))

# drop the duplicated rows
df2=df.drop_duplicates(subset=['class','comb'])[['class','comb']]

    class   comb
0   fruit   apple,orange
2   toys    car,truck,train