Home > database >  Read Excel file with merged cells Pandas
Read Excel file with merged cells Pandas

Time:09-29

I am trying to read an Excel file with merged cells and create a list of dictionaries from it. The DataFrame looks like:

    Name Group        Type    Values
0   AC01   A01  Individual   Value 1
1    NaN   NaN         NaN   Value 2
2    NaN   NaN         NaN   Value 3
3    NaN   NaN         NaN   Value 4
4    NaN   NaN         NaN   Value 5
5   AC02   A02  Individual   Value 6
6    NaN   NaN         NaN   Value 7
7    NaN   NaN         NaN   Value 8
8    NaN   NaN         NaN   Value 9
9    NaN   NaN         NaN  Value 10
10  AC03   A03  Individual  Value 11
11   NaN   NaN         NaN  Value 12
12   NaN   NaN         NaN  Value 13
13   NaN   NaN         NaN  Value 14
14   NaN   NaN         NaN  Value 15

And to create a list of dictionaries, I am doing:

import pandas as pd

data = pd.read_excel(io="merged_cells.xlsx", usecols="A:D")
df = (
    pd.DataFrame(data)
    .rename(
        columns={
            data.columns[0]: "name",
            data.columns[1]: "grp",
            data.columns[2]: "type",
            data.columns[3]: "values",
        }
    )
    .to_dict(orient="records")
)

Which results in:

[{'grp': 'A01', 'name': 'AC01', 'type': 'Individual', 'values': 'Value 1'},
 {'grp': nan, 'name': nan, 'type': nan, 'values': 'Value 2'},
 {'grp': nan, 'name': nan, 'type': nan, 'values': 'Value 3'},
 {'grp': nan, 'name': nan, 'type': nan, 'values': 'Value 4'},
 {'grp': nan, 'name': nan, 'type': nan, 'values': 'Value 5'},
 {'grp': 'A02', 'name': 'AC02', 'type': 'Individual', 'values': 'Value 6'},
 {'grp': nan, 'name': nan, 'type': nan, 'values': 'Value 7'},
 {'grp': nan, 'name': nan, 'type': nan, 'values': 'Value 8'},
 {'grp': nan, 'name': nan, 'type': nan, 'values': 'Value 9'},
 {'grp': nan, 'name': nan, 'type': nan, 'values': 'Value 10'},
 {'grp': 'A03', 'name': 'AC03', 'type': 'Individual', 'values': 'Value 11'},
 {'grp': nan, 'name': nan, 'type': nan, 'values': 'Value 12'},
 {'grp': nan, 'name': nan, 'type': nan, 'values': 'Value 13'},
 {'grp': nan, 'name': nan, 'type': nan, 'values': 'Value 14'},
 {'grp': nan, 'name': nan, 'type': nan, 'values': 'Value 15'}]

What I am trying to achieve is that the last column to be converted into a list in each dictioary to be like:

[
    {
        "grp": "A01",
        "name": "AC01",
        "type": "Individual",
        "values": ["Value 1", "Value 2", "Value 3", "Value 4", "Value 5"]
    }
    {
        "grp": "A02",
        "name": "AC02",
        "type": "Individual",
        "values": ["Value 6", "Value 7", "Value 8", "Value 9", "Value 10"]
    }
    {
        "grp": "A03",
        "name": "AC03",
        "type": "Individual",
        "values": ["Value 11", "Value 12", "Value 13", "Value 14", "Value 15"]
    }
]

CodePudding user response:

Use fillna to fill forward values and aggregate Values to list after create groups:

out = df.fillna().groupby(['Name', 'Group', 'Type']) \
        .agg({'Values': list}).reset_index() \
        .to_dict(orient='index').values()

Output:

>>> list(out)
[{'Name': 'AC01',
  'Group': 'A01',
  'Type': 'Individual',
  'Values': ['Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5']},
 {'Name': 'AC02',
  'Group': 'A02',
  'Type': 'Individual',
  'Values': ['Value 6', 'Value 7', 'Value 8', 'Value 9', 'Value 10']},
 {'Name': 'AC03',
  'Group': 'A03',
  'Type': 'Individual',
  'Values': ['Value 11', 'Value 12', 'Value 13', 'Value 14', 'Value 15']}]

CodePudding user response:

IIUC, you can try:

(
    df.ffill()
    .groupby(df.columns.to_list()[:-1], as_index=False)
    .agg(list)
    .to_dict('records')
)
  • Related