Home > Back-end >  What is the optimal Python data structure and how to export it to Excel
What is the optimal Python data structure and how to export it to Excel

Time:08-27

I am trying to analyze mechanical computation results using Python and export them to Excel automatically. My raw data consists of various types of computation (each stored in a different text file). For each type, I have various load cases (loading conditions). For each load case, I output the results are various datapoints. Finally, at each datapoint, I need to store one vector result (with 3 components). So basically, the general structure is:

computation --> load --> datapoint --> component = value

After searching for similar questions, for instance:

and many others, it looks like the pandas library of Python seems the most appropriate, especially the dataframe and its ability to generate pivot_table or crosstab.

So far, I created a list of dictionary similar to this:

dd=[]
files = ["computation1.out","computation2.out"]
for files in files:
    filename=file.split(".")[0]
    with open(file,"r") as f:
        for row in f:
           ...
           #process file and extract
           #  loadnumber
           #  datapointnb
           #  component number icomp
           #  value
           # Creating a new Python dictionary
           mydict = {'Computation': filename,
                     'Load Case': RefValues[iref],
                     'Datapoint': EntValues[ient],
                     'Compo': icomp,
                     'Value': float(row)}
           # Append it to the list
           dd.append(mydict)
print(json.dumps(dd, sort_keys=True, indent=4))
df = pd.DataFrame.from_dict(dd)
print(df)
df.to_excel("myoutout.xlsx", sheet_name='Data', float_format="%.4f", merge_cells=False)

This generates something like this:

[
    {
        "Component": 1,
        "Computation": "computation1",
        "LoadCase": 1.0,
        "Datapoint": 57,
        "Value": 0.0004761905
    },
    {
        "Component": 2,
        "Computation": "computation1",
        "LoadCase": 1.0,
        "Datapoint": 57,
        "Value": -9.333414e-18
    },
    {
        "Component": 3,
        "Computation": "computation1",
        "LoadCase": 1.0,
        "Datapoint": 57,
        "Value": 0.0
    },
    {
        "Component": 1,
        "Computation": "computation1",
        "LoadCase": 1.0,
        "Datapoint": 84,
        "Value": 0.0009523809
    },
    ...
]
     Computation  LoadCase  Datapoint  Component         Value
0   computation1       1.0         57          1  4.761905e-04
1   computation1       1.0         57          2 -9.333414e-18
2   computation1       1.0         57          3  0.000000e 00
3   computation1       1.0         84          1  9.523809e-04
...

The conversion from the list of dictionaries to a dataframe and the export to Excel went well but I can't seem to create automatically pivot table or restructure the data in lines and columns:

           |computation1                                   |computation2...
           |-----------------------------------------------|---
           |loadcase1            |loadcase2            |...|loadcase1...
           |compo1 compo2 compo3 |compo1 compo2 compo3 |...|compo1...
-----------------------------------------------------------|---
datapoint1 |val1   val2   val3   |val4   val5   val6   |...|...
...        |...                                            |...
datapoint3 |val7   ...                                     |...
-----------------------------------------------------------|---

I have two questions:

  • Would this data structure (list of dictionaries) be the most efficient once the amount of data gets bigger and if not, what would you recommend? I also investigated nested dictionaries but seemed more complicated to export to Excel.
  • How can I restructure my dataframe to export to Excel using the appropriate format?

Thank you for you help, Christophe

CodePudding user response:

Finally, converting my list of dictionaries to pandas.DataFrame and using the appropriate arguments to pandas.pivot_table and exporting to Excel using pandas.pivot_table.to_excel did the trick.

Full solution was those lines:

df = pandas.DataFrame.from_dict(dd)
mypivot=pandas.pivot_table(df, index=EntTypeLabel, columns=['Computation',RefTypeLabel,'Compo'], values='Value')
print_dbg(1,mypivot)
myExcelfile="samres.xlsx"
mypivot.to_excel(myExcelfile, sheet_name='Data', float_format="%.4f", merge_cells=False)

This finally generated the desired output:

Computation   COMPUTATION1          COMPUTATION2         ...
Load Case   Load Case-1.00        Load Case-1.00         ...
Compo               Comp-1 Comp-2         Comp-1 Comp-2  ...
Node %                                                   ...
5559                  1.13  -1.08           0.80  -0.63  ...
5561                  0.77  -1.17           0.54  -0.68  ...
17789                 1.00  -1.17           0.66  -0.62  ...
17791                 0.69  -1.16           0.49  -0.60  ...
56690                 0.81   0.03           0.57   0.44  ...
56812                 1.01  -0.04           0.70   0.33  ...

Thanks to @MattDMo for the suggestion to use to_excel.

  • Related