What is the optimal Python data structure and how to export it to Excel-CodePudding

I am trying to analyze mechanical computation results using Python and export them to Excel automatically. My raw data consists of various types of computation (each stored in a different text file). For each type, I have various load cases (loading conditions). For each load case, I output the results are various datapoints. Finally, at each datapoint, I need to store one vector result (with 3 components). So basically, the general structure is:

computation --> load --> datapoint --> component = value

After searching for similar questions, for instance:

and many others, it looks like the pandas library of Python seems the most appropriate, especially the dataframe and its ability to generate pivot_table or crosstab.

So far, I created a list of dictionary similar to this:

dd=[]
files = ["computation1.out","computation2.out"]
for files in files:
    filename=file.split(".")[0]
    with open(file,"r") as f:
        for row in f:
           ...
           #process file and extract
           #  loadnumber
           #  datapointnb
           #  component number icomp
           #  value
           # Creating a new Python dictionary
           mydict = {'Computation': filename,
                     'Load Case': RefValues[iref],
                     'Datapoint': EntValues[ient],
                     'Compo': icomp,
                     'Value': float(row)}
           # Append it to the list
           dd.append(mydict)
print(json.dumps(dd, sort_keys=True, indent=4))
df = pd.DataFrame.from_dict(dd)
print(df)
df.to_excel("myoutout.xlsx", sheet_name='Data', float_format="%.4f", merge_cells=False)

This generates something like this:

[
    {
        "Component": 1,
        "Computation": "computation1",
        "LoadCase": 1.0,
        "Datapoint": 57,
        "Value": 0.0004761905
    },
    {
        "Component": 2,
        "Computation": "computation1",
        "LoadCase": 1.0,
        "Datapoint": 57,
        "Value": -9.333414e-18
    },
    {
        "Component": 3,
        "Computation": "computation1",
        "LoadCase": 1.0,
        "Datapoint": 57,
        "Value": 0.0
    },
    {
        "Component": 1,
        "Computation": "computation1",
        "LoadCase": 1.0,
        "Datapoint": 84,
        "Value": 0.0009523809
    },
    ...
]
     Computation  LoadCase  Datapoint  Component         Value
0   computation1       1.0         57          1  4.761905e-04
1   computation1       1.0         57          2 -9.333414e-18
2   computation1       1.0         57          3  0.000000e 00
3   computation1       1.0         84          1  9.523809e-04
...

The conversion from the list of dictionaries to a dataframe and the export to Excel went well but I can't seem to create automatically pivot table or restructure the data in lines and columns:

           |computation1                                   |computation2...
           |-----------------------------------------------|---
           |loadcase1            |loadcase2            |...|loadcase1...
           |compo1 compo2 compo3 |compo1 compo2 compo3 |...|compo1...
-----------------------------------------------------------|---
datapoint1 |val1   val2   val3   |val4   val5   val6   |...|...
...        |...                                            |...
datapoint3 |val7   ...                                     |...
-----------------------------------------------------------|---

I have two questions:

Would this data structure (list of dictionaries) be the most efficient once the amount of data gets bigger and if not, what would you recommend? I also investigated nested dictionaries but seemed more complicated to export to Excel.
How can I restructure my dataframe to export to Excel using the appropriate format?

Thank you for you help, Christophe

CodePudding user response：

Finally, converting my list of dictionaries to pandas.DataFrame and using the appropriate arguments to pandas.pivot_table and exporting to Excel using pandas.pivot_table.to_excel did the trick.

Full solution was those lines:

df = pandas.DataFrame.from_dict(dd)
mypivot=pandas.pivot_table(df, index=EntTypeLabel, columns=['Computation',RefTypeLabel,'Compo'], values='Value')
print_dbg(1,mypivot)
myExcelfile="samres.xlsx"
mypivot.to_excel(myExcelfile, sheet_name='Data', float_format="%.4f", merge_cells=False)

This finally generated the desired output:

Computation   COMPUTATION1          COMPUTATION2         ...
Load Case   Load Case-1.00        Load Case-1.00         ...
Compo               Comp-1 Comp-2         Comp-1 Comp-2  ...
Node %                                                   ...
5559                  1.13  -1.08           0.80  -0.63  ...
5561                  0.77  -1.17           0.54  -0.68  ...
17789                 1.00  -1.17           0.66  -0.62  ...
17791                 0.69  -1.16           0.49  -0.60  ...
56690                 0.81   0.03           0.57   0.44  ...
56812                 1.01  -0.04           0.70   0.33  ...

Thanks to @MattDMo for the suggestion to use to_excel.