Openpyxl or Pandas, which is better at reading data from an Excel file and returning corresponding values?

Time:11-13

Hello Stack Overflow community,

*Basically, my goal is to extract values from one column of an Excel file based on the data in another column.* The file has two columns:

**Thickness** of parcel, with example values [0.12, 0.12, 0.13, 0.14, 0.14, 0.15] (heading: Thickness (mm))
**Weight** of parcel, with example values [4.000, 3.500, 2.500, 4.500, 5.000, 2.000] (heading: Weight (KG))

Excel file:
Thickness (mm)   Weight (KG)
0.12             4.000
0.12             3.500
0.13             2.500
0.14             4.500
0.14             5.000
0.15             2.000

I'm looking to generate this output using Python:
Thickness   Weight   Parcels
0.12        7.500    2 Parcels
0.13        2.500    1 Parcel
0.14        9.500    2 Parcels
0.15        2.000    1 Parcel

TOTAL:      21.500   6 Parcels

The user will be shown all the thickness values currently available and can then enter either a single thickness value or a range of thicknesses to get the corresponding total weight.

Could anyone recommend how this task can be accomplished easily and efficiently?

I would be very grateful for your advice.

Please note: Python is the only programming language I know.

Thank You.

I have learned Openpyxl, but I have also heard that Pandas is an efficient tool for data analysis, so please let me know which one to use!

Arigato (thank you)!

Answer:

pandas uses openpyxl under the hood anyway (depending on the file extension and the installed engines) in pandas.read_excel and pandas.DataFrame.to_excel. You can probably go with pandas, since you only need a single read_excel call to get the data into a DataFrame. Any performance difference (if there even is one) shouldn't matter for your use case.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html#pandas.read_excel

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html
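For comparison, with openpyxl alone you read the rows yourself and do the grouping by hand. Below is a minimal plain-Python sketch of that approach, assuming the data arrives as (thickness, weight) tuples, which is what openpyxl's `Worksheet.iter_rows(..., values_only=True)` yields for a sheet with a header row. The file name `parcels.xlsx` and the function `summarize_rows` are hypothetical, and the sample rows from the question are hard-coded so the sketch runs without an Excel file.

```python
from collections import defaultdict


def summarize_rows(rows):
    """Group (thickness, weight) rows by thickness.

    Returns a dict {thickness: (total_weight, parcel_count)} ordered by thickness.
    Grouping on exact float keys assumes the values come out of Excel as typed.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for thickness, weight in rows:
        totals[thickness][0] += weight
        totals[thickness][1] += 1
    return {t: (w, n) for t, (w, n) in sorted(totals.items())}


# Sample data from the question. With openpyxl the rows could come from
# something like (hypothetical file name):
#   from openpyxl import load_workbook
#   ws = load_workbook("parcels.xlsx").active
#   rows = ws.iter_rows(min_row=2, max_col=2, values_only=True)
rows = [(0.12, 4.000), (0.12, 3.500), (0.13, 2.500),
        (0.14, 4.500), (0.14, 5.000), (0.15, 2.000)]

summary = summarize_rows(rows)
for thickness, (weight, count) in summary.items():
    suffix = "s" if count != 1 else ""
    print(f"{thickness:.2f} {weight:.3f} {count} Parcel{suffix}")

# The TOTAL line from the question: sum over all thickness groups
total_weight = sum(w for w, _ in summary.values())
total_count = sum(n for _, n in summary.values())
print(f"TOTAL: {total_weight:.3f} {total_count} Parcels")
```

This works, but the grouping, counting and formatting are all manual; with pandas the same summary is a single `groupby`.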

Answer:

Pandas actually uses openpyxl internally, as well as some other engines; see the engine parameter in the documentation. I think reading and manipulating the data is easier with pandas, but if you need advanced formatting, you will need to use openpyxl directly (for basic cases, pandas is enough).

Here is a basic example for your problem. You will need to adjust the formatting to your needs.

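import pandas as pd

# Uncomment to read the real file. The question's headings are assumed to be
# "Thickness (mm)" and "Weight (KG)", so rename them to short column names.
# df = pd.read_excel("tmp.xlsx").rename(
#     columns={"Thickness (mm)": "Thickness", "Weight (KG)": "Weight"})

# Sample data from the question
df = pd.DataFrame({
    "Thickness": [0.12, 0.12, 0.13, 0.14, 0.14, 0.15],
    "Weight": [4.000, 3.500, 2.500, 4.500, 5.000, 2.000],
})

# Total weight and number of parcels per thickness
res = df.groupby("Thickness", as_index=False).agg(
    Weight=("Weight", "sum"),
    Count=("Weight", "count"),
)

# Write the result to a separate file so the input is not overwritten;
# the context manager saves and closes the file.
# (engine="xlsxwriter" requires the XlsxWriter package to be installed.)
with pd.ExcelWriter("result.xlsx", engine="xlsxwriter") as writer:
    res.to_excel(writer, sheet_name="Sheet1", index=False)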
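The example above produces the grouped table, but not the interactive part of the question (querying a single thickness or a range) or the TOTAL line. A possible extension, sketched under the assumption that the data is already in a DataFrame with `Thickness` and `Weight` columns; the function names `weight_for` and `format_summary` are hypothetical, and the range filter uses pandas' `Series.between`.

```python
import pandas as pd

# Sample data from the question; for a real file this would come from pd.read_excel
df = pd.DataFrame({
    "Thickness": [0.12, 0.12, 0.13, 0.14, 0.14, 0.15],
    "Weight": [4.000, 3.500, 2.500, 4.500, 5.000, 2.000],
})


def weight_for(df, low, high=None):
    """Return (total weight, parcel count) for one thickness or an inclusive range.

    If only `low` is given, it is treated as a single thickness value.
    """
    if high is None:
        high = low
    mask = df["Thickness"].between(low, high)  # inclusive on both ends by default
    selected = df.loc[mask, "Weight"]
    return float(selected.sum()), int(selected.count())


def format_summary(df):
    """Build the per-thickness report from the question, including the TOTAL line."""
    res = df.groupby("Thickness", as_index=False).agg(
        Weight=("Weight", "sum"),
        Count=("Weight", "count"),
    )
    lines = [
        f"{t:.2f} {w:.3f} {n} Parcel{'s' if n != 1 else ''}"
        for t, w, n in res.itertuples(index=False)
    ]
    lines.append(f"TOTAL: {res['Weight'].sum():.3f} {res['Count'].sum()} Parcels")
    return "\n".join(lines)


# Show the available thicknesses, then answer a single-value and a range query
print("Available thicknesses:", sorted(df["Thickness"].unique().tolist()))
print(format_summary(df))
print(weight_for(df, 0.14))        # single thickness
print(weight_for(df, 0.12, 0.13))  # range from 0.12 to 0.13
```

`Series.between` includes both endpoints by default. Because thicknesses read from Excel are floats, rounding the column first (for example `df["Thickness"] = df["Thickness"].round(2)`) may make exact matches more reliable.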