Home > Net >  Getting sum of Products in an ever changing excel
Getting sum of Products in an ever changing excel

Time:10-30

So I have an Excel file witch contains way to many values, only two columns are relevant and I extract them into a new data frame. So I have two columns, first the Product number and second the Produced sum. The Problem is that the Product number doesn't align with the produced sum. The Produced sum is always at least 1 rows below the corresponding Product, however sometimes there is more than just one value and it can be that there are 10 rows of values that belong to 1 Product.

My Idea is to Index the first column for Strings(the Product Number is something like A100 000), then identify where it's located and where the next non matching string is located at. Then I have a range of rows or just a number which I can then add to a function like this:

Product1 = (df.iloc[Product1:Product1 diff1, 1])
Product1 = Product1.sum(axis=0)

to locate and sum the numbers etc. The goal is to have a script where I can inject the excel into every month and get a report how many where Produced of what.

My question is if this is even possible with python, I'm guessing it is, and how I accomplish this, I'm quite new with python and pandas. Any help is much appreciated

Sample Data:

Product Number Produced Ammount
A00 001
45
56
87
A00 005 row
5
A00 034
27
34

Output I would like:

Product Number Produced Ammount
A00 001 188
A00 005 5
A00 034 61

CodePudding user response:

I would use a groupby and sum. But as your dataframe contains blanks in the Product Number column and non numeric ones in the Produced Ammount one, I would first clean it:

tmp = df.assign(**{'Product Number': df['Product Number'].ffill(),
                   'Produced Ammount': pd.to_numeric(df['Produced Ammount'], 'coerce')})

to get:

   Product Number  Produced Ammount
0        A00 001                NaN
1        A00 001                NaN
2        A00 001               45.0
3        A00 001               56.0
4        A00 001               87.0
5        A00 005                NaN
6        A00 005                NaN
7        A00 005                5.0
8        A00 034                NaN
9        A00 034                NaN
10       A00 034               27.0
11       A00 034               34.0

It is now trivial:

tmp.groupby('Product Number').sum().astype('int').reset_index()

gives as expected:

  Product Number  Produced Ammount
0       A00 001                188
1       A00 005                  5
2       A00 034                 61
  • Related