So I have an Excel file witch contains way to many values, only two columns are relevant and I extract them into a new data frame. So I have two columns, first the Product number and second the Produced sum. The Problem is that the Product number doesn't align with the produced sum. The Produced sum is always at least 1 rows below the corresponding Product, however sometimes there is more than just one value and it can be that there are 10 rows of values that belong to 1 Product.
My Idea is to Index the first column for Strings(the Product Number is something like A100 000), then identify where it's located and where the next non matching string is located at. Then I have a range of rows or just a number which I can then add to a function like this:
Product1 = (df.iloc[Product1:Product1 diff1, 1])
Product1 = Product1.sum(axis=0)
to locate and sum the numbers etc. The goal is to have a script where I can inject the excel into every month and get a report how many where Produced of what.
My question is if this is even possible with python, I'm guessing it is, and how I accomplish this, I'm quite new with python and pandas. Any help is much appreciated
Sample Data:
Product Number | Produced Ammount |
---|---|
A00 001 | |
45 | |
56 | |
87 | |
A00 005 | row |
5 | |
A00 034 | |
27 | |
34 |
Output I would like:
Product Number | Produced Ammount |
---|---|
A00 001 | 188 |
A00 005 | 5 |
A00 034 | 61 |
CodePudding user response:
I would use a groupby
and sum
. But as your dataframe contains blanks in the Product Number
column and non numeric ones in the Produced Ammount
one, I would first clean it:
tmp = df.assign(**{'Product Number': df['Product Number'].ffill(),
'Produced Ammount': pd.to_numeric(df['Produced Ammount'], 'coerce')})
to get:
Product Number Produced Ammount
0 A00 001 NaN
1 A00 001 NaN
2 A00 001 45.0
3 A00 001 56.0
4 A00 001 87.0
5 A00 005 NaN
6 A00 005 NaN
7 A00 005 5.0
8 A00 034 NaN
9 A00 034 NaN
10 A00 034 27.0
11 A00 034 34.0
It is now trivial:
tmp.groupby('Product Number').sum().astype('int').reset_index()
gives as expected:
Product Number Produced Ammount
0 A00 001 188
1 A00 005 5
2 A00 034 61