Group rows and subtract in Python

I have an array made up of two lists, as shown below:

([1,1,1,2,2,3,3,4,4,5,5],[29,34,70,21,56,43,89,11,90,42,87])

Eventually I want to find the differences between consecutive values that share the same index: for index 1, those are 29, 34 and 70. And likewise for every other index.

I have turned this into a dataframe, but this has been fairly pointless: I was hoping to work with pandas, which I have found easier before, but now I just have the same data in a different format.

The dataframe I produced used this code, where index and value are the two lists:

df = pd.DataFrame({'Index': index, 'Value': value})

I also tried splitting the rows as a first step before subtracting, but this failed because index 1 spans three rows rather than two, so the groups do not fall at regular intervals.

So the desired result would look something like this:

([1,1,2,3,4,5],[5,36,35,46,79,45])

in any form: DataFrame, array, list, etc.

Any help with this, or steps towards it, would be really appreciated!

Answer:

I believe this is what you want:

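import pandas as pd

df = pd.DataFrame({'Index': [1,1,1,2,2,3,3,4,4,5,5],
                   'Values': [29,34,70,21,56,43,89,11,90,42,87]})
# A stable sort keeps rows with the same Index in their original order
df.sort_values('Index', ignore_index=True, inplace=True, kind='stable')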

  Index Values
0     1     29
1     1     34
2     1     70
3     2     21
4     2     56
5     3     43
6     3     89
7     4     11
8     4     90
9     5     42
10    5     87

Create a new column holding the differences computed within each Index group:

df['Diff'] = df.groupby('Index')['Values'].diff()

Output:

  Index Values  Diff
0     1     29   NaN
1     1     34   5.0
2     1     70  36.0
3     2     21   NaN
4     2     56  35.0
5     3     43   NaN
6     3     89  46.0
7     4     11   NaN
8     4     90  79.0
9     5     42   NaN
10    5     87  45.0
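To make the mechanism behind groupby(...).diff() visible: it subtracts from each row the previous row of the same group, so the first row of every group has nothing to subtract and becomes NaN. A minimal sketch that builds the same column explicitly with groupby(...).shift(), assuming the rows within each group are already in the intended order:

```python
import pandas as pd

df = pd.DataFrame({'Index': [1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
                   'Values': [29, 34, 70, 21, 56, 43, 89, 11, 90, 42, 87]})

# Previous value within the same Index group (NaN for the first row of each group)
prev = df.groupby('Index')['Values'].shift()

# Current value minus previous value: the same quantity groupby(...).diff() returns
df['Diff'] = df['Values'] - prev

print(df['Diff'].tolist())
```

The NaN in the first row of each group is why the Diff column is float rather than int.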

Drop the rows where Diff is NaN (the first row of each group):

df.dropna(inplace=True)

Output:

df

  Index Values   Diff
1     1     34    5.0
2     1     70   36.0
4     2     56   35.0
6     3     89   46.0
8     4     90   79.0
10    5     87   45.0
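The question asked for two lists rather than a DataFrame. A minimal sketch of converting the filtered frame into that shape, assuming the differences are whole numbers so that casting the float Diff column back to int is safe (diff() returns floats because of the NaN it introduces):

```python
import pandas as pd

df = pd.DataFrame({'Index': [1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
                   'Values': [29, 34, 70, 21, 56, 43, 89, 11, 90, 42, 87]})
df['Diff'] = df.groupby('Index')['Values'].diff()
df = df.dropna()

# Once the NaN rows are gone, the differences can be cast back to int
out = (df['Index'].tolist(), df['Diff'].astype(int).tolist())
print(out)  # ([1, 1, 2, 3, 4, 5], [5, 36, 35, 46, 79, 45])
```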

Answer:

Due to my reputation I am not able to comment. English is not my first language, so I would like to know what you expect. Do you want to subtract each element of this array from the next one: [29,34,70,21,56,43,89,11,90,42,87]

Or do you want to do 29 - 1, 34 - 1, 70 - 1, 21 - 2, and so on, i.e. take each element of the list at position 1 and subtract the corresponding element of the list at position 0? I do not know if my question is clear.

After your comment, here is my code:

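original_array = ([1,1,1,2,2,3,3,4,4,5,5],[29,34,70,21,56,43,89,11,90,42,87])
index_to_work = 1  # position of the values list in the tuple
final_dict = {}
for pos in range(1, len(original_array[index_to_work])):
    # Only subtract neighbours that share the same index (original_array[0]);
    # otherwise the last value of one group would be subtracted into the next group
    if original_array[0][pos] == original_array[0][pos - 1]:
        final_dict[pos] = original_array[index_to_work][pos] - original_array[index_to_work][pos - 1]
print(final_dict)  # {1: 5, 2: 36, 4: 35, 6: 46, 8: 79, 10: 45}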
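For comparison, a pandas-free sketch of the per-group subtraction using itertools.groupby from the standard library. The helper name group_diffs is hypothetical, and it assumes rows with equal index are adjacent, as they are in the question, because itertools.groupby only groups consecutive equal keys:

```python
from itertools import groupby

original_array = ([1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
                  [29, 34, 70, 21, 56, 43, 89, 11, 90, 42, 87])


def group_diffs(labels, values):
    """Differences between consecutive values that share the same label.

    Hypothetical helper; assumes rows with equal labels are adjacent,
    since itertools.groupby only groups consecutive equal keys.
    """
    out_labels, out_diffs = [], []
    for label, group in groupby(zip(labels, values), key=lambda p: p[0]):
        vals = [v for _, v in group]
        # Pair each value with the one after it inside the group
        for prev, curr in zip(vals, vals[1:]):
            out_labels.append(label)
            out_diffs.append(curr - prev)
    return out_labels, out_diffs


print(group_diffs(*original_array))
# ([1, 1, 2, 3, 4, 5], [5, 36, 35, 46, 79, 45])
```

If the input were not sorted, sorting the (index, value) pairs by index first with sorted(..., key=lambda p: p[0]) would keep values within each group in their original order, because Python's sort is stable.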

Answer:

Starting with:

import pandas as pd
data = ([1,1,1,2,2,3,3,4,4,5,5],[29,34,70,21,56,43,89,11,90,42,87])

We can create a pandas.Series with the appropriate values and index, then group by the index, apply .diff() within each group and drop the missing values (the first row of each group):

s = pd.Series(data[1], index=data[0]).groupby(level=0).diff().dropna()

This gives us:

1     5.0
1    36.0
2    35.0
3    46.0
4    79.0
5    45.0
dtype: float64

Then create a 2-tuple of the index and values converted to lists:

out = (s.index.to_list(), s.to_list())

And you end up with:

([1, 1, 2, 3, 4, 5], [5.0, 36.0, 35.0, 46.0, 79.0, 45.0])
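If pandas is not required, the same result can be computed with NumPy. A minimal sketch, assuming the rows are already sorted so that equal index values are adjacent: a difference between neighbouring rows is kept only where a row's index matches the previous row's.

```python
import numpy as np

data = ([1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
        [29, 34, 70, 21, 56, 43, 89, 11, 90, 42, 87])

idx = np.asarray(data[0])
vals = np.asarray(data[1])

# True where a row has the same index as the row before it,
# i.e. where a within-group difference exists
same_group = idx[1:] == idx[:-1]

# Differences between every pair of neighbouring rows, kept only inside groups
diffs = np.diff(vals)[same_group]

out = (idx[1:][same_group].tolist(), diffs.tolist())
print(out)  # ([1, 1, 2, 3, 4, 5], [5, 36, 35, 46, 79, 45])
```

Because no NaN is ever introduced, the differences stay integers, which matches the desired output in the question more closely than the float results from the pandas approach.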