I have an array made up of two lists, as shown below:
([1,1,1,2,2,3,3,4,4,5,5],[29,34,70,21,56,43,89,11,90,42,87])
Eventually I want to find the differences between consecutive values that share an index. For index 1 the values are 29, 34 and 70, so the differences are 34 - 29 = 5 and 70 - 34 = 36. The same applies to every other index shown.
I have turned this into a dataframe, since I have found pandas easier to work with before, but so far this has been fairly pointless: I just have the same data in a different format.
The dataframe I produced used this code:
df = pd.DataFrame({'Index': index, 'Value': value})
I also tried using split to separate the rows as a starting point before subtraction but this was unsuccessful as the 1 index makes up three rows not two so there are not regular intervals.
So the desired result would look something like this:
([1,1,2,3,4,5],[5,36,35,46,79,45])
in any form of dataframe, array, list etc...
Any help with this/steps towards this would be really appreciated!
CodePudding user response:
I believe this is what you want:
import pandas as pd
df = pd.DataFrame({'Index': [1,1,1,2,2,3,3,4,4,5,5],
                   'Values': [29,34,70,21,56,43,89,11,90,42,87]})
df.sort_values('Index', ignore_index=True, inplace=True)
    Index  Values
0       1      29
1       1      34
2       1      70
3       2      21
4       2      56
5       3      43
6       3      89
7       4      11
8       4      90
9       5      42
10      5      87
Creating a new column with the differences computed within each group of indexes:
df['Diff'] = df.groupby('Index')['Values'].diff()
Output:
    Index  Values  Diff
0       1      29   NaN
1       1      34   5.0
2       1      70  36.0
3       2      21   NaN
4       2      56  35.0
5       3      43   NaN
6       3      89  46.0
7       4      11   NaN
8       4      90  79.0
9       5      42   NaN
10      5      87  45.0
Dropping the NaN rows:
df.dropna(inplace=True)
Output:
df
    Index  Values  Diff
1       1      34   5.0
2       1      70  36.0
4       2      56  35.0
6       3      89  46.0
8       4      90  79.0
10      5      87  45.0
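One detail worth noting: dropna() with no arguments removes every row that has a NaN in any column, and inplace=True overwrites df. If the frame had other columns with missing values, or if the original rows were still needed, something like the sketch below might be safer. The names diff, result and out are only illustrative and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "Index": [1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "Values": [29, 34, 70, 21, 56, 43, 89, 11, 90, 42, 87],
})

# Difference between consecutive values inside each Index group;
# the first row of every group has no predecessor and becomes NaN.
diff = df.groupby("Index")["Values"].diff()

# Build a new frame instead of modifying df, and only look at the
# Diff column when deciding which rows to drop.
result = df.assign(Diff=diff).dropna(subset=["Diff"])

# Same shape as the result asked for in the question: (indexes, differences).
out = (result["Index"].tolist(), result["Diff"].tolist())
print(out)
```

Here df keeps all eleven rows, and out holds the two lists in the form requested in the question (with the differences as floats).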
CodePudding user response:
Due to my reputation I am not able to comment, and English is not my first language, so I would like to check what you expect. Do you want to subtract consecutive elements of this array: [29,34,70,21,56,43,89,11,90,42,87]
or do you want to compute 29 - 1, 34 - 1, 70 - 1, 21 - 2, that is, take each element of the list at position 1 and subtract the corresponding element of the list at position 0? I do not know if my question is clear.
After your comment, here is my code:
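original_array = ([1,1,1,2,2,3,3,4,4,5,5],[29,34,70,21,56,43,89,11,90,42,87])
index_to_work = 1  # position of the list of values inside the tuple
final_dict = {}
# Difference between each value and the one before it, keyed by position.
# Note: this runs over the whole list and does not take the groups in the first list into account.
for pos in range(1, len(original_array[index_to_work])):
    final_dict[pos] = original_array[index_to_work][pos] - original_array[index_to_work][pos-1]
print(final_dict)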
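If the differences should stay inside each group, as in the desired result from the question, a plain-Python variant could look like the sketch below. It swaps the dictionary for two output lists and uses itertools.groupby, and it assumes that equal index labels are adjacent in the first list (as they are in the question's data). The names out_indexes and out_diffs are made up for this example.

```python
from itertools import groupby

original_array = ([1,1,1,2,2,3,3,4,4,5,5],[29,34,70,21,56,43,89,11,90,42,87])
indexes, values = original_array

out_indexes, out_diffs = [], []
# groupby yields runs of consecutive (index, value) pairs that share a label,
# which is enough here because equal labels sit next to each other.
for label, pairs in groupby(zip(indexes, values), key=lambda pair: pair[0]):
    group_values = [value for _, value in pairs]
    # Subtract each value from its successor, within this group only.
    for previous, current in zip(group_values, group_values[1:]):
        out_indexes.append(label)
        out_diffs.append(current - previous)

result = (out_indexes, out_diffs)
print(result)  # ([1, 1, 2, 3, 4, 5], [5, 36, 35, 46, 79, 45])
```

If the labels were not already grouped together, the pairs would need to be sorted by label first, since itertools.groupby only merges neighbouring items.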
CodePudding user response:
Starting with:
data = ([1,1,1,2,2,3,3,4,4,5,5],[29,34,70,21,56,43,89,11,90,42,87])
We can create a pandas.Series with the appropriate values and index, then group by the index, apply .diff and then drop missing values:
s = pd.Series(data[1], index=data[0]).groupby(level=0).diff().dropna()
This gives us:
1     5.0
1    36.0
2    35.0
3    46.0
4    79.0
5    45.0
dtype: float64
Then create a 2-tuple of the index and values converted to lists:
out = (s.index.to_list(), s.to_list())
And you end up with:
([1, 1, 2, 3, 4, 5], [5.0, 36.0, 35.0, 46.0, 79.0, 45.0])
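The values come back as floats because .diff has to represent the missing first difference of each group as NaN, which forces a float dtype. If integers are preferred, as in the question's desired output, one option is to cast after the NaN rows are gone. This is a small sketch of that idea and assumes all remaining differences are whole numbers:

```python
import pandas as pd

data = ([1,1,1,2,2,3,3,4,4,5,5],[29,34,70,21,56,43,89,11,90,42,87])

s = pd.Series(data[1], index=data[0]).groupby(level=0).diff().dropna()

# No NaN values remain after dropna, so the cast to int is safe here.
s = s.astype(int)

out = (s.index.to_list(), s.to_list())
print(out)  # ([1, 1, 2, 3, 4, 5], [5, 36, 35, 46, 79, 45])
```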