Computing the mean of the numbers following a number, for each different number in a list. (yes, dif

Time:05-19

I have the following list of numbers, which are random:

numbers = [1, 3, 5, 5, 2, 4, 1, 5, 4, 5, 2, 2]

For each number (1, 2, 3, 4, 5) I want to know the mean of the numbers that follow it.

Here is an example:
1 appears two times, at positions 0 and 6 in the list.
At position 0, it is immediately followed by the number 3 (at position 1) and at position 6 it is followed by the number 5 (at position 7).
So 1 appears two times and is immediately followed by 3 and 5.
The mean of 3 and 5 is (3 + 5)/2 = 4.0
So the result for 1 is 4.

Using the same method for 2:
2 is found at positions 4, 10 and 11 and is followed by 4 and 2. The final 2 at the end of the list is discarded, as it is followed by nothing.
So the result for 2 is (4 + 2)/2 = 3.0

If I go on with this method and present the results as a dictionary, I obtain this:

results = {
  1: 4.0,
  2: 3.0,
  3: 5.0,   # 5/1
  4: 3.0,   # (1 + 5)/2
  5: 3.25,  # (5 + 2 + 4 + 2)/4
}
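
As a baseline, the manual procedure above can be written as a direct, deliberately naive loop. This is only a sketch: the function name mean_of_followers is made up for illustration, and because it rescans the list once per distinct value it is too slow for very long lists. It is mainly useful for checking faster versions against.

```python
def mean_of_followers(numbers):
    """Return {value: mean of the elements that immediately follow it}."""
    results = {}
    for value in sorted(set(numbers)):
        # Collect the successor of every occurrence, except one at the very end.
        followers = [numbers[i + 1]
                     for i in range(len(numbers) - 1)
                     if numbers[i] == value]
        if followers:  # a value that only appears last has no followers
            results[value] = sum(followers) / len(followers)
    return results


numbers = [1, 3, 5, 5, 2, 4, 1, 5, 4, 5, 2, 2]
print(mean_of_followers(numbers))
# {1: 4.0, 2: 3.0, 3: 5.0, 4: 3.0, 5: 3.25}
```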

I need to automate this procedure in an efficient way because it is supposed to run on very long lists.
I want to solve this using pandas or numpy but I am a total beginner with these packages.
I am of course reading the documentation, but it is so long that I feel like I will find a solution in two years :D
Any help, shortcut or link to the right parts of the docs would be appreciated.

The results don't have to be a dictionary. It can be anything, like for instance a new dataframe, as long as the computation is efficient, and elegant if possible.

Thanks for your time!

CodePudding user response:

How about using collections.defaultdict and zip (or itertools.pairwise on Python 3.10+):

from collections import defaultdict

numbers = [1, 3, 5, 5, 2, 4, 1, 5, 4, 5, 2, 2]

dct = defaultdict(list)
for x, y in zip(numbers, numbers[1:]):
# Alternatively, on Python 3.10+ (after `import itertools`): for x, y in itertools.pairwise(numbers):
    dct[x].append(y)

dct = {k: sum(lst) / len(lst) for k, lst in dct.items()}
print(dct)
# {1: 4.0, 3: 5.0, 5: 3.25, 2: 3.0, 4: 3.0}
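
Since the list is supposed to be very long, a possible variant keeps only a running sum and a count per number instead of storing every follower. This is a sketch, not a benchmarked solution: follower_means is a hypothetical name, and it assumes numbers is a list (or another re-iterable sequence), not a one-shot iterator.

```python
from collections import defaultdict
from itertools import islice


def follower_means(numbers):
    totals = defaultdict(float)  # sum of the followers seen for each value
    counts = defaultdict(int)    # number of followers seen for each value
    # islice walks the list from index 1 without copying it, unlike numbers[1:]
    for x, y in zip(numbers, islice(numbers, 1, None)):
        totals[x] += y
        counts[x] += 1
    return {k: totals[k] / counts[k] for k in totals}


numbers = [1, 3, 5, 5, 2, 4, 1, 5, 4, 5, 2, 2]
print(follower_means(numbers))
# {1: 4.0, 3: 5.0, 5: 3.25, 2: 3.0, 4: 3.0}
```

The keys come out in order of first appearance; wrap the result in dict(sorted(...)) if sorted keys are needed.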

CodePudding user response:

Pandas Approach

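import pandas as pd
s = pd.Series(numbers)
# shift(-1) lines each value up with its successor; the trailing NaN is ignored by mean()
s.shift(-1).groupby(s).mean().to_dict()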

{1: 4.0, 2: 3.0, 3: 5.0, 4: 3.0, 5: 3.25}
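
The question also mentions NumPy. A rough NumPy-only equivalent, assuming the values are numeric and fit in a single array, could group the pairs with np.unique and sum them with np.bincount. The variable names are illustrative only.

```python
import numpy as np

numbers = [1, 3, 5, 5, 2, 4, 1, 5, 4, 5, 2, 2]
arr = np.asarray(numbers)

# Pair every element (except the last) with the element that follows it.
current, following = arr[:-1], arr[1:]

# uniq: sorted distinct values that have a follower; idx: group index of each pair
uniq, idx = np.unique(current, return_inverse=True)
sums = np.bincount(idx, weights=following)  # sum of followers per group
counts = np.bincount(idx)                   # number of followers per group
means = sums / counts

result = dict(zip(uniq.tolist(), means.tolist()))
print(result)
# {1: 4.0, 2: 3.0, 3: 5.0, 4: 3.0, 5: 3.25}
```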

CodePudding user response:

You could create a dataframe using the numbers list zipped with itself offset by 1, then use groupby to generate means for each number:

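import pandas as pd

numbers = [1, 3, 5, 5, 2, 4, 1, 5, 4, 5, 2, 2]
# Each row pairs a number with the one that follows it; the last number has no pair.
df = pd.DataFrame(zip(numbers, numbers[1:]), columns=['num', 'next'])
df.groupby('num').mean().reset_index().rename(columns={'next':'mean'})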

Output

   num  mean
0    1  4.00
1    2  3.00
2    3  5.00
3    4  3.00
4    5  3.25