I have the following list or numpy array
ll=[7.2,0,0,0,0,0,6.5,0,0,-8.1,0,0,0,0]
and an additional list indicating the positions of non-zeros
i=[0,6,9]
I would like to make two new lists out of them, one filling the zeros and one counting in between, for this short example:
a=[7.2,7.2,7.2,7.2,7.2,7.2,6.5,6.5,6.5,-8.1,-8.1,-8.1,-8.1,-8.1]
b=[0,1,2,3,4,5,0,1,2,0,1,2,3,4]
Is there a way to do that without a for loop to speed things up? The list ll is quite long in my case.
CodePudding user response:
So...
import numpy as np
ll = np.array([7.2, 0, 0, 0, 0, 0, 6.5, 0, 0, -8.1, 0, 0, 0, 0])
i = np.array([0, 6, 9])
counts = np.append(
    np.diff(i),  # difference between each element in i
                 # (one element shorter than i)
    len(ll) - i[-1],  # length of last repeat
)
repeated = np.repeat(ll[i], counts)
repeated becomes
[ 7.2 7.2 7.2 7.2 7.2 7.2 6.5 6.5 6.5 -8.1 -8.1 -8.1 -8.1 -8.1]
b could be computed with
b = np.concatenate([np.arange(c) for c in counts])
print(b)
# [0 1 2 3 4 5 0 1 2 0 1 2 3 4]
but that involves a loop in the form of that list comprehension; perhaps someone Numpyier could implement it without a Python loop.
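One possible way to drop the list comprehension, sketched here under the assumption that i is sorted and starts at 0 (as in the question): repeat each start position over its run and subtract it from the absolute position. The name starts is only illustrative.

```python
import numpy as np

ll = np.array([7.2, 0, 0, 0, 0, 0, 6.5, 0, 0, -8.1, 0, 0, 0, 0])
i = np.array([0, 6, 9])

# run lengths: gaps between consecutive non-zero positions, plus the tail
counts = np.append(np.diff(i), len(ll) - i[-1])

# for every position, the index where its run started
# (assumes i[0] == 0, otherwise the lengths would not match len(ll))
starts = np.repeat(i, counts)

# offset within the run = absolute position minus run start
b = np.arange(len(ll)) - starts
print(b)
```

This keeps everything in vectorized NumPy calls, so no Python-level loop runs over the runs.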
CodePudding user response:
Array a is the result of a forward fill, and array b holds the offset of each element from the most recent non-zero element.
pandas has a forward fill function, but it should be easy enough to compute with numpy and there are many sources on how to do this.
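import numpy as np
ll=[7.2,0,0,0,0,0,6.5,0,0,-8.1,0,0,0,0]
a = np.array(ll)
# mark the zero elements
mask = a == 0
# index of each non-zero element, 0 where the element is zero
idx = np.where(~mask, np.arange(mask.size), 0)
# do the fill: the running maximum is the index of the last non-zero element
a[np.maximum.accumulate(idx)]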
output:
array([ 7.2, 7.2, 7.2, 7.2, 7.2, 7.2, 6.5, 6.5, 6.5, -8.1, -8.1,
-8.1, -8.1, -8.1])
More information about forward fill is found here:
- Most efficient way to forward-fill NaN values in numpy array
- Finding the consecutive zeros in a numpy array
Computing array b
You could reuse the forward-fill indices and combine them with a single np.arange:
fill_mask = np.maximum.accumulate(idx)
np.arange(len(fill_mask)) - fill_mask
output:
array([0, 1, 2, 3, 4, 5, 0, 1, 2, 0, 1, 2, 3, 4])
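For convenience, the two steps could be wrapped into one helper. This is only a sketch: the function name fill_and_count is made up here, and it assumes that a zero always means "missing". If the input starts with zeros, they are left as zeros and the counter simply starts at position 0.

```python
import numpy as np

def fill_and_count(values):
    """Forward-fill zeros and count the distance to the last non-zero entry."""
    a = np.asarray(values, dtype=float)
    positions = np.arange(a.size)
    # index of the most recent non-zero element at every position
    last_nonzero = np.maximum.accumulate(np.where(a != 0, positions, 0))
    return a[last_nonzero], positions - last_nonzero

filled, offsets = fill_and_count([7.2, 0, 0, 0, 0, 0, 6.5, 0, 0, -8.1, 0, 0, 0, 0])
print(filled)
print(offsets)
```

Because the non-zero positions are derived from the data itself, the separate index list i from the question is not needed with this variant.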