Pandas Series from an "inverted" list of lists

I have a list of lists, idx_of_vals = [[3, 7, 10, 12, 9], [8, 0, 5, 1], [6, 4, 11, 2]] (say, the 13 integers from 0 to 12 in a random order, split across the sublists).

The desired output is a series s:

>>> s
0     1
1     1
2     2
3     0
4     2
5     1
6     2
7     0
8     1
9     0
10    0
11    2
12    0
Name: my_name, dtype: int64

That is, every element of s whose index appears in the 0th sublist of idx_of_vals ([3, 7, 10, 12, 9]) has the value 0 (the position of that sublist within idx_of_vals), every element whose index appears in the 1st sublist has the value 1, and so on.

Current solution:

import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=np.arange(13), name='my_name')
# the position of each sublist becomes the value at the indices it lists
for val, idx in enumerate(idx_of_vals):
    s.loc[idx] = val
s = s.astype(int)

Question: Is there a more efficient and Pythonic way to get the desired result without a for loop?

CodePudding user response：

Go through a pandas DataFrame (note that this returns the values as an Index, not as a Series):

(pd.DataFrame(idx_of_vals)
   .stack()
   .droplevel(level=1)
   .sort_values()
   .index)

Output:

Int64Index([1, 1, 2, 0, 2, 1, 2, 0, 1, 0, 0, 2, 0], dtype='int64')
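
The expression above stops at an Index, so one more step is needed to get the Series from the question. A minimal sketch, assuming the same idx_of_vals; the variable name stacked and the extra .dropna() call are my additions (the latter as a guard in case the installed pandas version keeps the NaN padding when stacking):

```python
import pandas as pd

idx_of_vals = [[3, 7, 10, 12, 9], [8, 0, 5, 1], [6, 4, 11, 2]]

# Same chain as above, but keep the sorted Series instead of only its index.
stacked = (
    pd.DataFrame(idx_of_vals)   # ragged rows are padded with NaN
    .stack()
    .dropna()                   # guard: discard the NaN padding if it survives stack()
    .droplevel(level=1)         # keep only the sublist position as the index
    .sort_values()
)

# Swap roles: the stacked values become the index, the sublist positions the data.
s = pd.Series(
    stacked.index.to_numpy(),
    index=stacked.to_numpy().astype(int),  # values are floats because of the padding
    name="my_name",
)
print(s)
```

The cast to int is needed because the NaN padding makes the stacked values floats.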

CodePudding user response：

You can try a dict comprehension together with Series.update (note that the result stays float64):

s = pd.Series(np.nan, index=np.arange(13), name='my_name')
s.update({val:idx for idx, vals in enumerate(idx_of_vals) for val in vals})

print(s)

0     1.0
1     1.0
2     2.0
3     0.0
4     2.0
5     1.0
6     2.0
7     0.0
8     1.0
9     0.0
10    0.0
11    2.0
12    0.0
Name: my_name, dtype: float64
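
The output above is float64 because the Series starts out filled with NaN. A small sketch of the same approach with a final cast, assuming every index from 0 to 12 occurs in idx_of_vals so that no NaN is left when the cast runs:

```python
import numpy as np
import pandas as pd

idx_of_vals = [[3, 7, 10, 12, 9], [8, 0, 5, 1], [6, 4, 11, 2]]

s = pd.Series(np.nan, index=np.arange(13), name="my_name")
# Map each listed index to the position of the sublist that contains it.
s.update({val: idx for idx, vals in enumerate(idx_of_vals) for val in vals})
# Safe only because every slot has been filled; a leftover NaN would raise here.
s = s.astype(int)
print(s.dtype)
```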

CodePudding user response：

I would create a Series, explode it, and swap the index and the values:

idx_of_vals = [[ 3, 7, 10, 12, 9], [8, 0, 5, 1], [ 6, 4, 11, 2]]

s = pd.Series(idx_of_vals).explode()
s = pd.Series(s.index, index=s).sort_index()

Output:

0     1
1     1
2     2
3     0
4     2
5     1
6     2
7     0
8     1
9     0
10    0
11    2
12    0
dtype: int64

As a one-liner (Python ≥ 3.8, because it uses the walrus operator):

pd.Series((s:=pd.Series(idx_of_vals).explode()).index, index=s).sort_index()
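
As far as I know, explode() on a Series of lists keeps the object dtype, so the swapped Series ends up with an object index and no name. A hedged variant that casts the index to integers and sets the name from the question (the variable name exploded is mine):

```python
import pandas as pd

idx_of_vals = [[3, 7, 10, 12, 9], [8, 0, 5, 1], [6, 4, 11, 2]]

exploded = pd.Series(idx_of_vals).explode()

# Swap index and values, casting the exploded values to a proper integer index.
s = pd.Series(
    exploded.index.to_numpy(),
    index=exploded.to_numpy().astype(int),
    name="my_name",
).sort_index()
print(s)
```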

CodePudding user response：

The for loop is not necessarily bad. Your current solution is faster than the currently accepted answer.

One way to make it more efficient and Pythonic is to restructure the data first and only then create the Series from it, instead of preallocating the Series and filling it afterwards. A dictionary comprehension does that:

idx_of_vals = [[ 3, 7, 10, 12, 9], [8, 0, 5, 1], [ 6, 4, 11, 2]]
data = {val: idx for idx, lst in enumerate(idx_of_vals) for val in lst}
s = pd.Series(data, name='my_name').sort_index()

Output:

>>> s

0     1
1     1
2     2
3     0
4     2
5     1
6     2
7     0
8     1
9     0
10    0
11    2
12    0
Name: my_name, dtype: int64
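
To check the speed claim on your own machine, here is a rough timeit sketch. The function names are made up for illustration, and since the thread does not say which answer was accepted, it simply compares three of the approaches shown here. Timings depend on the pandas version and on the input size, so no numbers are quoted.

```python
import timeit

import numpy as np
import pandas as pd

idx_of_vals = [[3, 7, 10, 12, 9], [8, 0, 5, 1], [6, 4, 11, 2]]


def loop_fill():
    # The question's approach: preallocate, then fill in a loop.
    s = pd.Series(np.nan, index=np.arange(13), name="my_name")
    for val, idx in enumerate(idx_of_vals):
        s.loc[idx] = val
    return s.astype(int)


def dict_build():
    # Restructure into a dict first, then build the Series once.
    data = {val: idx for idx, lst in enumerate(idx_of_vals) for val in lst}
    return pd.Series(data, name="my_name").sort_index()


def explode_swap():
    # Explode the lists, then swap index and values.
    e = pd.Series(idx_of_vals).explode()
    return pd.Series(e.index, index=e).sort_index()


runs = 200
for fn in (loop_fill, dict_build, explode_swap):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.1f} µs per call")
```

With only 13 elements the fixed overhead of each pandas call dominates, so it is worth repeating the comparison with data of a realistic size.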