Home > Back-end >  Convert a series of paired values to (0,1) matrix using pandas dataframe
Convert a series of paired values to (0,1) matrix using pandas dataframe

Time:11-14

I have 2 Series of numbers: A = (24,25,26,27,28,29) and B = (105,106,107,108,109). And a DataFrame with two columns A, and B such as:

import numpy as np
import pandas as pd

A = pd.Series(np.array([24, 25, 26, 27, 28, 29]))
B = pd.Series(np.array([105, 106, 107, 108, 109]))

AB_dataframe = pd.DataFrame({
    'A': [25, 25, 25, 26, 26, 27, 27, 28, 29],
    'B': [106, 108, 109, 108, 109, 106, 108, 108, 107]
})

AB_dataframe:

    A    B
0  25  106
1  25  108
2  25  109
3  26  108
4  26  109
5  27  106
6  27  108
7  28  108
8  29  107

I want to rewrite these into a DataFrame like this:

    105  106  107  108  109
24    0    0    0    0    0
25    0    1    0    1    1
26    0    0    0    1    1
27    0    1    0    1    0
28    0    0    0    1    0
29    0    0    1    0    0

How do I do this? I've tried many different types of loops and still haven't quite got it.

CodePudding user response:

You can use crosstab to get the dummies, then reindex to get the correct index/columns:

out = (pd.crosstab(AB_dataframe['A'], AB_dataframe['B'])
         .reindex(index=A, columns=B, fill_value=0)
      )

Output:

    105  106  107  108  109
24    0    0    0    0    0
25    0    1    0    1    1
26    0    0    0    1    1
27    0    1    0    1    0
28    0    0    0    1    0
29    0    0    1    0    0

CodePudding user response:

You can do the following:

A = (24, 25, 26, 27, 28, 29)
B = (105, 106, 107, 108, 109)

df = pd.DataFrame(
    {
        "A": [25, 25, 25, 26, 26, 27, 27, 28, 29],
        "B": [106, 108, 109, 108, 109, 106, 108, 108, 107],
    }
)

out = pd.DataFrame(
    np.array(
        [1 if (x, y) in set(zip(df["A"], df["B"])) else 0 for x in A for y in B]
    ).reshape((len(A), len(B))),
    index=A,
    columns=B,
)

## Or equivalently, using dtype uint8 to cast booleans to 0 and 1s:

out = pd.DataFrame(
    np.array(
        [(x, y) in set(zip(df["A"], df["B"])) for x in A for y in B],
        dtype="uint8"
    ).reshape((len(A), len(B))),
    index=A,
    columns=B,
)

CodePudding user response:

Another option, with value_counts:

(AB_dataframe.value_counts()
             .unstack(fill_value=0)
             .reindex(columns = B, fill_value = 0)
)
    105  106  107  108  109
A                          
25    0    1    0    1    1
26    0    0    0    1    1
27    0    1    0    1    0
28    0    0    0    1    0
29    0    0    1    0    0

  • Related