I have 2 Series of numbers: A = (24,25,26,27,28,29) and B = (105,106,107,108,109). And a DataFrame with two columns A, and B such as:
import numpy as np
import pandas as pd
A = pd.Series(np.array([24, 25, 26, 27, 28, 29]))
B = pd.Series(np.array([105, 106, 107, 108, 109]))
AB_dataframe = pd.DataFrame({
'A': [25, 25, 25, 26, 26, 27, 27, 28, 29],
'B': [106, 108, 109, 108, 109, 106, 108, 108, 107]
})
AB_dataframe
:
A B
0 25 106
1 25 108
2 25 109
3 26 108
4 26 109
5 27 106
6 27 108
7 28 108
8 29 107
I want to rewrite these into a DataFrame like this:
105 106 107 108 109
24 0 0 0 0 0
25 0 1 0 1 1
26 0 0 0 1 1
27 0 1 0 1 0
28 0 0 0 1 0
29 0 0 1 0 0
How do I do this? I've tried many different types of loops and still haven't quite got it.
CodePudding user response:
You can use crosstab
to get the dummies, then reindex
to get the correct index/columns:
out = (pd.crosstab(AB_dataframe['A'], AB_dataframe['B'])
.reindex(index=A, columns=B, fill_value=0)
)
Output:
105 106 107 108 109
24 0 0 0 0 0
25 0 1 0 1 1
26 0 0 0 1 1
27 0 1 0 1 0
28 0 0 0 1 0
29 0 0 1 0 0
CodePudding user response:
You can do the following:
A = (24, 25, 26, 27, 28, 29)
B = (105, 106, 107, 108, 109)
df = pd.DataFrame(
{
"A": [25, 25, 25, 26, 26, 27, 27, 28, 29],
"B": [106, 108, 109, 108, 109, 106, 108, 108, 107],
}
)
out = pd.DataFrame(
np.array(
[1 if (x, y) in set(zip(df["A"], df["B"])) else 0 for x in A for y in B]
).reshape((len(A), len(B))),
index=A,
columns=B,
)
## Or equivalently, using dtype uint8 to cast booleans to 0 and 1s:
out = pd.DataFrame(
np.array(
[(x, y) in set(zip(df["A"], df["B"])) for x in A for y in B],
dtype="uint8"
).reshape((len(A), len(B))),
index=A,
columns=B,
)
CodePudding user response:
Another option, with value_counts:
(AB_dataframe.value_counts()
.unstack(fill_value=0)
.reindex(columns = B, fill_value = 0)
)
105 106 107 108 109
A
25 0 1 0 1 1
26 0 0 0 1 1
27 0 1 0 1 0
28 0 0 0 1 0
29 0 0 1 0 0