How to get One-Hot encoded matrix from a survey table and vector of answers

Time:10-06

I have the participants' survey answers in a pandas dataframe:

[['A', 'B', 'C', 'A' ...],
 ['D', 'B', 'B', 'A' ...],
 ......................
 ['D', 'C', 'C', 'A' ...]]

and I have a vector of keys to the survey:

['D', 'B', 'B', 'A' ...]

I need to get a dataframe that shows the boolean results of the survey, like:

[[0, 1, 0, 1 ...],
 [1, 1, 1, 1 ...],
 ......................
 [1, 0, 0, 1 ...]]

I've tried pd.get_dummies(users_answ, keys), but that seems wrong.

CodePudding user response:

You should be able to simply check the equality between the DataFrame and the list. The list should get aligned to the DataFrame across the columns:

import pandas as pd

df = pd.DataFrame([[*'ABCA'],[*'DBBA'],[*'DCCA']])
keys = [*'DBBA']

print(df)
   0  1  2  3
0  A  B  C  A
1  D  B  B  A
2  D  C  C  A

print(keys)
['D', 'B', 'B', 'A']

print(df == keys)
       0      1      2     3
0  False   True  False  True
1   True   True   True  True
2   True  False  False  True

# If you want actual integers instead of booleans
print((df == keys).astype(int))
   0  1  2  3
0  0  1  0  1
1  1  1  1  1
2  1  0  0  1

CodePudding user response:

The easiest way seems to be the pandas eq function: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.eq.html#pandas.DataFrame.eq

So the whole solution is a one-liner:

# axis=1 matches each key to a column (one key per question)
users_answ.eq(keys, axis=1)
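
As a minimal sketch of how eq lines a list up with the dataframe, here is the call on a small made-up table. The names users_answ and keys come from the question; the data is only for illustration. Because keys holds one entry per question, it has to be matched against the columns, which is what axis=1 (the default, 'columns') does.

```python
import pandas as pd

# Made-up example data; the names follow the question
users_answ = pd.DataFrame([list("ABCA"), list("DBBA"), list("DCCA")])
keys = list("DBBA")

# Compare every row against the keys, column by column,
# then turn True/False into 1/0
result = users_answ.eq(keys, axis=1).astype(int)
print(result)
```

With a non-square table like this one (3 participants, 4 questions), matching the list against the rows instead would fail on a length mismatch, since there are 4 keys but only 3 rows.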

Alternative solution:

# list that collects one result row per participant
checked_answ = []
# walk through each row of the survey answers dataframe
for r in range(users_answ.shape[0]):
    row = users_answ.iloc[r].tolist()
    # 1 where the answer matches the key, 0 otherwise
    p = []
    for i in range(len(keys)):
        if keys[i] == row[i]:
            p.append(1)
        else:
            p.append(0)
    checked_answ.append(p)
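
The loop leaves the result as a plain list of lists, while the question asks for a dataframe. As a sketch, assuming users_answ and keys look like the small made-up example below, the same check can be written as a comprehension and wrapped back into a dataframe that keeps the original row and column labels:

```python
import pandas as pd

# Made-up example data; the names follow the question
users_answ = pd.DataFrame([list("ABCA"), list("DBBA"), list("DCCA")])
keys = list("DBBA")

# 1 where a participant's answer matches the key, 0 otherwise
checked_answ = [
    [int(answer == key) for answer, key in zip(row, keys)]
    for row in users_answ.to_numpy().tolist()
]

# Reuse the input's index and columns so the result lines up with it
checked_df = pd.DataFrame(
    checked_answ, index=users_answ.index, columns=users_answ.columns
)
print(checked_df)
```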