I have survey answers from participants in a pandas DataFrame:
[['A', 'B', 'C', 'A' ...],
['D', 'B', 'B', 'A' ...],
......................
['D', 'C', 'C', 'A' ...]]
and I have a vector with the answer key for the survey:
['D', 'B', 'B', 'A' ...]
I need to get a DataFrame that shows, as boolean values, whether each answer matches the key, like:
[[0, 1, 0, 1 ...],
[1, 1, 1, 1 ...],
......................
[1, 0, 0, 1 ...]]
I've tried pd.get_dummies(users_answ, keys), but that seems wrong.
CodePudding user response:
You should be able to simply check for equality between the DataFrame and the list. The list is aligned with the DataFrame's columns, so each key is compared against the matching column in every row:
import pandas as pd

df = pd.DataFrame([[*'ABCA'], [*'DBBA'], [*'DCCA']])
keys = [*'DBBA']
print(df)
   0  1  2  3
0  A  B  C  A
1  D  B  B  A
2  D  C  C  A
print(keys)
['D', 'B', 'B', 'A']
print(df == keys)
       0      1      2     3
0  False   True  False  True
1   True   True   True  True
2   True  False  False  True
# If you want actual integers instead of booleans
print((df == keys).astype(int))
   0  1  2  3
0  0  1  0  1
1  1  1  1  1
2  1  0  0  1
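To see what "aligned across the columns" means in practice, here is a small sketch that compares the pandas result with plain NumPy broadcasting. It reuses the sample df and keys from this answer; the names by_pandas and by_numpy are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[*'ABCA'], [*'DBBA'], [*'DCCA']])
keys = [*'DBBA']

# pandas: the list is matched position by position against the columns
by_pandas = (df == keys).astype(int)

# NumPy: a (3, 4) array compared with a length-4 array is broadcast row by row
by_numpy = (df.to_numpy() == np.array(keys)).astype(int)

print(by_numpy.tolist())
# [[0, 1, 0, 1], [1, 1, 1, 1], [1, 0, 0, 1]]
```

Both give the same 0/1 grid, so the list only needs to have one entry per column.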
CodePudding user response:
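The easiest way seems to be the pandas eq function: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.eq.html#pandas.DataFrame.eq
So the whole solution is a one-liner (keys has one entry per question, so it has to be aligned with the columns, i.e. axis=1, which is also the default):
users_answ.eq(keys, axis=1)
Alternative solution, with explicit loops: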
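A minimal sketch of eq on a small frame, assuming users_answ holds one row per participant and keys holds one entry per question (the sample values are made up to match the question). Note that keys has to line up with the columns here, which is axis=1 (or 'columns', the default); with axis=0 pandas would try to match keys against the rows instead.

```python
import pandas as pd

# Sample data; users_answ and keys mirror the names used in the question
users_answ = pd.DataFrame([[*'ABCA'], [*'DBBA'], [*'DCCA']])
keys = [*'DBBA']

# One key per column, so align along the columns
result = users_answ.eq(keys, axis=1).astype(int)

print(result.to_numpy().tolist())
# [[0, 1, 0, 1], [1, 1, 1, 1], [1, 0, 0, 1]]
```

Drop the .astype(int) if True/False is good enough.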
# new list for the results
checked_answ = []
# take each row of the survey answers df
for r in range(users_answ.shape[0]):
    row = users_answ.iloc[r].tolist()
    # build the list for this row
    p = []
    for i in range(len(keys)):
        if keys[i] == row[i]:
            p.append(1)
        else:
            p.append(0)
    checked_answ.append(p)
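The loop leaves you with a plain list of lists. Since the question asks for a DataFrame, here is a sketch (with made-up sample data standing in for the real survey) that wraps the result back up, reusing the original index and columns. The inner loop is written as a list comprehension, which is just my shorthand for the same comparison, and checked_df is a name I picked.

```python
import pandas as pd

# Sample data standing in for the real survey
users_answ = pd.DataFrame([[*'ABCA'], [*'DBBA'], [*'DCCA']])
keys = [*'DBBA']

checked_answ = []
for r in range(users_answ.shape[0]):
    row = users_answ.iloc[r].tolist()
    # 1 where the answer matches the key, 0 otherwise
    checked_answ.append([int(k == a) for k, a in zip(keys, row)])

# Put the 0/1 rows back into a DataFrame with the same labels
checked_df = pd.DataFrame(checked_answ,
                          index=users_answ.index,
                          columns=users_answ.columns)
print(checked_df)
```

This should give the same values as the eq one-liner, only slower on large surveys.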