Hi everyone! I tried searching and Googling for the answer, but none of the examples or topics I found appear to cover this.
So I'm working on a csv I'm making. This one's pretty small and I'm OK sharing the first few lines:
Game,MafiaHost,Day,Voter,Voted,Was Voter Mafia?,Was Voted Mafia?,Did User Win?
XHF Mafia,Reck,1,Kuroi,Maverick,Y,N,Y
XHF Mafia,Reck,1,Maverick,Caffrey,N,N,N
XHF Mafia,Reck,1,Kira,Swann,Y,N,Y
XHF Mafia,Reck,1,Swann,Kira,N,Y,N
XHF Mafia,Reck,1,Bobby,Kira,N,Y,N
Minor background information: This is a Mafia party game where we all guess who the bad guys are through logic and reasoning. I wanted to create an ipynb in order to gain insight as to patterns and such.
Anyways, I wanted to create a column for accuracy:
first['Accuracy'] = 0
Here's where the issues come in.
for idx,row in first.iterrows(): #To loop through each line.
    if first['Was Voter Mafia?'] == 'N': #In order to exclude those who are Mafia since Mafia generally vote non-Mafia so they can win.
        if row['Was Voted Mafia?'] == 'Y': #Essentially, this should be trying to increase the accuracy for non-Mafia successfully voting someone who is Mafia.
            row['Accuracy'] = 1 #I'm still trying to feel around how I'm going to get the accuracy to work, but for right now, turning the automatic 0 to a 1 would be helpful.
Here's the error received:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-33-b74e150b550c> in <module>
1 for idx,row in first.iterrows():
----> 2 if first['Was Voter Mafia?'] == 'N':
3 if row['Was Voted Mafia?'] == 'Y':
4 row['Accuracy'] = 1
~\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1440 @final
1441 def __nonzero__(self):
-> 1442 raise ValueError(
1443 f"The truth value of a {type(self).__name__} is ambiguous. "
1444 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I get that these fields are binary in nature, but everything I've read about this error has to do with filtering.
EDIT: I also swapped the loop to reflect this:
for idx,row in first.iterrows():
    if row['Was Voted Mafia?'] == 'Y':
        if first['Was Voter Mafia?'] == 'N':
            row['Accuracy'] = 1
The same error still exists.
CodePudding user response:
You want to avoid loops when operating on DataFrames if at all possible, as generally these operations are much slower than array-based/vectorized operations.
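There is a second reason to avoid the loop here. Even if the condition is rewritten to test row[...] instead of first[...], assigning to row inside iterrows() is not expected to change the DataFrame, because with mixed column types each row is handed over as a temporary copy. A minimal sketch, assuming a cut-down version of the question's data (only three of the original columns are reproduced):

```python
import pandas as pd

# Cut-down copy of the sample rows from the question (assumption: only
# the columns needed for the check are included).
first = pd.DataFrame({
    "Voter": ["Kuroi", "Maverick", "Kira", "Swann", "Bobby"],
    "Was Voter Mafia?": ["Y", "N", "Y", "N", "N"],
    "Was Voted Mafia?": ["N", "N", "N", "Y", "Y"],
})
first["Accuracy"] = 0

for idx, row in first.iterrows():
    # row[...] is a single string, so this comparison yields a plain bool
    # and no longer raises the "truth value is ambiguous" error.
    if row["Was Voter Mafia?"] == "N" and row["Was Voted Mafia?"] == "Y":
        row["Accuracy"] = 1  # writes to the temporary row, not to `first`

# Total of the Accuracy column after the loop has finished.
loop_total = int(first["Accuracy"].sum())
print(loop_total)
```

If the total printed is still 0, the loop ran without error but none of its writes reached the DataFrame.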
The error you are getting is due to the fact that you are trying to determine the truth value of an entire boolean series: first['Was Voter Mafia?'] == 'N'
from io import StringIO
import pandas as pd
dat = StringIO("""
Game,MafiaHost,Day,Voter,Voted,Was Voter Mafia?,Was Voted Mafia?,Did User Win?
XHF Mafia,Reck,1,Kuroi,Maverick,Y,N,Y
XHF Mafia,Reck,1,Maverick,Caffrey,N,N,N
XHF Mafia,Reck,1,Kira,Swann,Y,N,Y
XHF Mafia,Reck,1,Swann,Kira,N,Y,N
XHF Mafia,Reck,1,Bobby,Kira,N,Y,N
""")
first = pd.read_csv(dat)
first['Was Voter Mafia?'] == 'N'
Out
0 False
1 True
2 False
3 True
4 True
Name: Was Voter Mafia?, dtype: bool
You can see that you have both True and False values, yet you are trying to use this Series in an if statement. any() or all() would resolve this boolean Series to a single boolean which can be used to check for truth.
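As a small illustration of that point, the sketch below rebuilds the same boolean Series by hand (the values are copied from the sample data) and shows that using it directly in an if raises the ValueError, while any() and all() each reduce it to one boolean:

```python
import pandas as pd

# Same values as the 'Was Voter Mafia?' column in the sample data.
votes = pd.Series(["Y", "N", "Y", "N", "N"], name="Was Voter Mafia?")
mask = votes == "N"  # element-wise comparison, one bool per row

try:
    if mask:  # a whole Series has no single truth value
        pass
    raised = False
except ValueError:
    raised = True

any_non_mafia = bool(mask.any())  # True if at least one voter is non-Mafia
all_non_mafia = bool(mask.all())  # True only if every voter is non-Mafia
print(raised, any_non_mafia, all_non_mafia)
```

Note that neither reduction is what the question actually needs, since the goal is a per-row result rather than one answer for the whole column.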
All that said, it appears that you are attempting to set Accuracy equal to 1 where the person voted for was Mafia, while excluding votes that came from members of the Mafia.
You can use a boolean mask to filter your dataframe and set values only where certain conditions are met.
first['Accuracy'] = 0
condition_one = first['Was Voter Mafia?'] == 'N'
condition_two = first['Was Voted Mafia?'] == 'Y'
combined_conditions = condition_one & condition_two
# where the combined_condition array == True, set the value of Accuracy to 1
first.loc[combined_conditions, 'Accuracy'] = 1
Read about boolean indexing and the .loc indexer.
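An equivalent way to get the same column, sketched here as an alternative rather than a replacement, is to convert the combined mask straight to integers. The DataFrame below is a cut-down stand-in for the question's data (an assumption made to keep the example short):

```python
import pandas as pd

# Cut-down stand-in for the question's data.
first = pd.DataFrame({
    "Was Voter Mafia?": ["Y", "N", "Y", "N", "N"],
    "Was Voted Mafia?": ["N", "N", "N", "Y", "Y"],
})

# Non-Mafia voter AND the person voted for was Mafia.
mask = (first["Was Voter Mafia?"] == "N") & (first["Was Voted Mafia?"] == "Y")

# True becomes 1 and False becomes 0, so no separate initialisation is needed.
first["Accuracy"] = mask.astype(int)

accuracy_values = first["Accuracy"].tolist()
print(accuracy_values)
```

The .loc form is the better fit when only some rows should be overwritten and the rest left alone; the astype(int) form suits the case where the whole column is derived from the mask.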
CodePudding user response:
I saved your data to a CSV file and wrote the following code:
import pandas as pd
df = pd.read_csv('problem2.csv')
print(df)
df['Accuracy']=0
# Use .loc rather than chained indexing, which may assign to a copy
df.loc[(df['Was Voter Mafia?'] == 'N') & (df['Was Voted Mafia?'] == 'Y'), 'Accuracy'] = 1
print(df[['Was Voter Mafia?','Was Voted Mafia?','Accuracy']])
CodePudding user response:
I'm not at all sure what your logic is for determining accuracy (it surely should be some metric based around the voter status, the value of the vote, and the status of the person voted against?), but there are a couple of ways to build up some truth value columns.
Using the set-up provided by @JWilliams1:
from io import StringIO
import pandas as pd
dat = StringIO("""
Game,MafiaHost,Day,Voter,Voted,Was Voter Mafia?,Was Voted Mafia?,Did User Win?
XHF Mafia,Reck,1,Kuroi,Maverick,Y,N,Y
XHF Mafia,Reck,1,Maverick,Caffrey,N,N,N
XHF Mafia,Reck,1,Kira,Swann,Y,N,Y
XHF Mafia,Reck,1,Swann,Kira,N,Y,N
XHF Mafia,Reck,1,Bobby,Kira,N,Y,N
""")
df = pd.read_csv(dat)
df
you can set truth values in a single column by a simple assignment over the rows:
df["voterNotInMafia"] = df["Was Voter Mafia?"] == "N"
which gives the result for df["voterNotInMafia"] of:
0 False
1 True
2 False
3 True
4 True
Name: voterNotInMafia, dtype: bool
You can apply a function that implements a logical test across columns within each row:
def some_measure(row):
    """I'm not convinced this is an accuracy measure?!"""
    return row["voterNotInMafia"] & (row["Was Voted Mafia?"]=="Y")

# Apply the function to each row: axis=1
df["some_measure"] = df.apply(some_measure, axis=1)
Again, this would return a column of Boolean values:
cols = ["voterNotInMafia", "Was Voted Mafia?", "some_measure"]
df[cols]
voterNotInMafia | Was Voted Mafia? | some_measure |
---|---|---|
False | N | False |
True | N | False |
False | N | False |
True | Y | True |
True | Y | True |
With Boolean typed columns, you can sum them, with True values counting as 1 and False values counting as 0:
df["some_measure"].sum()
gives the answer 2. As a proportion of the total number of rows, use df["some_measure"].sum()/len(df).
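If the intent is an accuracy figure for non-Mafia voters only, one possible reading (an assumption on my part, since the question leaves the metric open) is the share of non-Mafia votes that landed on a Mafia member, overall and per voter. A minimal sketch using a cut-down copy of the sample data:

```python
import pandas as pd

# Cut-down copy of the sample data.
df = pd.DataFrame({
    "Voter": ["Kuroi", "Maverick", "Kira", "Swann", "Bobby"],
    "Was Voter Mafia?": ["Y", "N", "Y", "N", "N"],
    "Was Voted Mafia?": ["N", "N", "N", "Y", "Y"],
})

# Keep only votes cast by non-Mafia players.
town_votes = df[df["Was Voter Mafia?"] == "N"]

# True where a non-Mafia vote landed on a Mafia member.
hit = town_votes["Was Voted Mafia?"] == "Y"

# The mean of a boolean Series is the fraction of True values.
overall_rate = hit.mean()
per_voter = hit.groupby(town_votes["Voter"]).mean()

print(overall_rate)
print(per_voter)
```

With more games and days in the file, per_voter would average over all of a player's non-Mafia votes, which is closer to a per-player accuracy than a single 0/1 flag per row.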