For the dataset I am working with, I am trying to create a new column called NumberSymptoms that holds the number of symptoms each person has. To do this, I am trying to go through the columns in each row and, whenever one is "Yes", add one to the count, so that each row ends up with a total.
So it should eventually look something like this:
Cough | Myalgia | Headache | SoreThroat | Fatigue | NumberSymptoms |
---|---|---|---|---|---|
Yes | Yes | No | Yes | No | 3 |
No | Yes | Yes | Yes | Yes | 4 |
And so on for the rest of the rows.
I have tried to make a function for this:
number = 0
def count_symptoms(Cough, Myalgia, Headache, SoreThroat, Fatigue):
    if Cough == "Yes":
        number += 1
    elif Myalgia == "Yes":
        number += 1
    elif Headache == "Yes":
        number += 1
    elif SoreThroat == "Yes":
        number += 1
    elif Fatigue == "Yes":
        number += 1
    return number
df["NumberSymptoms"] = count_symptoms(df["Cough"], df["Myalgia"], df["Headache"], df["SoreThroat"], df["Fatigue"])
But I am getting the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I'm not sure why this happens. I'm also sure there must be a better way to do this; I'm just not sure what it is.
CodePudding user response:
Check if this fits your problem.
df["NumberSymptoms"] = df.apply(lambda row: sum(row == "Yes"), axis=1)
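As for why the original function fails: `count_symptoms` is handed whole columns, so `Cough == "Yes"` is a Series of booleans, and `if` cannot reduce that to a single True/False. Below is a minimal sketch of a column-wise alternative. It assumes the five symptom columns from the question hold only "Yes"/"No"; the sample rows and the `symptom_cols` name are made up for illustration.

```python
import pandas as pd

# Made-up sample rows mirroring the table in the question.
df = pd.DataFrame({
    "Cough": ["Yes", "No"],
    "Myalgia": ["Yes", "Yes"],
    "Headache": ["No", "Yes"],
    "SoreThroat": ["Yes", "Yes"],
    "Fatigue": ["No", "Yes"],
})

# Hypothetical helper list: only these columns are counted.
symptom_cols = ["Cough", "Myalgia", "Headache", "SoreThroat", "Fatigue"]

# eq("Yes") gives a boolean frame; summing across each row counts the True cells.
df["NumberSymptoms"] = df[symptom_cols].eq("Yes").sum(axis=1)
```

Selecting the symptom columns explicitly keeps any other column that happens to contain "Yes" out of the total, which `df.apply` over the full row would include.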
CodePudding user response:
I went about it a different way and managed to get what I needed:
number = 0
symptomList = [df.Cough, df.Myalgia, df.Headache, df.SoreThroat, df.Fatigue]
for symptom in symptomList:
    # str.count returns a Series, so number becomes a per-row running total
    number += symptom.str.count("Yes")
    number += symptom.str.count("Dry")
    number += symptom.str.count("Other")
df["NumberSymptoms"] = number
I hadn't read the specification properly before: for one of the columns I needed to look for either Dry or Other.
Getting the columns into a list and then going through each one and looking for the words seems to work for me.
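The same count can probably be written without the loop. This is only a sketch under two assumptions: that Cough is the column using "Dry"/"Other" (the post doesn't say which one it is), and that each cell holds exactly one of these words. The sample rows and the names `symptom_cols` and `positive_values` are made up for illustration.

```python
import pandas as pd

# Made-up sample rows; Cough is assumed to be the "Dry"/"Other" column.
df = pd.DataFrame({
    "Cough": ["Dry", "No", "Other"],
    "Myalgia": ["Yes", "Yes", "No"],
    "Headache": ["No", "Yes", "No"],
    "SoreThroat": ["Yes", "Yes", "No"],
    "Fatigue": ["No", "Yes", "No"],
})

symptom_cols = ["Cough", "Myalgia", "Headache", "SoreThroat", "Fatigue"]
positive_values = ["Yes", "Dry", "Other"]  # values treated as "has the symptom"

# isin marks cells whose whole value is one of positive_values;
# summing across each row counts them.
df["NumberSymptoms"] = df[symptom_cols].isin(positive_values).sum(axis=1)
```

One difference to keep in mind: `isin` compares the whole cell value, while `str.count` counts pattern matches inside each string, so the two can disagree if a cell contains extra text around the word.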