Count number of symptoms from dataset

For the dataset I am working with, I am trying to create a new column called NumberSymptoms that holds the number of symptoms each person has. To do this, I am trying to go through the columns in each row and, whenever the value for a symptom is Yes, add one to a count, so that each row ends up with a total.

So it should eventually be something like

Cough	Myalgia	Headache	SoreThroat	Fatigue	NumberSymptoms
Yes	Yes	No	Yes	No	3
No	Yes	Yes	Yes	Yes	4

And so on for the rest of the rows.

I have tried to make a function for this:

number = 0
def count_symptoms(Cough, Myalgia, Headache, SoreThroat, Fatigue):
    if Cough == "Yes":
        number += 1
    elif Myalgia == "Yes":
        number += 1
    elif Headache == "Yes":
        number += 1
    elif SoreThroat == "Yes":
        number += 1
    elif Fatigue == "Yes":
        number += 1
    return number
    
df["NumberSymptoms"] = count_symptoms(df["Cough"], df["Myalgia"], df["Headache"], df["SoreThroat"], df["Fatigue"])

But I am getting the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I'm not sure why this happens, and I'm sure there must be a better way to do this; I'm just not sure what it is.
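
For reference, the error most likely comes from the `if` lines: the function is called with whole columns, so a comparison such as `Cough == "Yes"` produces one True/False value per row rather than a single boolean. A minimal sketch that reproduces it, using a made-up one-column frame (the data here is an assumption, not taken from the real dataset):

```python
import pandas as pd

# Hypothetical one-column frame, only to reproduce the error.
df = pd.DataFrame({"Cough": ["Yes", "No", "Yes"]})

# Comparing a Series to a string is element-wise: the result is a boolean Series.
mask = df["Cough"] == "Yes"
print(mask.tolist())

try:
    # `if` needs a single True/False, but `mask` holds one value per row.
    if mask:
        pass
except ValueError as err:
    message = str(err)
    print(message)
```

The same thing happens inside count_symptoms, because each argument is an entire column rather than the value from one row.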

CodePudding user response：

Check if this fits your problem.

df["NumberSymptoms"] = df.apply(lambda row: sum(row == "Yes"), axis=1)
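
Note that the line above looks at every column in the row, so it would also count a "Yes" in any non-symptom column the frame happens to have. A possible variant, sketched here with the two sample rows from the question, restricts the count to the five symptom columns and avoids the row-by-row apply. The `symptom_cols` name is only illustrative:

```python
import pandas as pd

# Sample rows copied from the question's expected output.
df = pd.DataFrame({
    "Cough": ["Yes", "No"],
    "Myalgia": ["Yes", "Yes"],
    "Headache": ["No", "Yes"],
    "SoreThroat": ["Yes", "Yes"],
    "Fatigue": ["No", "Yes"],
})

# Illustrative name: the columns that should be counted.
symptom_cols = ["Cough", "Myalgia", "Headache", "SoreThroat", "Fatigue"]

# eq("Yes") gives a boolean frame; summing across columns (axis=1)
# counts the True values in each row.
df["NumberSymptoms"] = df[symptom_cols].eq("Yes").sum(axis=1)
print(df["NumberSymptoms"].tolist())  # [3, 4]
```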

CodePudding user response：

I went about it a different way and managed to get what I needed:

number = 0
symptomList = [df.Cough, df.Myalgia, df.Headache, df.SoreThroat, df.Fatigue]
for symptom in symptomList:
    # Add to the running total; a plain "=" would overwrite it on every pass.
    number += symptom.str.count("Yes")
    number += symptom.str.count("Dry")
    number += symptom.str.count("Other")

df["NumberSymptoms"] = number

I hadn't read the specification properly before: for one of the columns I needed to look for either Dry or Other.

Getting the columns into a list and then going through each one looking for those words seems to work for me.
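
The same idea can probably be written without the loop by checking each cell against a list of accepted values. This is only a sketch: the thread does not say which column uses Dry or Other, so Cough is assumed here, and the sample data and the names `symptom_cols` and `positive_values` are made up for illustration.

```python
import pandas as pd

# Hypothetical data: Cough is ASSUMED to be the column that holds
# "Dry"/"Other"; the thread does not say which column it is.
df = pd.DataFrame({
    "Cough": ["Dry", "No", "Other"],
    "Myalgia": ["Yes", "Yes", "No"],
    "Headache": ["No", "Yes", "No"],
    "SoreThroat": ["Yes", "Yes", "No"],
    "Fatigue": ["No", "Yes", "No"],
})

symptom_cols = ["Cough", "Myalgia", "Headache", "SoreThroat", "Fatigue"]
positive_values = ["Yes", "Dry", "Other"]  # values that count as a symptom

# isin() marks cells that exactly match one of the accepted values;
# summing across columns counts the matches in each row.
df["NumberSymptoms"] = df[symptom_cols].isin(positive_values).sum(axis=1)
print(df["NumberSymptoms"].tolist())  # [3, 4, 1]
```

One difference from the loop: `str.count` counts substring occurrences, while `isin` only accepts exact matches, so a cell with extra text or different capitalisation would not be counted by this version.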