Animal | Sound | score-------- |
---|---|---|
dog | arf | 0------------- |
cat | meow | 0------------- |
I have a dataFrame like above, with many more columns. I want to use "Animal" and "Sound" columns ( many more) to calculate 'score'.
Because there are many columns, using lambda does not seem possible.
Suppose I did
def assign_score(data):
if data.Animal == "dog":
data.score = 1
if data.Animal == "cat":
data.score = 2
if data.Sound== "arf":
data.score = 10
if data.Sound == "meow":
data.score = 20
df.Score = df.apply(assign_score)
I get a KeyError on this code. Is there a way to get it work?
Thanks
CodePudding user response:
import pandas as pd
df=pd.DataFrame({'Animal':['dog','cat'], 'Sound': ['arf','meow'], 'score':[0,0]})
display(df)
# Give all of your column names in this list
colNames = ['Animal', 'Sound','score']
def assign_score(data):
#print(type(data))
if data[0] == "dog":
score=data[2] 1
if data[0] == "cat":
score=2
if data[1]== "arf":
score=data[2] 10
if data[1]== "meow":
score=data[2] 20
#print('score',score)
return score
#df.Score = df.apply(assign_score)
df['score']=df.apply(lambda x: assign_score(x[colNames[0:]]), axis=1)
display(df)
CodePudding user response:
Try this
import pandas as pd
columns = ['animal', 'sound']
data = [('dog', 'arf'),
('cat', 'meow')]
df = pd.DataFrame(data, columns=columns)
score = {'dog' : 1, 'cat' : 2,
'arf' : 10, 'meow' :20}
df['score'] = df.apply(lambda x: sum([score[x[col]] for col in columns]), axis=1)
print(df)
animal sound score
0 dog arf 11
1 cat meow 22