I want to supply a scoring function for a code-breaking exercise. In the exercise, a person tries to find a function that provides a 4-digit "Solution" that matches the 4-digit "Original" from some cypher. The scoring function gives partial credit - 10 points for a totally correct answer, and otherwise 1 point for each correct digit in the correct place.
The scoring function needs to look at two columns at once. I can apply it using a loop. Is there a more direct method using apply or similar?
import pandas as pd
# DataFrame with original values and purported solutions.
df = pd.DataFrame([
    ['0000', '1111'],
    ['1111', '1122'],
    ['1234', '1234'],
    ],
    columns=['Original', 'Solution'])
df
# Scoring function
def score(solution, original):
    '''
    10 points for correct number.
    1 point for every correct digit in place.
    Scores a solution.
    '''
    score = 0
    if solution == original:
        score = 10
    else:
        s = list(solution)
        o = list(original)
        for i in range(len(o)):
            if s[i] == o[i]:
                score += 1
    return score
# I can score it in a loop. These results are correct.
df['Score'] = [score(df.loc[i, 'Solution'], df.loc[i, 'Original']) for i in df.index]
df
# Is there a more direct method? This throws an error.
df['Score'] = df.apply(lambda x: score(x.Original, x.Solution))
CodePudding user response:
Yes, use apply with axis=1:
df['score'] = df.apply(lambda r: score(r['Solution'], r['Original']), axis=1)
or a list comprehension with zip:
df['score'] = [score(s, o) for s, o in zip(df['Solution'], df['Original'])]
output:
  Original Solution  score
0     0000     1111      0
1     1111     1122      2
2     1234     1234     10
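As a quick sanity check, here is a minimal sketch confirming that both approaches produce the same scores. The condensed `score` below is my own rewrite of the question's function using `sum` and `zip`; it assumes both strings have the same length, and the names `via_apply` and `via_zip` are only illustrative.

```python
import pandas as pd

def score(solution, original):
    """10 points for an exact match, else 1 point per digit in the right place."""
    if solution == original:
        return 10
    # Assumes equal-length strings; zip stops at the shorter one.
    return sum(s == o for s, o in zip(solution, original))

df = pd.DataFrame(
    [['0000', '1111'], ['1111', '1122'], ['1234', '1234']],
    columns=['Original', 'Solution'],
)

# Row-wise apply: the lambda receives one row (a Series) at a time.
via_apply = df.apply(lambda r: score(r['Solution'], r['Original']), axis=1)

# List comprehension: iterate over the two columns in parallel.
via_zip = [score(s, o) for s, o in zip(df['Solution'], df['Original'])]

print(via_apply.tolist())  # [0, 2, 10]
print(via_zip)             # [0, 2, 10]
```

On large frames the list comprehension tends to be faster, since apply with axis=1 builds a Series for every row, though I have not measured it here.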
CodePudding user response:
# Is there a more direct method? This throws an error.
df['Score'] = df.apply(lambda x: score(x.Original, x.Solution))
It throws an error because, by default, DataFrame.apply applies the function to each column. This means that in your example x represents each column of the DataFrame, not each row. A column Series is indexed by the row labels, so it has no Original or Solution entry to look up. To apply the function row-wise you have to pass axis=1, i.e.
df['Score'] = df.apply(lambda x: score(x.Original, x.Solution), axis=1)
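To see what apply hands to the function under each setting, the following sketch returns the `name` of whatever Series it receives. The variable names `per_column`, `per_row` and `failed` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['0000', '1111'], ['1111', '1122'], ['1234', '1234']],
    columns=['Original', 'Solution'],
)

# Default (axis=0): the function is called once per column,
# so x.name is the column label.
per_column = df.apply(lambda x: x.name)
print(per_column.tolist())  # ['Original', 'Solution']

# axis=1: the function is called once per row,
# so x.name is the row's index label.
per_row = df.apply(lambda x: x.name, axis=1)
print(per_row.tolist())  # [0, 1, 2]

# With the default axis, a column has no 'Original' entry,
# so attribute access fails.
try:
    df.apply(lambda x: x.Original)
    failed = False
except AttributeError:
    failed = True
print(failed)  # True
```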
You can also redefine the score function to accept only one argument (each DataFrame row), and index the original and solution fields inside the function.
# Scoring function
def score(row):
    '''
    10 points for correct number.
    1 point for every correct digit in place.
    Scores a solution.
    '''
    solution = row['Solution']
    original = row['Original']
    score = 0
    if solution == original:
        score = 10
    else:
        s = list(solution)
        o = list(original)
        for i in range(len(o)):
            if s[i] == o[i]:
                score += 1
    return score
That way you can simply do
df['Score'] = df.apply(score, axis=1)
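One side effect of the single-argument form is that the function only needs something it can index by key, so it can be tried on a plain dict without building a DataFrame. A minimal sketch, using a shortened body of my own (it assumes equal-length strings) rather than the exact function above:

```python
import pandas as pd

def score(row):
    """Score one row: 10 for an exact match, else 1 point per digit in place."""
    solution = row['Solution']
    original = row['Original']
    if solution == original:
        return 10
    # Assumes equal-length strings; zip stops at the shorter one.
    return sum(s == o for s, o in zip(solution, original))

df = pd.DataFrame(
    [['0000', '1111'], ['1111', '1122'], ['1234', '1234']],
    columns=['Original', 'Solution'],
)

# Row-wise apply passes each row as a Series, which supports row['Solution'].
df['Score'] = df.apply(score, axis=1)
print(df['Score'].tolist())  # [0, 2, 10]

# A dict supports the same key lookup, which is handy for quick checks.
print(score({'Original': '1111', 'Solution': '1122'}))  # 2
```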