Can Pandas apply a user-specified function with multiple arguments?

I want to supply a scoring function for a code-breaking exercise. In the exercise, a person tries to find a function that provides a 4-digit "Solution" that matches the 4-digit "Original" from some cypher. The scoring function gives partial credit - 10 points for a totally correct answer, and otherwise 1 point for each correct digit in the correct place.

The scoring function needs to look at two columns at once. I can apply it using a loop. Is there a more direct method using apply or similar?

import pandas as pd

# DataFrame with original values and purported solutions.  
df = pd.DataFrame([
    ['0000', '1111'],
    ['1111', '1122'],
    ['1234', '1234'],
    ],
    columns=['Original', 'Solution'])
df

# Scoring function
def score(solution, original):
    '''
    10 points for correct number.
    1 point for every correct digit in place.
    Scores a solution.
    '''
    score = 0
    if solution==original:
        score = 10
    else:
        s = list(solution)
        o = list(original)
        for i in range(len(o)):
            if s[i]==o[i]:
                score += 1
    return score

# I can score it in a loop. These results are correct.
df['Score'] = [score(df.loc[i, 'Solution'], df.loc[i, 'Original']) for i in df.index]
df

# Is there a more direct method?  This throws an error.
df['Score'] = df.apply(lambda x: score(x.Original, x.Solution))

Answer:

Yes, use apply with axis=1 so the function receives one row at a time:

df['score'] = df.apply(lambda r: score(r['Solution'], r['Original']), axis=1)

Or use a list comprehension with zip:

df['score'] = [score(s, o) for s, o in zip(df['Solution'], df['Original'])]

output:

  Original Solution  score
0     0000     1111      0
1     1111     1122      2
2     1234     1234     10
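For larger frames, a partly vectorized alternative avoids calling a Python function once per row. This is a sketch rather than either of the approaches above, and it assumes every code is a string of the same length (4 digits here); the names orig, sol and matches are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['0000', '1111'], ['1111', '1122'], ['1234', '1234']],
    columns=['Original', 'Solution'],
)

# Assumption: all codes have the same length, so each column can become
# a 2-D array of single characters (one row per code, one column per digit).
orig = np.array([list(code) for code in df['Original']])
sol = np.array([list(code) for code in df['Solution']])

# Count the digits that match in the same position, row by row.
matches = (orig == sol).sum(axis=1)

# A full match earns 10 points; otherwise 1 point per matching digit.
df['score'] = np.where(matches == orig.shape[1], 10, matches)
print(df)
```

The result should agree with the table above; the trade-off is that it only works when every code has the same length.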

Answer:

# Is there a more direct method?  This throws an error.
df['Score'] = df.apply(lambda x: score(x.Original, x.Solution))

It throws an error because, by default (axis=0), DataFrame.apply applies the function to each column. In your example x is therefore a whole column, a Series indexed by the row labels 0, 1, 2, not a row, so x.Original does not exist and raises an AttributeError. To apply the function row-wise you have to pass axis=1:

df['Score'] = df.apply(lambda x: score(x.Original, x.Solution), axis=1)
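A small sketch makes the difference between the two axes visible (col_labels, row_labels and default_axis_failed are illustrative names, not part of the answer): with the default axis each call receives a column, with axis=1 each call receives a row.

```python
import pandas as pd

df = pd.DataFrame(
    [['0000', '1111'], ['1111', '1122'], ['1234', '1234']],
    columns=['Original', 'Solution'],
)

# Default axis=0: each call gets one *column* as a Series,
# so x.name is a column label ('Original', 'Solution').
col_labels = df.apply(lambda x: x.name)
print(col_labels)

# axis=1: each call gets one *row* as a Series,
# so x.name is a row label (0, 1, 2) and x.Original exists.
row_labels = df.apply(lambda x: x.name, axis=1)
print(row_labels)

# The question's call fails because a column Series has no 'Original' label.
default_axis_failed = False
try:
    df.apply(lambda x: x.Original)
except AttributeError as err:
    default_axis_failed = True
    print('AttributeError:', err)
```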

You can also redefine the score function to accept a single argument (one DataFrame row) and look up the Original and Solution fields inside the function.

# Scoring function
def score(row):
    '''
    10 points for correct number.
    1 point for every correct digit in place.
    Scores a solution.
    '''
    solution = row['Solution']
    original = row['Original']
    score = 0
    
    if solution==original:
        score = 10
    else:
        s = list(solution)
        o = list(original)
        for i in range(len(o)):
            if s[i]==o[i]:
                score += 1
    return score

That way you can simply call:

df['Score'] = df.apply(score, axis=1)
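If you would rather not hard-code the column names inside the function, DataFrame.apply also takes an args tuple whose items are passed to the function after the row. A minimal sketch, where score_row and its parameter names are hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    [['0000', '1111'], ['1111', '1122'], ['1234', '1234']],
    columns=['Original', 'Solution'],
)

def score_row(row, solution_col, original_col):
    '''Score one row; the column names are passed in, not hard-coded.'''
    solution = row[solution_col]
    original = row[original_col]
    if solution == original:
        return 10
    # 1 point for every digit that matches in the same position.
    return sum(s == o for s, o in zip(solution, original))

# Items in args are passed positionally after the row: score_row(row, 'Solution', 'Original').
df['Score'] = df.apply(score_row, axis=1, args=('Solution', 'Original'))
print(df)
```

Keyword arguments work the same way, e.g. df.apply(score_row, axis=1, solution_col='Solution', original_col='Original').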