Pandas: New column adding values of different columns with strings and numbers

Time:04-18

I have a dataframe with this structure (the real one has around 30 "Game x" columns, but these 2 are enough to explain the problem):

      Name         Game 1            Game 2
0     Player 1     Starting 68       Starting
1     Player 2     Bench 74          Starting 80
2     Player 3     Starting          Bench
3     Player 4     Bench             Bench 50
4     Player 5     NaN               Starting 60

I need new columns that count each player's minutes across the "Game x" columns, based on these conditions:

  • Starting: the player played the full 90 minutes
  • Starting 68 (or any other number): the player played 68 minutes (or whatever the number is)
  • Bench and NaN: the player played 0 minutes
  • Bench 74 (or any other number): the player played 16 minutes (a match lasts 90 minutes and he came on at minute 74, so 90 - 74 = 16)

There would be 2 new columns: one with the minutes the player played in games he started, and one with the minutes he played after coming on from the bench.

The final dataframe would be:

      Name         Game 1           Game 2           Minutes Starting   Minutes Bench
0     Player 1     Starting 68      Starting         158                0
1     Player 2     Bench 74         Starting 80      80                 16
2     Player 3     Starting         Bench            90                 0
3     Player 4     Bench            Bench 50         0                  40
4     Player 5     NaN              Starting 60      60                 0  

Any suggestions on how to do it? Thanks in advance!

CodePudding user response:

If you write a function that parses a text field and returns the corresponding number of minutes, you can apply that function to each game column and add up the results. For example, the time played from start:

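import numpy as np


def played_from_start(entry):
    entry = str(entry)  # Without this, np.nan is a float.
    if entry == 'nan' or entry == '':
        return 0  # Missing entry: the player did not play.
    if entry.startswith('Bench'):
        return 0  # Bench minutes are counted separately.
    if entry == 'Starting':
        return 90  # Started and played the whole match.
    if entry.startswith('Starting'):
        return int(entry[9:])  # 'Starting 68' -> 68 ('Starting ' is 9 characters).
    print(f"Warning: Entry '{entry}' not recognized.")
    return np.nan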


games = ['Game 1', 'Game 2']

df['Minutes Starting'] = np.sum(np.array([df[game].apply(played_from_start).values
                                          for game in games]),
                                axis=0)
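
The same idea should carry over to the bench column. Below is a minimal sketch, not part of the answer above: the function name played_from_bench and the constant FULL_MATCH are made up for illustration, and the sample rows are copied from the question's expected output. It assumes a bare "Bench" counts as 0 minutes and "Bench N" as 90 - N, as the question describes.

```python
import numpy as np
import pandas as pd

FULL_MATCH = 90  # assumed match length, taken from the question's "the total is 90"


def played_from_bench(entry):
    """Minutes played after coming on from the bench (hypothetical helper)."""
    if pd.isna(entry):
        return 0  # NaN: the player did not play.
    entry = str(entry).strip()
    if entry == '' or entry.startswith('Starting'):
        return 0  # Starter minutes belong in the other column.
    if entry == 'Bench':
        return 0  # On the bench the whole match.
    if entry.startswith('Bench'):
        # 'Bench 74' means the player came on at minute 74.
        return FULL_MATCH - int(entry[len('Bench'):])
    print(f"Warning: Entry '{entry}' not recognized.")
    return np.nan


# Sample rows as shown in the question's expected output.
df = pd.DataFrame({
    'Name': ['Player 1', 'Player 2', 'Player 3', 'Player 4', 'Player 5'],
    'Game 1': ['Starting 68', 'Bench 74', 'Starting', 'Bench', np.nan],
    'Game 2': ['Starting', 'Starting 80', 'Bench', 'Bench 50', 'Starting 60'],
})

games = ['Game 1', 'Game 2']

# Convert every game column to minutes, then add them up row by row.
df['Minutes Bench'] = (df[games]
                       .apply(lambda col: col.map(played_from_bench))
                       .sum(axis=1))
```

Summing with .sum(axis=1) on the converted columns is an alternative to the np.sum(np.array(...)) construction; both add the per-game minutes for each row. With around 30 game columns, only the games list needs to change.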