How to only apply str.split in Python on certain Rows or conditions?

Time: 11-14

I've been doing web scraping in Python and hit a problem that broke my script. I usually split a certain column on - with str.split(), which gives me the columns I want and fills everything I don't need with NA (which is fine).

Today I hit an edge case: player names containing a hyphen showed up, and that broke the split. Below is a reproducible example. The data usually has about 500 rows, so this can occur multiple times.

import pandas as pd
df = pd.DataFrame({"score": ["Jump ball: Shai-Gilgeous Alexander vs Jeremiah Robinson-Earl", "0-0"]})

# This no longer works: two player names contain a hyphen,
# so the split returns more than the 2 columns I want

df[["scoreAway", "scoreHome"]] = df["score"].str.split(
    "-", expand=True
)

error: ValueError: Columns must be same length as key
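
To see where the extra column comes from, a quick check on the same two-row frame (a minimal sketch that just reuses the example data above) shows that the jump-ball row splits into three pieces while the score row splits into two. With expand=True, pandas makes as many columns as the widest row needs, so the result has three columns and cannot be assigned to two keys.

```python
import pandas as pd

df = pd.DataFrame({"score": ["Jump ball: Shai-Gilgeous Alexander vs Jeremiah Robinson-Earl", "0-0"]})

# Split on every hyphen; expand=True pads shorter rows with missing values
parts = df["score"].str.split("-", expand=True)

# The widest row decides the column count: 2 hyphens in the jump-ball row -> 3 pieces
print(parts.shape)  # (2, 3)
print(parts.iloc[0].tolist())
```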

I think the solution is to replace hyphens with spaces, but only in rows where score.str.contains("Jump ball:") is true. So Shai-Gilgeous Alexander would become Shai Gilgeous Alexander, and 0-0 would stay unchanged. But I'm having a hard time finding resources on how to do that.
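
One way to do exactly that is sketched below. It assumes the "Jump ball:" rows are the only ones whose text contains extra hyphens. It builds a boolean mask with str.contains and rewrites only the masked rows through df.loc. Both string calls use regex=False so the pattern is matched literally, and na=False keeps missing values out of the mask. The mask variable name is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({"score": ["Jump ball: Shai-Gilgeous Alexander vs Jeremiah Robinson-Earl", "0-0"]})

# Boolean mask: True only for rows describing a jump ball (NaN cells count as False)
is_jump_ball = df["score"].str.contains("Jump ball:", regex=False, na=False)

# Replace hyphens with spaces in those rows only; score rows like "0-0" are untouched
df.loc[is_jump_ball, "score"] = df.loc[is_jump_ball, "score"].str.replace("-", " ", regex=False)

# Every row now has at most one hyphen, so the split yields at most two columns
df[["scoreAway", "scoreHome"]] = df["score"].str.split("-", expand=True)
print(df)
```

The jump-ball row ends up with its full text in scoreAway and a missing value in scoreHome. This matches the "NA for rows I don't need" behaviour described above.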

If anyone has a quick fix or suggestion I'd appreciate it!

Answer:

Try adding n=1 to the .str.split() call:

df[["scoreAway", "scoreHome"]] = df["score"].str.split(
    "-", expand=True, n=1
)

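That makes it split on the first - only, so no row produces more than two columns and the assignment no longer raises the ValueError. Note that the jump-ball row is still split (at "Shai-"), so its two new columns hold fragments of text rather than scores.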
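
A minimal sketch of the n=1 approach on the example frame is shown below. It adds an optional clean-up step. That step is an assumption, not part of the answer above: it treats a row as a score only if the whole cell matches digits-hyphen-digits, and it blanks the new columns everywhere else so non-score rows get NA, as in the original workflow.

```python
import pandas as pd

df = pd.DataFrame({"score": ["Jump ball: Shai-Gilgeous Alexander vs Jeremiah Robinson-Earl", "0-0"]})

# n=1 limits the split to the first hyphen, so no row yields more than 2 pieces
df[["scoreAway", "scoreHome"]] = df["score"].str.split("-", n=1, expand=True)

# Optional (assumed) clean-up: the jump-ball row was split at "Shai-", so its new
# columns hold text. Keep values only where the whole cell looks like "digits-digits".
is_score = df["score"].str.fullmatch(r"\d+-\d+")
df.loc[~is_score, ["scoreAway", "scoreHome"]] = None
print(df)
```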
