I've been doing web scraping in Python and hit a problem that broke my script. I usually split a certain column on - with str.split(), which gives me the columns I want and normally just fills everything I don't need with NA (which is fine).
Today I hit an edge case: players with hyphens in their names showed up, which made this stop working. Below is a reproducible example. The real data usually has 500 rows, so this can occur multiple times.
import pandas as pd
df = pd.DataFrame({"score": ["Jump ball: Shai-Gilgeous Alexander vs Jeremiah Robinson-Earl", "0-0"]})
# this doesn't work anymore because 2 players with hyphens in their names showed up,
# which makes the split return more than the 2 columns I want
df[["scoreAway", "scoreHome"]] = df["score"].str.split(
"-", expand=True
)
This raises: ValueError: Columns must be same length as key
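To see why the assignment fails, it helps to look at the intermediate result of the split on its own. The "Jump ball" row contains two hyphens, so it splits into three pieces, and expand=True pads every row to the widest one. That gives three columns for only two target names. A minimal check on the example frame:

```python
import pandas as pd

df = pd.DataFrame({"score": ["Jump ball: Shai-Gilgeous Alexander vs Jeremiah Robinson-Earl", "0-0"]})

# Split without assigning, so the number of resulting columns can be inspected
parts = df["score"].str.split("-", expand=True)

# Row 0 has two hyphens -> three pieces; row 1 ("0-0") has one -> two pieces.
# expand=True pads to the longest row, so the result has three columns.
print(parts.shape)  # (2, 3)
```

Three columns cannot be assigned to the two keys ["scoreAway", "scoreHome"], which is exactly what the ValueError reports.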
I think the solution is to replace hyphens with spaces, but only on rows where score contains "Jump ball:" (i.e. where df["score"].str.contains("Jump ball:") is True). So Shai-Gilgeous Alexander would become Shai Gilgeous Alexander, and 0-0 would remain unaffected. But I'm having a hard time finding resources on how to do that.
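The conditional replacement described above can be sketched with a boolean mask and .loc. This is a minimal sketch assuming the column is named score, as in the example: str.contains builds the mask, and str.replace with regex=False swaps "-" for " " only on the masked rows, leaving "0-0" untouched.

```python
import pandas as pd

df = pd.DataFrame({"score": ["Jump ball: Shai-Gilgeous Alexander vs Jeremiah Robinson-Earl", "0-0"]})

# Boolean mask: True only for rows that start a "Jump ball:" line
mask = df["score"].str.contains("Jump ball:", regex=False)

# Replace hyphens with spaces on the masked rows only; other rows keep their "-"
df.loc[mask, "score"] = df.loc[mask, "score"].str.replace("-", " ", regex=False)

# The Jump ball row no longer contains "-", so the split yields at most 2 columns;
# its missing second piece is filled with NA, as in the original workflow
df[["scoreAway", "scoreHome"]] = df["score"].str.split("-", expand=True)

print(df)
```

One caveat of this approach: if a particular scrape contains no row with a hyphen at all (for example only Jump ball rows), expand=True produces a single column and the two-key assignment fails again.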
If anyone has a quick fix or suggestion I'd appreciate it!
Answer:
Try adding n=1 to the .str.split() call:
df[["scoreAway", "scoreHome"]] = df["score"].str.split(
"-", expand=True, n=1
)
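That will cause it to split on the first - only, so no row can produce more than two pieces and the result never has more than the two columns you are assigning to.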
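Running the answer's call on the two-row frame from the question shows what ends up in each column. The score row splits cleanly, while the Jump ball row is cut at its first hyphen, with everything after it (including the second hyphen) kept in the second column:

```python
import pandas as pd

df = pd.DataFrame({"score": ["Jump ball: Shai-Gilgeous Alexander vs Jeremiah Robinson-Earl", "0-0"]})

# n=1 performs at most one split per row, so expand=True yields at most 2 columns
df[["scoreAway", "scoreHome"]] = df["score"].str.split("-", n=1, expand=True)

print(df[["scoreAway", "scoreHome"]])
```

Note that the Jump ball row now gets text in both columns rather than NA, so any later step that relied on those non-score rows being NA would need to filter them out separately.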