I have the following DF:
| Fecha | Partido | Equipo | xG | xGA |
|------------|-------------------------|---------|------|------|
| 2022-05-01 | América - Cruz Azul 0:0 | América | 1.53 | 0.45 |
| 2022-05-01 | Leon - América 2:0 | América | 1.70 | 0.35 |
I want to create three new columns based on the Partido column where the first team goes to a new column named Home, the second team to a column named Visitor and the score to a column named Score.
Desired Output:
| Fecha | Partido | Equipo | xG | xGA | Home | Away | Score |
|------------|-------------------------|---------|------|------|-------- |------------|-------|
| 2022-05-01 | América - Cruz Azul 0:0 | América | 1.53 | 0.45 | América | Cruz Azul | 0:0 |
| 2022-05-01 | Leon - América 2:0 | América | 1.70 | 0.35 | Leon | América | 2:0 |
I have tried splitting with delimiter but since some teams have two words in their names it doesn't work.
CodePudding user response:
It is quite simple using str.extract
and a regex:
regex = r'([^-] )\s*-\s*([^-] ) (\d :\d )'
df[['Home', 'Away', 'Score']] = df['Partido'].str.extract(regex)
output:
Fecha Partido Equipo xG xGA Home Away Score
0 2022-05-01 América - Cruz Azul 0:0 América 1.53 0.45 América Cruz Azul 0:0
1 2022-05-01 Leon - América 2:0 América 1.70 0.35 Leon América 2:0
If you don't want to modify the original DataFrame, you can also use named capturing groups to directly set the column names:
regex = r'(?P<Home>[^-] )\s*-\s*(?P<Away>[^-] ) (?P<Score>\d :\d )'
df2 = df['Partido'].str.extract(regex)
# Home Away Score
# 0 América Cruz Azul 0:0
# 1 Leon América 2:0
# OR
df2 = df.join(df['Partido'].str.extract(regex))
# same a first output
CodePudding user response:
You might use rsplit
to get teams and score then split
to get home and away following way
import pandas as pd
df = pd.DataFrame({'Partido':['América - Cruz Azul 0:0','Leon - América 2:0']})
df[['Teams', 'Score']] = df.Partido.str.rsplit(' ', 1, expand=True)
df[['Home', 'Away']] = df.pop('Teams').str.split(' - ', expand=True)
print(df)
output
Partido Score Home Away
0 América - Cruz Azul 0:0 0:0 América Cruz Azul
1 Leon - América 2:0 2:0 Leon América
Note: I simplified input df for brevity sake. 1 in rsplit mean split once, i.e. at rightmost space. .pop
does remove and return selected column, I use it as Teams
is only temporary.
CodePudding user response:
One option with a split (regex):
regex = r"(\d :\d )|\s -\s "
outcome = (df.Partido
.str.split(regex, expand = True)
.dropna(how ='all', axis = 1)
.iloc[:, :-1]
)
df[['Home', 'Away', 'Score']] = outcome
df
Fecha Partido Equipo xG xGA Home Away Score
0 2022-05-01 América - Cruz Azul 0:0 América 1.53 0.45 América Cruz Azul 0:0
1 2022-05-01 Leon - América 2:0 América 1.70 0.35 Leon América 2:0