Home > Enterprise >  UFC record split into multiple columns in pandas
UFC record split into multiple columns in pandas

Time:12-05

I'm trying to split a UFC record column into multiple columns and am having trouble. The data looks like this

    record
1   2281
2   1871
3   124
4   82 (1 NC)
5   2391
6   2312
7   1941
8   1851 (1 NC)

The first number is wins, the second losses. If there is a third it is the draws, and if there is a parenthesis and a number it is the "no contests". I want to split it up and have it look like this.

    wins   loses  draws   no_contests
1   22     8      1       NaN
2   18     7      1       NaN
3   12     4      NaN     NaN
4   8      2      NaN     1
5   23     9      1       NaN
6   23     12     NaN     NaN
7   19     4      1       NaN
8   18     5      1       1

I tried using .str.split("-") which just made things more complicated for me. Then I tried making a for loop with a bunch of if statements to try and filter out some of the ore complicated records but failed miserably at that. Does anyone have any ideas as to what I could do? Thanks so much!

# So you can copy and paste the data in
import pandas as pd
data = {'record': ['22–8–1', '18–7–1', '12–4', '8–2 (1 NC)', '23–9–1', '23–12', '19–4–1', '18–5–1 (1 NC)']}
df = pd.DataFrame(data)

CodePudding user response:

This is a job for pandas.Series.str.extract():

# Fix em-dashes
df['record'] = df['record'].str.replace('–', '-')

new_df = df['record'].str.extract(r'^(?P<wins>\d )-(?P<loses>\d )(?:-(?P<draws>\d ))?\s*(?:\((?P<no_contests>\d ) NC\))?')

Output:

>>> new_df
  wins loses draws no_contests
0   22     8     1         NaN
1   18     7     1         NaN
2   12     4   NaN         NaN
3    8     2   NaN           1
4   23     9     1         NaN
5   23    12   NaN         NaN
6   19     4     1         NaN
7   18     5     1           1
  • Related