Home > Enterprise >  Extract regex with special character
Extract regex with special character

Time:10-19

I want to create a column in a pandas dataframe df based on another column ID. For ID that contains the the string SAT, I would like to extract the floats joined by the special character "-" and put the extracted in a new column named new_col. If ID does not contain the SAT string, leave it as NaN.

df is as below:

    Date        ID                   Time
0   2007-01-10  SAT 1 HHSP           900
1   2007-01-10  DOUBLE 7 HHSP        900
2   2007-01-10  SAT GF-06-5CSBG.431  1000
3   2007-01-10  MA HYDRO HHSP        900
4   2007-01-10  2.233 HHSP           900
5   2007-01-10  SAT L2-15-3CSB1.252  1000
6   2007-01-10  SECTION 6 HHSP       900

Expected output:

    Date        ID                   Time     new_col
0   2007-01-10  SAT 1 HHSP           900      NaN
1   2007-01-10  DOUBLE 7 HHSP        900      NaN
2   2007-01-10  SAT GF-06-5CSBG.431  1000     06-5
3   2007-01-10  MA HYDRO HHSP        900      NaN
4   2007-01-10  2.233 HHSP           900      NaN
5   2007-01-10  SAT L2-15-3 CSB1.252  1000    15-3  * In this case 15-3 instead of 2-15 is extracted because L2 is not completely floats.
6   2007-01-10  SECTION 6 HHSP       900      NaN

CodePudding user response:

Use Series.str.extract with numbers joined by - with - before it and only for values with SAT filtered by Series.str.contains:

m = df['ID'].str.contains('SAT')
df['new_col'] = df.loc[m, 'ID'].str.extract('[-\s ](\d \-\d )')
print (df)
         Date                   ID  Time new_col
0  2007-01-10           SAT 1 HHSP   900     NaN
1  2007-01-10        DOUBLE 7 HHSP   900     NaN
2  2007-01-10  SAT GF-06-5CSBG.431  1000    06-5
3  2007-01-10        MA HYDRO HHSP   900     NaN
4  2007-01-10           2.233 HHSP   900     NaN
5  2007-01-10  SAT L2-15-3CSB1.252  1000    15-3
6  2007-01-10       SECTION 6 HHSP   900     NaN

If value SAT start in column is possible use:

df['new_col'] = df['ID'].str.extract('^SAT.*[-\s ](\d \-\d )', expand=False)
  • Related