I want to create a column in a pandas dataframe df
based on another column ID
. For ID
that contains the the string SAT
, I would like to extract the floats joined by the special character "-" and put the extracted in a new column named new_col
. If ID
does not contain the SAT
string, leave it as NaN
.
df
is as below:
Date ID Time
0 2007-01-10 SAT 1 HHSP 900
1 2007-01-10 DOUBLE 7 HHSP 900
2 2007-01-10 SAT GF-06-5CSBG.431 1000
3 2007-01-10 MA HYDRO HHSP 900
4 2007-01-10 2.233 HHSP 900
5 2007-01-10 SAT L2-15-3CSB1.252 1000
6 2007-01-10 SECTION 6 HHSP 900
Expected output:
Date ID Time new_col
0 2007-01-10 SAT 1 HHSP 900 NaN
1 2007-01-10 DOUBLE 7 HHSP 900 NaN
2 2007-01-10 SAT GF-06-5CSBG.431 1000 06-5
3 2007-01-10 MA HYDRO HHSP 900 NaN
4 2007-01-10 2.233 HHSP 900 NaN
5 2007-01-10 SAT L2-15-3 CSB1.252 1000 15-3 * In this case 15-3 instead of 2-15 is extracted because L2 is not completely floats.
6 2007-01-10 SECTION 6 HHSP 900 NaN
CodePudding user response:
Use Series.str.extract
with numbers joined by -
with -
before it and only for values with SAT
filtered by Series.str.contains
:
m = df['ID'].str.contains('SAT')
df['new_col'] = df.loc[m, 'ID'].str.extract('[-\s ](\d \-\d )')
print (df)
Date ID Time new_col
0 2007-01-10 SAT 1 HHSP 900 NaN
1 2007-01-10 DOUBLE 7 HHSP 900 NaN
2 2007-01-10 SAT GF-06-5CSBG.431 1000 06-5
3 2007-01-10 MA HYDRO HHSP 900 NaN
4 2007-01-10 2.233 HHSP 900 NaN
5 2007-01-10 SAT L2-15-3CSB1.252 1000 15-3
6 2007-01-10 SECTION 6 HHSP 900 NaN
If value SAT
start in column is possible use:
df['new_col'] = df['ID'].str.extract('^SAT.*[-\s ](\d \-\d )', expand=False)