Modifying pandas row value based on its length

I have a column in my pandas dataframe with the following values that represent hours worked in a week.

0                             40
1                  40h / week
2      46.25h/week on average
3                             11

I would like to check every row and, if the value is longer than 2 characters, extract only the number of hours from it. I have tried the following:

df['Hours_per_week'].apply(lambda x: (x.extract('(\d+)') if(len(str(x)) > 2) else x))

However, I am getting the following error: AttributeError: 'str' object has no attribute 'extract'.

CodePudding user response：

Assuming the series data are strings, try this:

df['Hours_per_week'].str.extract(r'(\d+)')
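
The error in the question arises because apply passes each element to the lambda as a plain Python str, which has no extract method; extract lives on the Series .str accessor. A minimal sketch, assuming the column holds strings (the sample frame is rebuilt from the question's data, and the name hours is only illustrative):

```python
import pandas as pd

# Sample data rebuilt from the question; the column is assumed to hold strings.
df = pd.DataFrame(
    {"Hours_per_week": ["40", "40h / week", "46.25h/week on average", "11"]}
)

# .str.extract operates on the whole Series at once.
# expand=False returns a Series instead of a one-column DataFrame.
hours = df["Hours_per_week"].str.extract(r"(\d+)", expand=False)
print(hours.tolist())
```

Note that (\d+) captures only the leading run of digits, so 46.25 comes back as 46.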

CodePudding user response：

Why not extract with a float pattern such as \d+\.?\d+ straight away?

>>> s = pd.Series(['40', '40h / week', '46.25h/week on average', '11'])
>>> s.str.extract(r"(\d+\.?\d+)")
       0
0     40
1     40
2  46.25
3     11

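Two-digit values such as 40 and 11 still match either way, since each \d+ needs only one digit. A single-digit value would not match this pattern.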
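
If the goal is numeric hours rather than strings, one possible follow-up is to convert the extracted values with pd.to_numeric. This is a sketch under assumptions: the pattern below is a variant, not the one used above, and makes the fractional part optional so that single-digit values would also match.

```python
import pandas as pd

s = pd.Series(["40", "40h / week", "46.25h/week on average", "11"])

# Variant pattern (an assumption, not the answer's original):
# digits, optionally followed by a dot and more digits.
pattern = r"(\d+(?:\.\d+)?)"

# expand=False yields a Series of strings; to_numeric converts it to floats.
hours = pd.to_numeric(s.str.extract(pattern, expand=False))
print(hours.tolist())
```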

CodePudding user response：

It looks like you could require an h after the number:

df['Hours_per_week'].str.extract(r'(\d{2}\.?\d*)h', expand=False)

Output:

0      NaN
1       40
2    46.25
3      NaN
Name: Hours_per_week, dtype: object
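
Rows without an h come back as NaN here. To keep the original value in those rows, as the question asks, one option is to fill the gaps from the original column. A minimal sketch, assuming string data as in the question (the names col and extracted are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {"Hours_per_week": ["40", "40h / week", "46.25h/week on average", "11"]}
)

col = df["Hours_per_week"]
extracted = col.str.extract(r"(\d{2}\.?\d*)h", expand=False)

# Where the pattern did not match (NaN), fall back to the original value.
df["Hours_per_week"] = extracted.fillna(col)
print(df["Hours_per_week"].tolist())
```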