I'm working with a dataset in Python. I've loaded it into a DataFrame so that I can run a linear regression on it, but first I need to clean the DataFrame so that it contains only numeric values. One of the columns holds each movie's runtime, formatted like this:
**Runtime**
142 min
175 min
152 min
202 min
96 min
...
And so on. How do I remove the 'min' part of the column so that the column only shows the number part? i.e.,
**Runtime**
142
175
152
202
96
...
CodePudding user response:
If you need the number that comes before 'min', use Series.str.extract:
df['Runtime'] = df['Runtime'].str.extract(r'(\d+)\s*min', expand=False).astype(int)
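One caveat with the extraction approach: astype(int) raises an error if any row does not match the pattern, because unmatched rows come back as NaN. The following is a minimal sketch of a more tolerant variant, assuming the column may contain missing or malformed entries. The sample data, including the two bad rows, is hypothetical and not from the question.

```python
import pandas as pd

# Sample data mirroring the question; the last two rows are
# hypothetical edge cases added for illustration.
df = pd.DataFrame({"Runtime": ["142 min", "175 min", "96 min", None, "unknown"]})

# Capture one or more digits followed by optional whitespace and "min".
minutes = df["Runtime"].str.extract(r"(\d+)\s*min", expand=False)

# Unmatched rows are NaN. The nullable Int64 dtype keeps them as <NA>
# instead of raising the way astype(int) would.
df["Runtime"] = pd.to_numeric(minutes).astype("Int64")
print(df)
```

Since the goal is a regression, rows left as missing would still need to be dropped or imputed (for example with df.dropna(subset=["Runtime"])) before fitting.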
Or convert the values to timedeltas with to_timedelta, get the total seconds with Series.dt.total_seconds, and divide by 60 to get minutes:
df['Runtime'] = pd.to_timedelta(df['Runtime']).dt.total_seconds().div(60).astype(int)
print(df)
   Runtime
0      142
1      175
2      152
3      202
4       96
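As a quick sanity check, here is a self-contained sketch that runs both approaches on the five sample values from the question and confirms they agree. The variable names via_regex and via_timedelta are illustrative only.

```python
import pandas as pd

# The five runtimes shown in the question.
raw = pd.Series(["142 min", "175 min", "152 min", "202 min", "96 min"], name="Runtime")

# Approach 1: pull out the digits with a regular expression.
via_regex = raw.str.extract(r"(\d+)\s*min", expand=False).astype(int)

# Approach 2: parse each string as a duration, then convert seconds to minutes.
via_timedelta = pd.to_timedelta(raw).dt.total_seconds().div(60).astype(int)

# Both routes should yield the same integers.
print(via_regex.tolist() == via_timedelta.tolist())
```

The regex route only looks at the digits, while the timedelta route interprets the unit, so the second is the safer choice if the column might mix units such as hours and minutes.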