How to remove a string part of a column value?

Time:11-28

I'm working with a dataset in Python. I've loaded it into a dataframe so that I can perform a linear regression on it. But first I need to clean the dataframe so that it contains only numeric values. One of the columns holds each movie's runtime, formatted like this:

**Runtime**
142 min
175 min
152 min
202 min
96 min
...

And so on. How do I remove the 'min' part of the column so that the column only shows the number part? i.e.,

**Runtime**
142
175
152
202
96
...

CodePudding user response:

If you need the number that comes before min, use Series.str.extract:

df['Runtime'] = df['Runtime'].str.extract(r'(\d+)\s*min', expand=False).astype(int)

Or convert the values to timedeltas with to_timedelta, get the total seconds with Series.dt.total_seconds, and divide by 60 to get minutes:

df['Runtime'] = pd.to_timedelta(df['Runtime']).dt.total_seconds().div(60).astype(int)
print(df)
   Runtime
0      142
1      175
2      152
3      202
4       96
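
For anyone who wants to try both approaches end to end, here is a minimal self-contained sketch. It assumes the column holds strings exactly like the sample in the question; the dataframe is rebuilt by hand from those five rows, and the variable names `extracted` and `via_timedelta` are only for illustration.

```python
import pandas as pd

# Sample data copied from the question (rebuilt by hand for this sketch)
df = pd.DataFrame({"Runtime": ["142 min", "175 min", "152 min", "202 min", "96 min"]})

# Approach 1: capture the digits that precede "min" with a regex
extracted = df["Runtime"].str.extract(r"(\d+)\s*min", expand=False).astype(int)

# Approach 2: parse the strings as timedeltas, then convert seconds to minutes
via_timedelta = pd.to_timedelta(df["Runtime"]).dt.total_seconds().div(60).astype(int)

# Both approaches should agree on this sample
assert extracted.tolist() == via_timedelta.tolist()

df["Runtime"] = extracted
print(df)
```

Note that astype(int) raises if any row is missing or does not match the pattern. In that case one option is to wrap the extracted values in pd.to_numeric(..., errors='coerce') and decide how to handle the resulting NaN rows before the regression.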