Is there a way to convert/standardize text into Integer in Python?

Time:03-02

I have a dataframe with a column showing the time (in minutes) spent organizing each inventory item. The goal is to show the minutes spent as either an integer or a float. However, the values in this column are not clean; see the examples below. Is there a way to standardize everything and convert it to an integer or float? (For example, 10 hours should become 600 minutes.)

import pandas as pd
df1 = { 'min':['420','450','480','512','560','10 hours', '10.5 hours',
'420 (all inventory)','3h ', '4.1 hours', '60**','6h', '7hours  ']}

df1=pd.DataFrame(df1)

The desired output is a new column holding the time in minutes as a number, e.g. 600 for '10 hours'.

CodePudding user response:

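I would use a regular expression for this kind of problem: strip everything except the number, and multiply by 60 when the value contains an "h".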

import re
import numpy as np
import pandas as pd
df1 = { 'min':['420','450','480','512','560','10 hours', '10.5 hours',
'420 (all inventory)','3h ', '4.1 hours', '60**','6h', '7hours  ']}
df1=pd.DataFrame(df1)

# Pre-allocate a NumPy array that will hold the cleaned values, filled by row index
arr = np.zeros(df1.shape[0])
# Iterate over a copy so the original column is left untouched
df1_copy = df1.copy()

for i, row in df1_copy.iterrows():
    if "h" in row["min"]:
        # Hour values ("10 hours", "3h"): strip letters, parentheses and whitespace,
        # then convert hours to minutes
        hours = float(re.sub(r"[a-zA-Z()\s]", "", row["min"]))
        arr[i] = hours * 60
    else:
        # Minute values ("60**", "420 (all inventory)"): also strip asterisks
        arr[i] = float(re.sub(r"[a-zA-Z()*\s]", "", row["min"]))


df1["min_clean"] = arr
print(df1)

Output:

                    min  min_clean
0                   420      420.0
1                   450      450.0
2                   480      480.0
3                   512      512.0
4                   560      560.0
5              10 hours      600.0
6            10.5 hours      630.0
7   420 (all inventory)      420.0
8                   3h       180.0
9             4.1 hours      246.0
10                 60**       60.0
11                   6h      360.0
12             7hours        420.0
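
As a variation on the loop above, here is a minimal sketch that parses each value with a single pattern instead of stripping characters. The helper name `to_minutes` and the pattern `_DURATION` are my own, not part of the answer above, and the list of accepted hour spellings (h, hr, hrs, hour, hours) is an assumption about the data.

```python
import re

# Hypothetical pattern (an assumption, not from the original answer):
# a leading number, optionally followed by an hour unit as a whole word.
_DURATION = re.compile(
    r"\s*(\d+(?:\.\d+)?)\s*(?:(h|hrs?|hours?)\b)?",
    re.IGNORECASE,
)


def to_minutes(text):
    """Return the duration in minutes, or None if no leading number is found."""
    match = _DURATION.match(text)
    if match is None:
        return None
    value = float(match.group(1))
    # Multiply by 60 only when an hour unit directly follows the number.
    return value * 60 if match.group(2) else value


samples = ['420', '10 hours', '10.5 hours', '420 (all inventory)',
           '3h ', '60**', '6h', '7hours  ']
minutes = [to_minutes(s) for s in samples]
print(minutes)
# [420.0, 600.0, 630.0, 420.0, 180.0, 60.0, 360.0, 420.0]
```

With pandas, the same function could be applied as df1["min_clean"] = df1["min"].map(to_minutes). Because the unit must directly follow the number, a free-text note that happens to contain an "h" (say '420 (whole shelf)') stays in minutes, whereas the "h" in ... check in the loop would treat it as hours.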