Converting time strings to integers in Python

Time:10-16

I have a column of string values in my pandas dataframe called 'travel_time' with values like:

1 hour 10 mins
34 mins
58 mins
1 hour 32 mins
12 mins

I would like to make a new column that converts these strings to minutes (integers) so that I can do calculations (average, min, max, binning, etc.). For example, '1 hour 10 mins' becomes 70, '34 mins' becomes 34, '58 mins' becomes 58, '1 hour 32 mins' becomes 92, and '12 mins' becomes 12.

I know there are functions in Python that will let me strip non-numeric characters from the strings, but I'm not sure how to handle cases where travel_time is greater than 60 minutes. Any advice on how I could do this?

CodePudding user response:

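You can use df.applymap to apply a custom function to every element of your dataframe. (Note that applymap is deprecated as of pandas 2.1 in favor of DataFrame.map, which is called the same way.)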

import pandas as pd

df = pd.DataFrame(['1 hour 10 mins', '34 mins', '58 mins', '1 hour 32 mins', '12 mins'])

timemap = {'mins': 1, 'hour': 60}  # Express time units in minutes. Add as needed.

def transform(s):
    n = 0
    count = {}

    # Split string by space and parse tokens.
    for tok in s.split():
        if tok in timemap:  # Token is a time unit.
            count[tok] = n
        else:
            try:  # Token is an integer?
                n = int(tok)
            except ValueError:  # Nope, not an integer. :(
                raise RuntimeError(f'unknown token: {tok}')

    # Add total.
    return sum(timemap[t] * val for t, val in count.items())


print(df.applymap(transform))

Output:

    0
0  70
1  34
2  58
3  92
4  12

If you want to apply the function to a specific column only, use df['the_column'].apply(transform) and assign the result to a new column.
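As a rough sketch of that per-column usage, tied back to the calculations asked about in the question: the helper name to_minutes, the new column names 'travel_minutes' and 'duration_bin', the extra unit spellings, and the bin edges are all assumptions made for illustration, not part of the answer above.

```python
import pandas as pd

# Minutes per unit. The singular/plural spellings beyond 'mins' and 'hour'
# are an assumption added for this sketch.
UNIT_MINUTES = {'min': 1, 'mins': 1, 'hour': 60, 'hours': 60}

def to_minutes(s):
    """Convert a string such as '1 hour 10 mins' to total minutes."""
    total = 0
    n = None
    for tok in s.split():
        if tok in UNIT_MINUTES:
            if n is None:
                raise ValueError(f'unit without a number: {tok}')
            total += n * UNIT_MINUTES[tok]
            n = None
        else:
            n = int(tok)  # Raises ValueError on an unknown token.
    return total

df = pd.DataFrame({
    'travel_time': ['1 hour 10 mins', '34 mins', '58 mins',
                    '1 hour 32 mins', '12 mins']
})

# New integer column ('travel_minutes' is a hypothetical name).
df['travel_minutes'] = df['travel_time'].apply(to_minutes)

# Summary statistics on the numeric column.
print(df['travel_minutes'].mean())
print(df['travel_minutes'].min(), df['travel_minutes'].max())

# Binning with illustrative edges: (0, 30], (30, 60], (60, 120] minutes.
df['duration_bin'] = pd.cut(df['travel_minutes'],
                            bins=[0, 30, 60, 120],
                            labels=['short', 'medium', 'long'])
print(df)
```

Keeping the original 'travel_time' column next to the converted one makes it easy to spot-check the conversion.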

CodePudding user response:

Here is another approach that uses regular expressions. It will likely be less efficient than TrebledJ's answer:

import pandas as pd
import re

df = pd.DataFrame({
    'travel_time': [
        '1 hour 10 mins',
        '34 mins',
        '58 mins',
        '1 hour 32 mins',
        '12 mins'
    ]
})

def timeCleanup(time_value):
    hours = '0'
    minutes = '0'
    # parse hours
    match = re.search(r'\d+\s*h', time_value)
    if match:
        hours = re.search(r'\d+', match.group()).group()
    # parse minutes
    match = re.search(r'\d+\s*m', time_value)
    if match:
        minutes = re.search(r'\d+', match.group()).group()
    # return hours * 60 + minutes
    return int(hours) * 60 + int(minutes)

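# Store the result in a new column rather than overwriting df;
# the name 'travel_minutes' is only illustrative.
df['travel_minutes'] = df['travel_time'].apply(timeCleanup)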

print(df)
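If the per-row function call is a concern, the same regex idea can probably be expressed with pandas' vectorized string methods instead of apply. This is only a sketch and not from either answer: it assumes every value contains at most one hours part and one minutes part, and the column name 'travel_minutes' is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'travel_time': ['1 hour 10 mins', '34 mins', '58 mins',
                    '1 hour 32 mins', '12 mins']
})

s = df['travel_time']

# Pull out the digits that precede an hour unit and a minute unit.
# Rows without that unit yield NaN, which is replaced with 0.
hours = pd.to_numeric(s.str.extract(r'(\d+)\s*h', expand=False)).fillna(0)
minutes = pd.to_numeric(s.str.extract(r'(\d+)\s*m', expand=False)).fillna(0)

# Combine into whole minutes ('travel_minutes' is a hypothetical name).
df['travel_minutes'] = (hours * 60 + minutes).astype(int)

print(df)
```

Whether this is actually faster than apply depends on the size of the column, so it is worth timing on real data before relying on it.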