Converting m to km and string to float in pandas DataFrame

Time:04-30

I have this simplified DataFrame where I want to add a new column Distance_km. In this new column all values should be in kilometres and converted to float dtype.

import pandas as pd

d = {'Point': ['a', 'b', 'c', 'd'], 'Distance': ['3km', '400m', '1.1km', '200m']}
dist = pd.DataFrame(data=d)
dist
    
  Point Distance
0   a    3km
1   b    400m
2   c    1.1km
3   d    200m

Point       object
Distance    object
dtype: object

How can I get this output?

    Point   Distance    Distance_km
0    a       3km            3.0
1    b       400m           0.4
2    c       1.1km          1.1
3    d       200m           0.2

Point           object
Distance        object
Distance_km    float64
dtype: object

Thanks in advance!

CodePudding user response:

Try:

# A "Weight" column holding the factor that converts each value to km:
# 1e-3 for values in "m", 1 for values in "km"
dist["Weight"] = 1e-3
dist.loc[dist["Distance"].str.contains("km"),"Weight"] = 1

# Extract the numeric part of the string and convert it to float
dist["NumericPart"] = dist["Distance"].str.extract(r"([0-9.]+)\w+", expand=False).astype(float)

# Combine the numeric parts with their unit weights by multiplication
dist["Distance_km"] = dist["NumericPart"] * dist["Weight"]

You will get:

  Point Distance  Weight  NumericPart  Distance_km
0     a      3km   1.000          3.0          3.0
1     b     400m   0.001        400.0          0.4
2     c    1.1km   1.000          1.1          1.1
3     d     200m   0.001        200.0          0.2

BTW: To make sure "km" really sits at the end of the string, you may prefer this in place of the second line of code above:

dist.loc[dist["Distance"].str.contains("km$", regex=True), "Weight"] = 1
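A more compact variant of the same idea, as a sketch only: capture the number and the unit in a single str.extract call, then map each unit to its conversion factor. The names UNIT_TO_KM and parts are made up for this illustration, and it assumes every value is a number followed directly by "km" or "m".

```python
import pandas as pd

dist = pd.DataFrame({
    "Point": ["a", "b", "c", "d"],
    "Distance": ["3km", "400m", "1.1km", "200m"],
})

# Hypothetical lookup table: factor that turns each unit into kilometres
UNIT_TO_KM = {"km": 1.0, "m": 1e-3}

# Two capture groups give a DataFrame with columns 0 (number) and 1 (unit)
parts = dist["Distance"].str.extract(r"^([0-9.]+)(km|m)$")

# Multiply the numeric part by the factor for its unit
dist["Distance_km"] = parts[0].astype(float) * parts[1].map(UNIT_TO_KM)
print(dist)
```

Values that do not match the pattern end up as NaN in Distance_km rather than raising an error, which may or may not be what you want.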

CodePudding user response:

You could use the pandas apply method to pass the values of your Distance column to a function that converts them to a standard unit.

From the documentation

Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.
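To make the quoted behaviour concrete, here is a small sketch. With axis=1 the function receives one row at a time as a Series, whereas calling apply on a single column passes each cell value on its own. The variable names row_types and cell_types are only for illustration.

```python
import pandas as pd

dist = pd.DataFrame({
    "Point": ["a", "b"],
    "Distance": ["3km", "400m"],
})

# axis=1: each call receives one row as a Series indexed by column name
row_types = dist.apply(lambda row: type(row).__name__, axis=1)

# Series.apply: each call receives a single cell value (here a str)
cell_types = dist["Distance"].apply(lambda value: type(value).__name__)

print(row_types.tolist())   # ['Series', 'Series']
print(cell_types.tolist())  # ['str', 'str']
```

So a conversion that only needs one column does not strictly require axis=1; applying the function to that column directly is usually enough.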

First create the function that will transform the data; apply can also take a lambda.

import re
import pandas as pd

def convert_to_km(distance):
    '''
    distance is a string with km or m as its unit,
    e.g. 300km, 1.1km, 200m, 4.5m
    '''
    # split the string into value and unit, e.g. '300' and 'km'
    split_dist = re.match(r'([\d.]+)([a-zA-Z]+)', distance)

    value = split_dist.group(1)  # '300'
    unit = split_dist.group(2)   # 'km'

    if unit == 'km':
        return float(value)
    if unit == 'm':
        # no rounding here, so small values such as 4.5m are not lost
        return float(value) / 1000
    # fail loudly instead of silently returning None
    raise ValueError(f'unsupported unit: {unit}')

d = {'Point': ['a', 'b', 'c', 'd'], 'Distance': ['3km', '400m', '1.1km', '200m']}
dist = pd.DataFrame(data=d)

You can then apply this function to your Distance column

dist['Distance_km'] = dist.apply(lambda row: convert_to_km(row['Distance']), axis=1)

dist

The output will be

  Point Distance  Distance_km
0     a      3km          3.0
1     b     400m          0.4
2     c    1.1km          1.1
3     d     200m          0.2