Home > OS >  Convert different units of a column in pandas
Convert different units of a column in pandas

Time:03-23

I'm working on a Kaggle project. Below is my CSV file column:

total_sqft
1056
1112
34.46Sq. Meter
4125Perch
1015 - 1540
34.46
10Sq. Yards
10Acres
10Guntha
10Grounds

The column is of type object. First I want to convert all the values to float then update the string 1015 - 1540 with its average value and finally convert the units to square feet. I've tried different StackOverflow solutions but none of them seems to work. Any help would be appreciated.

Expected Output:

total_sqft
1056.00
1112.00
370.307
1123031.25
1277.5
34.46
90.00
435600
10890
24003.5

1 square meter = 10.764 * square foot
1 perch = 272.25 * square foot 
1 square yards = 9 * square foot   
1 acres = 43560 * square foot
1 guntha = 1089 * square foot
1 grounds = 2400.35 * square foot

CodePudding user response:

First extract numeric values by Series.str.extractall, convert to floats and get averages:

df['avg'] = (df['total_sqft'].str.extractall(r'(\d \.*\d*)')
                             .astype(float)
                             .groupby(level=0)
                             .mean())
print (df)
       total_sqft      avg
0            1056  1056.00
1            1112  1112.00
2  34.46Sq. Meter    34.46
3       4125Perch  4125.00
4     1015 - 1540  1277.50
5           34.46    34.46

then need more information for convert to square feet.

EDIT: Create dictionary for match units, extract them from column and use map, last multiple columns:

d = {'Sq. Meter': 10.764, 'Perch':272.25, 'Sq. Yards':9,
     'Acres':43560,'Guntha':1089,'Grounds':2400.35}

df['avg'] = (df['total_sqft'].str.extractall(r'(\d \.*\d*)')
                             .astype(float)
                             .groupby(level=0)
                             .mean())

df['unit'] = df['total_sqft'].str.extract(f'({"|".join(d)})', expand=False)

df['map'] = df['unit'].map(d).fillna(1)

df['total_sqft'] = df['avg'].mul(df['map'])
print (df)
     total_sqft      avg       unit        map
0  1.056000e 03  1056.00        NaN      1.000
1  1.112000e 03  1112.00        NaN      1.000
2  3.709274e 02    34.46  Sq. Meter     10.764
3  1.123031e 06  4125.00      Perch    272.250
4  1.277500e 03  1277.50        NaN      1.000
5  3.446000e 01    34.46        NaN      1.000
6  9.000000e 01    10.00  Sq. Yards      9.000
7  4.356000e 05    10.00      Acres  43560.000
8  1.089000e 04    10.00     Guntha   1089.000
9  2.400350e 04    10.00    Grounds   2400.350
  • Related