I'm working on a Kaggle project. Below is my CSV file column:
total_sqft
1056
1112
34.46Sq. Meter
4125Perch
1015 - 1540
34.46
10Sq. Yards
10Acres
10Guntha
10Grounds
The column is of type object. First I want to convert all the values to float then update the string 1015 - 1540
with its average value and finally convert the units to square feet. I've tried different StackOverflow solutions but none of them seems to work. Any help would be appreciated.
Expected Output:
total_sqft
1056.00
1112.00
370.307
1123031.25
1277.5
34.46
90.00
435600
10890
24003.5
1 square meter = 10.764 * square foot
1 perch = 272.25 * square foot
1 square yards = 9 * square foot
1 acres = 43560 * square foot
1 guntha = 1089 * square foot
1 grounds = 2400.35 * square foot
CodePudding user response:
First extract numeric values by Series.str.extractall
, convert to floats and get averages:
df['avg'] = (df['total_sqft'].str.extractall(r'(\d \.*\d*)')
.astype(float)
.groupby(level=0)
.mean())
print (df)
total_sqft avg
0 1056 1056.00
1 1112 1112.00
2 34.46Sq. Meter 34.46
3 4125Perch 4125.00
4 1015 - 1540 1277.50
5 34.46 34.46
then need more information for convert to square feet
.
EDIT: Create dictionary for match units, extract them from column and use map
, last multiple columns:
d = {'Sq. Meter': 10.764, 'Perch':272.25, 'Sq. Yards':9,
'Acres':43560,'Guntha':1089,'Grounds':2400.35}
df['avg'] = (df['total_sqft'].str.extractall(r'(\d \.*\d*)')
.astype(float)
.groupby(level=0)
.mean())
df['unit'] = df['total_sqft'].str.extract(f'({"|".join(d)})', expand=False)
df['map'] = df['unit'].map(d).fillna(1)
df['total_sqft'] = df['avg'].mul(df['map'])
print (df)
total_sqft avg unit map
0 1.056000e 03 1056.00 NaN 1.000
1 1.112000e 03 1112.00 NaN 1.000
2 3.709274e 02 34.46 Sq. Meter 10.764
3 1.123031e 06 4125.00 Perch 272.250
4 1.277500e 03 1277.50 NaN 1.000
5 3.446000e 01 34.46 NaN 1.000
6 9.000000e 01 10.00 Sq. Yards 9.000
7 4.356000e 05 10.00 Acres 43560.000
8 1.089000e 04 10.00 Guntha 1089.000
9 2.400350e 04 10.00 Grounds 2400.350