I have a question. I am learning pandas and trying to clean a messy dataset. One column in the DataFrame holds the area of homes as strings that mix a number with a unit. Some homes are measured in square meters (m²) and some in sot (1 sot = 100 m²). I want to strip "m²" and "sot" from the column and multiply by 100 the rows whose unit is sot. I can remove "m²" and "sot" with the replace method, but I am not sure how to multiply only the rows that were in sot.
0 90 m²
1 102 m²
2 43 m²
3 240 m²
4 120 m²
...
15825 11 sot
15826 4 sot
15827 100 sot
15828 3 sot
15829 24 sot
Name: Sahə, Length: 15830, dtype: object
CodePudding user response:
Use str.extract and a mapping dictionary for conversion factors:
convert = {'sot': 100}
df2 = df['Saha'].str.extract(r'(\d+)\s*(.*)')
df['surf_m2'] = pd.to_numeric(df2[0]).mul(df2[1].map(convert).fillna(1))
output:
Saha surf_m2
0 90 m² 90.0
1 102 m² 102.0
2 43 m² 43.0
3 240 m² 240.0
4 120 m² 120.0
...
15825 11 sot 1100.0
15826 4 sot 400.0
15827 100 sot 10000.0
15828 3 sot 300.0
15829 24 sot 2400.0
Intermediate df2:
0 1
0 90 m²
1 102 m²
2 43 m²
3 240 m²
4 120 m²
...
15825 11 sot
15826 4 sot
15827 100 sot
15828 3 sot
15829 24 sot
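For anyone who wants to try the extract-and-map idea end to end, here is a minimal self-contained sketch. The four sample rows and the intermediate names `parts` and `factor` are made up for illustration; the column names `Saha` and `surf_m2` follow the answer above.

```python
import pandas as pd

# Hypothetical sample rows in the same shape as the question's column
df = pd.DataFrame({"Saha": ["90 m²", "102 m²", "11 sot", "4 sot"]})

convert = {"sot": 100}  # 1 sot = 100 m²

# Column 0 receives the digits, column 1 whatever follows them (the unit)
parts = df["Saha"].str.extract(r"(\d+)\s*(.*)")

# Units that are not in the mapping (here m²) fall back to a factor of 1
factor = parts[1].map(convert).fillna(1)

df["surf_m2"] = pd.to_numeric(parts[0]).mul(factor)
print(df)
```

Because the factors live in a dictionary, supporting another unit should only need one more entry in `convert`. Note that `(\d+)` captures whole numbers only, so decimal areas would need a wider pattern.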
CodePudding user response:
Assuming your dataframe is df, the following will work:
# create an indicator column
df['m'] = df['unit'].apply(lambda x: 'm' if 'm' in x else 'sot')
# turn strings into numbers (assuming your column is called unit)
df['unit'] = df['unit'].str.replace('m²', '').str.replace(' sot', '').str.strip().astype(float)
# finally multiply where necessary
df.loc[df['m'] == 'sot', 'unit'] = df.loc[df['m'] == 'sot', 'unit'] * 100
Giving you:
unit m
0 102.0 m
1 43.0 m
2 240.0 m
3 120.0 m
5 1100.0 sot
6 400.0 sot
7 10000.0 sot
8 300.0 sot
9 2400.0 sot
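As a possible variation on the same idea, the indicator can be a boolean mask and the multiplication can be done with `numpy.where`, which avoids `apply`. This is only a sketch: the sample rows and the names `is_sot`, `number` and `area_m2` are assumptions, and the column is assumed to be called `unit` as in the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the column name 'unit' follows the answer above
df = pd.DataFrame({"unit": ["102 m²", "43 m²", "11 sot", "4 sot"]})

# Boolean mask marking the rows measured in sot
is_sot = df["unit"].str.contains("sot", regex=False)

# Drop the unit text and convert what is left to a number
number = (
    df["unit"]
    .str.replace("m²", "", regex=False)
    .str.replace("sot", "", regex=False)
    .str.strip()
    .astype(float)
)

# Multiply by 100 only where the mask is True
df["area_m2"] = np.where(is_sot, number * 100, number)
print(df)
```

Writing the result to a new column keeps the original strings around, which makes it easier to spot-check the conversion.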
CodePudding user response:
df = df['ColName'].str.split(" ", expand=True)
df = df.rename(columns={0: "ColName", 1: "Type"})
df.ColName = df.ColName.astype(float)
df.loc[df.Type == 'sot', 'ColName'] *= 100
df
output:
ColName Type
0 90.0 m²
1 102.0 m²
2 43.0 m²
3 240.0 m²
4 1100.0 sot
5 400.0 sot
6 10000.0 sot
7 300.0 sot
8 2400.0 sot
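The split approach above replaces df with the two split columns and relies on exactly one space between the number and the unit. A hedged sketch that keeps the original dataframe and splits on any whitespace could look like this; the sample rows and the names `parts`, `value`, `unit` and `area_m2` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows; 'ColName' follows the answer above
df = pd.DataFrame({"ColName": ["90 m²", "102 m²", "11 sot", "4 sot"]})

# With no pattern given, split uses any run of whitespace; n=1 splits once
parts = df["ColName"].str.split(n=1, expand=True)
parts.columns = ["value", "unit"]

# Store the numeric area in a new column, then scale the sot rows
df["area_m2"] = parts["value"].astype(float)
df.loc[parts["unit"] == "sot", "area_m2"] *= 100
print(df)
```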