Removing specific words from a column and multiplying the rows that contained those words by a number

Time:08-26

I have a question. I am learning pandas and trying to clean a messy dataset. One column in the DataFrame mixes numbers and strings. It holds the area of each home, but some areas are given in square metres (m²) and others in sot (1 sot = 100 m²). I want to remove "m²" and "sot" from the column and multiply by 100 the rows whose unit is sot. I can already remove "m²" and "sot" with the replace method, but I am not sure how to multiply only the rows that contained "sot".

0          90 m²
1         102 m²
2          43 m²
3         240 m²
4         120 m²
          ...   
15825     11 sot
15826      4 sot
15827    100 sot
15828      3 sot
15829     24 sot
Name: Sahə, Length: 15830, dtype: object

CodePudding user response:

Use str.extract and a mapping dictionary for conversion factors:

convert = {'sot': 100}

df2 = df['Saha'].str.extract(r'(\d+)\s*(.*)')
df['surf_m2'] = pd.to_numeric(df2[0]).mul(df2[1].map(convert).fillna(1))

output:

          Saha  surf_m2
0        90 m²     90.0
1       102 m²    102.0
2        43 m²     43.0
3       240 m²    240.0
4       120 m²    120.0
...
15825   11 sot   1100.0
15826    4 sot    400.0
15827  100 sot  10000.0
15828    3 sot    300.0
15829   24 sot   2400.0

intermediate df2:

         0    1
0       90   m²
1      102   m²
2       43   m²
3      240   m²
4      120   m²
...
15825   11  sot
15826    4  sot
15827  100  sot
15828    3  sot
15829   24  sot
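
To try this approach without the full dataset, here is a minimal self-contained sketch. The four sample rows are made up to mirror the question, the column name Saha follows the answer above, and the names parts and factor are only illustrative.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's column
df = pd.DataFrame({"Saha": ["90 m²", "102 m²", "11 sot", "4 sot"]})

convert = {"sot": 100}  # 1 sot = 100 m²

# Group 0 captures the leading digits, group 1 the trailing unit text
parts = df["Saha"].str.extract(r"(\d+)\s*(.*)")

# Units that are not in the mapping (here m²) fall back to a factor of 1
factor = parts[1].map(convert).fillna(1)

df["surf_m2"] = pd.to_numeric(parts[0]) * factor
print(df)
```

Adding another unit later should only require a new entry in convert, which is the main attraction of the mapping-dictionary design.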

CodePudding user response:

Assuming your dataframe is df, the following will work:

# create an indicator column (assuming your column is called unit)
df['m'] = df['unit'].apply(lambda x: 'm' if 'm' in x else 'sot')

# strip the unit labels and turn the strings into numbers
df['unit'] = df['unit'].str.replace('m²', '').str.replace('sot', '').str.strip().astype(float)

# finally, multiply the sot rows by 100
df.loc[df['m'] == 'sot', 'unit'] *= 100

Giving you:

    unit    m
0   102.0   m
1   43.0    m
2   240.0   m
3   120.0   m
5   1100.0  sot
6   400.0   sot
7   10000.0 sot
8   300.0   sot
9   2400.0  sot
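
If the extra indicator column is not wanted, the same idea can probably be written with a boolean mask taken before the unit text is stripped. This is only a sketch on made-up sample rows. The column name unit follows the answer above and is_sot is an illustrative name.

```python
import pandas as pd

# Hypothetical sample data; the column name "unit" follows the answer above
df = pd.DataFrame({"unit": ["102 m²", "43 m²", "11 sot", "4 sot"]})

# Record which rows are in sot before the unit text is removed
is_sot = df["unit"].str.contains("sot")

# Drop the unit labels and convert what is left to float
df["unit"] = (
    df["unit"]
    .str.replace("m²", "", regex=False)
    .str.replace("sot", "", regex=False)
    .str.strip()
    .astype(float)
)

# Scale only the sot rows to square metres
df.loc[is_sot, "unit"] *= 100
print(df)
```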

CodePudding user response:

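# split "90 m²" into a number part and a unit part (note: this overwrites df)
df = df['ColName'].str.split(" ", expand=True)
df = df.rename(columns={0: "ColName", 1: "Type"})
df.ColName = df.ColName.astype(float)

# convert the sot rows to square metres
df.loc[df.Type == 'sot', 'ColName'] *= 100
df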

output:

    ColName Type
0   90.0    m²
1   102.0   m²
2   43.0    m²
3   240.0   m²
4   1100.0  sot
5   400.0   sot
6   10000.0 sot
7   300.0   sot
8   2400.0  sot
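
One caveat with this answer is that it replaces df with the split result, so any other columns are lost, and splitting on a single space can misbehave if the spacing is irregular. A hedged variant that keeps the original frame and splits on any run of whitespace might look like this. The sample rows and the new column name value are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; note the double space in the second row
df = pd.DataFrame({"ColName": ["90 m²", "102  m²", "11 sot"]})

# With no separator given, str.split breaks on any run of whitespace
parts = df["ColName"].str.split(expand=True)

# Keep the original column and add the parsed pieces next to it
df["value"] = parts[0].astype(float)
df["Type"] = parts[1]

# Convert the sot rows to square metres
df.loc[df["Type"] == "sot", "value"] *= 100
print(df)
```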