Home > Net >  How to update column A value with column B value based on column B's string length property?
How to update column A value with column B value based on column B's string length property?

Time:12-27

I scraped a real estate website and produced a CSV output with data requiring to be cleaned and structured. So far, my code properly organized and reformatted the data for it to work with stats software.

However, every row and then, my 'Gross area' column has the wrong value in m2. The correct value appears in another column ('Furbished').

Gross_area Furbished
170 #erroneous 190 m2
170 #erroneous 190 m2
160 #correct Yes
155 #correct No

I tried using the np.where function. However, I could not specify the condition based on string length, which would allow me to target all '_ _ _ m2' values in column 'Furbished' and reinsert them in 'Gross_area'. It just doesn't work.

df['Gross area']=np.where(len(df['Furbished]) == 6, df['Furbished'],df['Gross area'] 

As an alternative, I tried setting cumulative conditions to precisely target my '_ _ _ m2' values and insert them in my 'Gross area' column. It does not work:

df['Gross area']=np.where((df['Furbished]) != 'Yes' or 'No', df['Furbished'],df['Gross area'] 

The outcome I seek is:

Gross_area Furbished
190 m2 190 m2
190 m2 190m2
160 Yes

Any suggestions? Column Furbished string length criterion would be the best option, as I have other instances that would require the same treatment :)

Thanks in advance for your help!

CodePudding user response:

There is probably a better way to do this, but you could get the intended effect by a simple df.apply() function.

df['Gross area'] = df.apply(lambda row: row['Furbished'] if len(row['Furbished']) == 6 else df['Gross area'], axis=1)

With a simple change, you can also keep the 'Gross area' column in the right type.

df['Gross area'] = df.apply(lambda row: float(row['Furbished'][:-2]) if len(row['Furbished']) == 6 else df['Gross area'], axis=1)

CodePudding user response:

You can use pd.where:

df['Gross_area'] = df['Furbished'].where(df['Furbished'].str.len() == 6, df['Gross_area'])

This tells you to use the value in the Furbished column if its length is 6, otherwise use the value in the Gross_area column.

Result:

     Gross_area Furbished
0        190 m2    190 m2
1        190 m2    190 m2
2  160 #correct       Yes
3  155 #correct        No
  • Related