Overwriting pandas columns in place

Time:06-25

I have a dataframe with address data. The fields are add_1, add_2, add_3.

The data is messy and needs to be cleaned up. On some rows add_1 isn't populated but add_2 and add_3 are. I need to shunt the values over with the logic 'if 2 is blank and 3 is populated, move 3 into 2 and delete it from 3'.

So far I've managed to move the contents of column 3 into column 2 where 2 is blank, but my solution just creates a new object (?) containing the updated values for 2. It doesn't keep the values where 2 was already populated, so I end up with the same problem in reverse.

My first thought was to create a new dataframe and then merge the two on the index column, but I can't get it to create the new df including the index (and other) columns from the original.

The dataframe is created from a CSV file with the headings above. I set the 'ID' column as the index and then did this:

arr_ad1 = df_raw_data['ID', 'add_1', 'add_2'] = np.where(df_raw_data['add_1'] is None,
                                                         df_raw_data['add_2'], df_raw_data['add_2'])

If I create a new df from arr_ad1 and print it, it only shows the add_1 column: it has the correct values from add_2 where add_1 was blank, but blank values everywhere else, where nothing was updated.

The best option (I think?) would be to use inplace=True and just overwrite the blank cells; failing that, if I could get the result to include the index, I could merge the two. At the moment I'm stuck with output I can't use to update the original.
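For reference, part of the trouble with the np.where attempt is that `df_raw_data['add_1'] is None` is a plain Python identity check on the whole Series object, so it yields a single False rather than one True/False per row. Below is a minimal sketch of the same np.where idea with an elementwise mask, assigned straight back to a column so the index and the other columns stay put and no merge is needed. The frame and its values are made up for illustration, and it assumes blank cells were read in as NaN (the `pd.read_csv` default for empty fields).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV data, indexed by ID like the original.
df = pd.DataFrame(
    {
        "ID": [1, 2, 3],
        "add_1": ["1 High St", np.nan, "9 Mill Lane"],
        "add_2": ["Leeds", "5 Park Rd", "York"],
        "add_3": [np.nan, "Hull", np.nan],
    }
).set_index("ID")

# .isna() gives one boolean per row, unlike `df["add_1"] is None`.
blank_1 = df["add_1"].isna()

# np.where returns a plain array with no index; assigning it to an existing
# column writes the values row by row (positionally), so the index and the
# other columns are left untouched.
df["add_1"] = np.where(blank_1, df["add_2"], df["add_1"])
```

Note that this copies add_2 into add_1 rather than moving it: row 2 still has "5 Park Rd" in add_2 afterwards, so clearing it would be a separate step.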

As you can probably tell, I'm pretty new to this.

Answer:

If I understand correctly, your question boils down to: "How do I replace NaN values in one column with values from another column?"

If so, fillna is your friend: it fills only the NaN cells, and you can pass it another column as the fill value. That takes care of both requirements: only the NaN values are filled, and the values are taken from column add_3.

df['add_2'] = df['add_2'].fillna(df['add_3'])
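A self-contained sketch of the fillna approach on a small made-up frame, showing that only the NaN cells of add_2 change and the ID index is kept. It is written as an assignment rather than with `inplace=True`, because calling an inplace method on `df['add_2']` is chained assignment: depending on the pandas version it may warn or, with Copy-on-Write enabled, not update `df` at all.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; blanks are NaN, as read_csv produces by default.
df = pd.DataFrame(
    {
        "add_1": ["1 High St", "3 Low Rd", "7 Bank St"],
        "add_2": ["Leeds", np.nan, np.nan],
        "add_3": ["LS1", "York", np.nan],
    },
    index=pd.Index([101, 102, 103], name="ID"),
)

# Filling with a Series matches rows by index label: each NaN in add_2 takes
# the add_3 value from the same ID, and populated add_2 cells are left alone.
df["add_2"] = df["add_2"].fillna(df["add_3"])
```

Row 101 keeps "Leeds", row 102 gets "York", and row 103 stays NaN because its add_3 is empty too.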

To delete the add_3 column afterwards (if you want to), you can use .drop:

df.drop(labels='add_3', axis=1, inplace=True)
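Dropping add_3 removes the whole column, including values on rows where add_2 was already populated. If the goal is the literal rule from the question ("if 2 is blank and 3 is populated, move 3 into 2 and delete it from 3"), one possible sketch uses a boolean mask with `.loc` so only the affected rows change. The sample data is hypothetical, and it assumes blanks are NaN; if they came in as empty strings, `df.replace("", np.nan)` would be needed first.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows, indexed by ID.
df = pd.DataFrame(
    {
        "add_1": ["1 High St", "3 Low Rd", "7 Bank St"],
        "add_2": ["Leeds", np.nan, np.nan],
        "add_3": ["LS1", "York", np.nan],
    },
    index=pd.Index([101, 102, 103], name="ID"),
)

# Rows where add_2 is blank but add_3 has a value.
move = df["add_2"].isna() & df["add_3"].notna()

# .loc with a boolean mask writes into df itself: copy add_3 into add_2 on
# those rows, then clear add_3 on the same rows. Other rows are untouched.
df.loc[move, "add_2"] = df.loc[move, "add_3"]
df.loc[move, "add_3"] = np.nan
```

Here row 101 keeps "LS1" in add_3, row 102 ends up with "York" in add_2 and an empty add_3, and row 103 is unchanged.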