Home > Software design >  How do I fillna using data from left column as the reference
How do I fillna using data from left column as the reference

Time:07-06

Id like to ask for help in fixing the missing values in pandas dataframe (python)

here is the dataset dataset

In this dataset I found a missing value in ['Item_Weight'] column.

I don't want to drop the missing values because I found out by sorting them. the missing value is "miss type" by someone who encoded it.

here is the sorted dataset sorted_dataset

Now I created a lookup dataset so I can merge them to fill na missing values.

Lookup Table

How can I merge them or join them only to fill the missing values (Nan) using the lookup table I made? Or is there any other way without using a lookup table?

CodePudding user response:

Looking at this you will probably want to use something along the lines of map instead of join/merge this is an example of how to use map with your data.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Column1' : ['A', 'B', 'C'],
    'Column2' : [1, np.nan, 3]
})
df

df_map = pd.DataFrame({
    'Column1' : ['A', 'B', 'C'],
    'Column2' : [1, 2, 3]
})

df_map

#Looks to find where the column you specify is null, then using your map df will map the value from column1 to column2
df['Column2'] = np.where(df['Column2'].isna(), df['Column1'].map(df_map.set_index('Column1')['Column2']), df['Column2'])

I had to create my own dataframes since you used screenshots. In the future, the use of screenshots is not considered best to help developers with assistance.

CodePudding user response:

This will probably work:

df = df.sort_values(['Item_Identifier', 'Item_Weight']).ffill()

But I can't test it since you didn't give us anything to work with.

  • Related