How to add the values of two columns to a dictionary for mapping

I have data with two columns, like this:

data.head()
   GroupId  Planet
0    0008   Europa
1    0008   Europa
2    0009   Mars
3    0010   Earth
4    0011   Earth
5    0012   Earth
6    0012   NaN

The Planet column has missing values, but rows with the same GroupId share the same planet. How can I create a dictionary for mapping, like {'0008': 'Europa', '0009': 'Mars', ...}, so that I can fill the NaN values?

CodePudding user response：

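You can drop the rows where Planet is NaN and build the dictionary from the remaining (GroupId, Planet) pairs. Keep the result in a separate variable so the original rows, including the ones you want to fill, are not lost.

# rows with a known planet; `data` itself is left untouched
known = data.dropna(subset=['Planet'])
planet_map = dict(zip(known['GroupId'], known['Planet']))
planet_map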
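
To then use such a dictionary for the actual fill, one option is to combine Series.map with fillna so that only the missing entries are touched. This is a minimal sketch, assuming the frame from the question is rebuilt by hand with GroupId stored as strings; the names `known` and `planet_map` are just illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (GroupId kept as strings).
data = pd.DataFrame({
    "GroupId": ["0008", "0008", "0009", "0010", "0011", "0012", "0012"],
    "Planet": ["Europa", "Europa", "Mars", "Earth", "Earth", "Earth", np.nan],
})

# Build the lookup from rows where Planet is known.
known = data.dropna(subset=["Planet"])
planet_map = dict(zip(known["GroupId"], known["Planet"]))

# Fill only the missing entries; existing values are left alone.
data["Planet"] = data["Planet"].fillna(data["GroupId"].map(planet_map))
print(data)
```

A group with no known planet at all simply stays NaN here, since map returns NaN for keys that are not in the dictionary.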

CodePudding user response：

data.dropna(subset=['Planet']).drop_duplicates(subset=['GroupId'], keep='first').set_index('GroupId')['Planet'].to_dict()
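
To see what this kind of chain returns, here is a small sketch on a made-up frame in which one group's first row is NaN. It drops the missing planets before de-duplicating, so that a NaN cannot end up as the value kept for a group; the sample values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up frame: group "0012" has NaN in its first row.
data = pd.DataFrame({
    "GroupId": ["0008", "0008", "0009", "0012", "0012"],
    "Planet": ["Europa", "Europa", "Mars", np.nan, "Earth"],
})

planet_map = (
    data.dropna(subset=["Planet"])            # keep rows with a known planet
        .drop_duplicates(subset=["GroupId"])  # one row per group
        .set_index("GroupId")["Planet"]
        .to_dict()
)
print(planet_map)  # {'0008': 'Europa', '0009': 'Mars', '0012': 'Earth'}
```

Without the dropna step, keep='first' would pick the NaN row for "0012" and the dictionary would map that group to NaN.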

CodePudding user response：

Here's one way to do it. I have added a sample dataframe with a NaN, as per your description:

import numpy as np
import pandas as pd

# dataframe
planet_data = {
    "GroupId": ["0008", "0008", "0009", "0010", "0008", "0011", "0012"],
    "Planet":  ["Europa", "Europa", "Mars", "Earth", np.nan, "Earth", "Earth"]
}

data = pd.DataFrame(planet_data)

# create a dictionary
df = data.copy(deep=True)
df = df.dropna()
df = df.drop_duplicates()
planet_map = dict(zip(df["GroupId"], df["Planet"]))

# replace missing values using the dictionary
# (.map returns NaN for a GroupId that is not in the dictionary, instead of raising KeyError)
data["Planet"] = data["GroupId"].map(planet_map)
data
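
All of this relies on the premise from the question that each GroupId has a single planet. If that is in doubt, it may be worth checking before building the dictionary. The following is a rough sketch of such a check on made-up data; `planets_per_group` and `conflicting` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Made-up frame in the same shape as the question's data.
data = pd.DataFrame({
    "GroupId": ["0008", "0008", "0009", "0012", "0012"],
    "Planet": ["Europa", "Europa", "Mars", "Earth", np.nan],
})

# Number of distinct non-missing planets per group (NaN is ignored by default).
planets_per_group = data.groupby("GroupId")["Planet"].nunique()

# Groups listed here would make a GroupId -> Planet dictionary ambiguous.
conflicting = planets_per_group[planets_per_group > 1]
print(conflicting.index.tolist())
```

An empty list means every group has at most one known planet, so the dictionary is unambiguous.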