I have data with two columns, like this:
data.head()
GroupId Planet
0 0008 Europa
1 0008 Europa
2 0009 Mars
3 0010 Earth
4 0011 Earth
5 0012 Earth
6 0012 NaN
The Planet column has missing values, but rows with the same GroupId share the same planet. How can I create a dictionary mapping to fill the NaN values, like {'0008': 'Europa', '0009': 'Mars', ...}?
CodePudding user response:
You can remove the NaN values and then build the dictionary mapping.
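# drop rows with a missing planet into a separate frame, so the original rows remain available for filling
known = data.dropna(subset=['Planet'])
planet_map = dict(zip(known['GroupId'], known['Planet']))
planet_map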
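Once the dictionary exists, it still has to be applied to the missing rows. A minimal sketch of the full round trip, assuming GroupId is stored as strings and using a sample frame modelled on the question (the names `known` and `planet_map` are illustrative):

```python
import numpy as np
import pandas as pd

# Sample frame modelled on the question (values are illustrative)
data = pd.DataFrame({
    "GroupId": ["0008", "0008", "0009", "0010", "0011", "0012", "0012"],
    "Planet": ["Europa", "Europa", "Mars", "Earth", "Earth", "Earth", np.nan],
})

# Build the mapping only from rows where Planet is known
known = data.dropna(subset=["Planet"])
planet_map = dict(zip(known["GroupId"], known["Planet"]))

# Fill only the missing entries; a group with no known planet stays NaN
data["Planet"] = data["Planet"].fillna(data["GroupId"].map(planet_map))
```

Using `map` inside `fillna` leaves existing values untouched and does not raise if a GroupId is absent from the dictionary.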
CodePudding user response:
data.dropna(subset=['Planet']).drop_duplicates(subset=['GroupId'], keep='first').set_index('GroupId')['Planet'].to_dict()
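One thing to watch with this approach: if the first row of a group happens to be the one with the missing planet, deduplicating on GroupId alone keeps that NaN. A small sketch of the difference, using a hypothetical three-row frame (the names `naive` and `safe` are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the first row of group "0012" has a missing planet
df = pd.DataFrame({
    "GroupId": ["0008", "0012", "0012"],
    "Planet": ["Europa", np.nan, "Earth"],
})

# Without dropna, keep='first' selects the NaN row for "0012"
naive = (
    df.drop_duplicates(subset=["GroupId"], keep="first")
      .set_index("GroupId")["Planet"]
      .to_dict()
)

# Dropping missing planets first keeps the first valid row per group
safe = (
    df.dropna(subset=["Planet"])
      .drop_duplicates(subset=["GroupId"], keep="first")
      .set_index("GroupId")["Planet"]
      .to_dict()
)
```

Here `naive` maps "0012" to NaN, while `safe` maps it to "Earth".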
CodePudding user response:
Here's what you need. I have added a sample dataframe with NaN, as per your description:
import numpy as np
import pandas as pd
# dataframe
planet_data = {
"GroupId": ["0008", "0008", "0009", "0010", "0008", "0011", "0012"],
"Planet": ["Europa", "Europa", "Mars", "Earth", np.nan, "Earth", "Earth"]
}
data = pd.DataFrame(planet_data)
# create a dictionary
df = data.copy(deep=True)
df = df.dropna()
df = df.drop_duplicates()
planet_map = dict(zip(df["GroupId"], df["Planet"]))
# fill only the missing values from the dictionary; unknown GroupIds stay NaN instead of raising KeyError
data["Planet"] = data["Planet"].fillna(data["GroupId"].map(planet_map))
data