I am working on large dataset where I want to replace value of 1 column based on the value of another column. I have been trying different combinations, but not satisfied, is there a simple way like one liner? Sample code with error Solution:
import pandas as pd
people = pd.DataFrame(
{
"name": ["Ram", "Sham", "Ghanu", "Dhanu", "Jeetu"],
"age": [25, 30, 25, 31, 31],
"loc": ['Vashi', 'Nerul', 'Airoli', 'Panvel', 'CBD'],
},)
print(people)
areacode = pd.DataFrame(
{
"loc": ['Vashi', 'Nerul', 'CBD', 'Panvel'],
"pin": [400703, 400706, 421504, 410206],
},)
print()
print(areacode)
people = pd.merge(people, areacode, how='left', on='loc').drop(columns='loc').fillna('')
people.rename(columns={'pin':'loc'}, inplace=True)
print(people)
output of people Dataframe before change:
name age loc
0 Ram 25 Vashi
1 Sham 30 Nerul
2 Ghanu 25 Airoli
3 Dhanu 31 Panvel
4 Jeetu 31 CBD
output of areacode Dataframe:
loc pin
0 Vashi 400703
1 Nerul 400706
2 CBD 421504
3 Panvel 410206
output of people Dataframe after change:
name age loc
0 Ram 25 400703.0
1 Sham 30 400706.0
2 Ghanu 25
3 Dhanu 31 410206.0
4 Jeetu 31 421504.0
I don't like this approach as 1. Its long and 2. I am getting float in loc column, I need int. Please help me
CodePudding user response:
people = pd.DataFrame(
{
"name": ["Ram", "Sham", "Ghanu", "Dhanu", "Jeetu"],
"age": [25, 30, 25, 31, 31],
"loc": ['Vashi', 'Nerul', 'Airoli', 'Panvel', 'CBD'],
},)
print(people)
areacode = pd.DataFrame(
{
"loc": ['Vashi', 'Nerul', 'CBD', 'Panvel'],
"pin": [400703, 400706, 421504, 410206],
},)
print()
print(areacode)
d = dict(zip(areacode["loc"], areacode["pin"]))
people["loc"] = people["loc"].apply(lambda x: int(d[x]) if x in d else "")
print(people)
CodePudding user response:
I see no issue with your appraoch. Just cast loc as integer.
Alternative would be map, but I suspect it would be slower. You still will cast loc as integer anyway
people=people.assign(loc=people['loc'].map(dict(zip(areacode['loc'],areacode['pin']))).fillna('0').astype(int))