Home > Mobile >  Pandas: I want to replace value of column in dataframe based value of other column
Pandas: I want to replace value of column in dataframe based value of other column

Time:02-27

I am working on large dataset where I want to replace value of 1 column based on the value of another column. I have been trying different combinations, but not satisfied, is there a simple way like one liner? Sample code with error Solution:

import pandas as pd
people = pd.DataFrame(
    {
        "name": ["Ram", "Sham", "Ghanu", "Dhanu", "Jeetu"],
        "age": [25, 30, 25, 31, 31],
        "loc": ['Vashi', 'Nerul', 'Airoli', 'Panvel', 'CBD'],
    },)
print(people)

areacode = pd.DataFrame(
    {
        "loc": ['Vashi', 'Nerul', 'CBD', 'Panvel'],
        "pin": [400703, 400706, 421504, 410206],
    },)
print()
print(areacode)
people = pd.merge(people, areacode, how='left', on='loc').drop(columns='loc').fillna('')
people.rename(columns={'pin':'loc'}, inplace=True)
print(people)

output of people Dataframe before change:

    name  age     loc
0    Ram   25   Vashi
1   Sham   30   Nerul
2  Ghanu   25  Airoli
3  Dhanu   31  Panvel
4  Jeetu   31     CBD

output of areacode Dataframe:

      loc     pin
0   Vashi  400703
1   Nerul  400706
2     CBD  421504
3  Panvel  410206

output of people Dataframe after change:

name    age loc
0   Ram 25  400703.0
1   Sham    30  400706.0
2   Ghanu   25  
3   Dhanu   31  410206.0
4   Jeetu   31  421504.0

I don't like this approach as 1. Its long and 2. I am getting float in loc column, I need int. Please help me

CodePudding user response:

people = pd.DataFrame(
    {
        "name": ["Ram", "Sham", "Ghanu", "Dhanu", "Jeetu"],
        "age": [25, 30, 25, 31, 31],
        "loc": ['Vashi', 'Nerul', 'Airoli', 'Panvel', 'CBD'],
    },)
print(people)

areacode = pd.DataFrame(
    {
        "loc": ['Vashi', 'Nerul', 'CBD', 'Panvel'],
        "pin": [400703, 400706, 421504, 410206],
    },)
print()
print(areacode)

d = dict(zip(areacode["loc"], areacode["pin"]))
people["loc"] = people["loc"].apply(lambda x: int(d[x]) if x in d else "")
print(people)

CodePudding user response:

I see no issue with your appraoch. Just cast loc as integer.

Alternative would be map, but I suspect it would be slower. You still will cast loc as integer anyway

people=people.assign(loc=people['loc'].map(dict(zip(areacode['loc'],areacode['pin']))).fillna('0').astype(int))
  • Related