I have a pandas dataframe with zip codes. I also have a dictionary where the keys are zip codes and the values are regions.
The dictionary:
my_regions = {8361: 'Central region', 8381: 'Central region', 8462: 'North region', 8520: 'South region', 8530: 'Central region', 8541: 'South region'}
The dataframe has a column named zipcode:
df["zipcode"]= [8462, 8361, 8381,8660,8530,8530]
I want to add a new column to the dataframe df containing the dictionary value (the region name) wherever the zip code in the dataframe equals a key in the dictionary.
I have tried this:
my_regions_list = []
for keyname in my_regions:
    for zipcode in df.zipcode:
        if zipcode == my_regions.keys():
            my_regions_list.append(my_regions.values())
# df["region"] = df.append(my_regions.values())
df =df.insert(column="region", value = my_r)
The list stays empty, and this does not add the new column to the existing dataframe.
I also tried converting the dictionary to a dataframe, but it makes no sense to me:
df1 = pd.DataFrame(list(my_regions.items()),columns = ['ZIPCODE','REGIONNAME'])
CodePudding user response:
You can use .map():
df = pd.DataFrame({"zipcode": [8462, 8361, 8381, 8660, 8530, 8530]})
my_regions = {
8361: "Central region",
8381: "Central region",
8462: "North region",
8520: "South region",
8530: "Central region",
8541: "South region",
}
df["name"] = df["zipcode"].map(my_regions)
print(df)
Prints:
   zipcode            name
0     8462    North region
1     8361  Central region
2     8381  Central region
3     8660             NaN
4     8530  Central region
5     8530  Central region
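As a rough sketch of why the loop in the question left the list empty, and of one way to label the unmatched zip code instead of leaving NaN: the names always_false and regions_list and the placeholder "Unknown region" are illustrative assumptions, not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({"zipcode": [8462, 8361, 8381, 8660, 8530, 8530]})
my_regions = {
    8361: "Central region",
    8381: "Central region",
    8462: "North region",
    8520: "South region",
    8530: "Central region",
    8541: "South region",
}

# Comparing one zip code with the whole keys view is never True,
# so the append in the question's loop was never reached.
always_false = 8462 == my_regions.keys()

# A per-row lookup is what the loop was aiming for;
# dict.get returns None when the zip code is not a key.
regions_list = [my_regions.get(z) for z in df["zipcode"]]

# Same idea with .map(), filling unmatched rows with a placeholder label
# (the label "Unknown region" is an assumption for illustration).
df["name"] = df["zipcode"].map(my_regions).fillna("Unknown region")
print(df)
```

Whether to keep NaN or substitute a label depends on how the missing regions should be handled later; NaN is easier to filter with .isna().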
CodePudding user response:
I'm going to keep this up, even though .map() is supposedly faster than .replace(), simply because the results are different, and others may find one or the other more appropriate for their use case.
Note that the main difference is that .replace() will leave the original value intact if no mapping was found, whereas .map() produces NaN for mappings that don't exist.
df["regions"] = df["zipcode"].replace(my_regions)
Demo:
In [5]: df
Out[5]:
   zipcode
0     8462
1     8361
2     8381
3     8660
4     8530
5     8530

In [6]: df["regions"] = df["zipcode"].replace(my_regions)

In [7]: df
Out[7]:
   zipcode         regions
0     8462    North region
1     8361  Central region
2     8381  Central region
3     8660            8660
4     8530  Central region
5     8530  Central region
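The difference between the two methods can be checked side by side. This is a minimal sketch reusing the data from the question; the variable names mapped, replaced and unmatched are assumptions introduced here.

```python
import pandas as pd

df = pd.DataFrame({"zipcode": [8462, 8361, 8381, 8660, 8530, 8530]})
my_regions = {
    8361: "Central region",
    8381: "Central region",
    8462: "North region",
    8520: "South region",
    8530: "Central region",
    8541: "South region",
}

# .map(): zip codes without a dictionary entry become NaN.
mapped = df["zipcode"].map(my_regions)

# .replace(): zip codes without a dictionary entry keep their original value.
replaced = df["zipcode"].replace(my_regions)

# The NaN rows from .map() identify which zip codes had no region.
unmatched = df.loc[mapped.isna(), "zipcode"].tolist()

print(pd.DataFrame({"zipcode": df["zipcode"], "map": mapped, "replace": replaced}))
print("No region found for:", unmatched)
```

With .replace() the resulting column mixes region names and leftover integers, so .map() is usually the easier result to validate when unmatched zip codes need to be detected.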