I have the two following dataframes (df1 and df2).
df1:
code name region
0 AFG Afghanistan Middle East
1 NLD Netherlands Western Europe
2 AUT Austria Western Europe
3 IRQ Iraq Middle East
4 USA United States North America
5 CAD Canada North America
df2:
code year gdp per capita
0 AFG 2010 547.35
1 NLD 2010 44851.27
2 AUT 2010 3577.10
3 IRQ 2010 4052.06
4 USA 2010 52760.00
5 CAD 2010 41155.32
6 AFG 2015 578.47
7 NLD 2015 45175.23
8 AUT 2015 3952.80
9 IRQ 2015 4688.32
10 USA 2015 56863.37
11 CAD 2015 43635.10
Instead of merging the two dataframes, I would like to add the respective region from df1 as a new column to df2, using either iterrows() or a for loop.
When I call
for i in range(len(df2)):
region = df1.loc[(df1["code"] == df2.loc[i, "code"]), "region"]
df2.loc[i, "region"] = region
or
for index, row in df2.iterrows():
region = df1.loc[df1["code"] == row["code"], "region"]
df2.loc[index, "region"] = region
I get the error message "ValueError: Incompatible indexer with Series". I think it has to do with the last .loc call df2.loc[i, "region"] = region
and df2.loc[index, "region"] = region
, because I am using a number and a string at the same time. But I have also done that for calculating region
in the for loop and there is no error message.
Your help would be appreciated.
CodePudding user response:
Because region
is a Series
(not a scalar, for example), pandas thinks you want to assign it by index to df2.loc[i, "region"]
and this causes the error.
In fact, you are looking to get at the singleton item in the series and assign that. You can achieve this by doing the following after initializing region
: region = region.to_list()[0]
.
Here is a modified version of your code which should work:
for i in range(len(df2)):
region = (df1.loc[(df1["code"] == df2.loc[i, "code"]), "region"]).to_list()[0]
df2.loc[i, "region"] = region
Output:
code year gdp per capita region
0 AFG 2010 99.0 Middle East
1 NLD 2010 99.0 Western Europe
2 AUT 2010 99.0 Western Europe
3 IRQ 2010 99.0 Middle East
4 USA 2010 99.0 North America
5 CAD 2010 99.0 North America
6 AFG 2015 99.0 Middle East
7 NLD 2015 99.0 Western Europe
8 AUT 2015 99.0 Western Europe
9 IRQ 2015 99.0 Middle East
10 USA 2015 99.0 North America
11 CAD 2015 99.0 North America
Because you have said you do not want to "merge" the dataframes, I assume you are trying to avoid both merge()
and join()
, otherwise I would have suggested the following:
df2 = df2.join(df1.drop(columns='name').set_index('code'), on='code')