I am importing excel files with products and product specific data. They look like this:
dfA
EAN Code Product Name Color Price
12345 AAA xxx 9
45678 BBB zzz 10
and dfB
EAN Code Product Name New Price
12-345 AAA 10
45-678 BBB 11
I am importing these as always:
dfA = pd.DataFrame (dfA, columns=["Season", "EAN Code", "Product Name", "..."] , dtype=str)
And I am merging them, since there are multiple different excel files:
dfA = pd.concat([dfA1, dfA2, dfA3, dfA4, dfA5, dfA6, dfA7, dfA8, dfA9, dfA10, dfA11], ignore_index=False)
Then I am deleting the hyphen in the EAN column, because in excel file b there is an unnecessary hyphen in the EAN number.
for col in dfB.columns:
dfB["EAN"] = dfB["EAN"].str.replace('-', '')
So far, so good.
Now I try to search through the EAN code column of dfA and search for the same product in dfB. When there is a match, I want to copy over the new price. This worked great in my old code, although it took about 15 minutes for the script to search through a half million rows... This worked quite well in the past, but now I am trying to achieve the same, but I get an error message. This is my simplified code:
for i in dfA.index:
for j in dfB.index:
if dfA.loc[i, "EAN"] == dfB.loc[j, "EAN"]:
print ("EAN", number,"times found!")
number=number 1
dfA.loc[i, "Price"] = dfB.loc[j, "New Price"]
The print statement is only a feedback for me, so that I see wether the script is still doing something.
Traceback (most recent call last):
File "c:...", line 72, in <module>
if dfA.loc[i, "EAN"] == dfB.loc[j, "EAN"]:
File "C:...", line 1527, in __nonzero__
raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Why am I getting this now?
CodePudding user response:
Use a merge
, after setting up a common EAN Code:
out = dfA.merge(
dfB.assign(**{'EAN Code': dfB['EAN Code'].str.replace('-', '')
.astype(int) # only if dfA has an int
}),
on='EAN Code')
output:
EAN Code Product Name_x Color Price Product Name_y New Price
0 12345 AAA xxx 9 AAA 10
1 45678 BBB zzz 10 BBB 11
CodePudding user response:
Your code is trying to test a condition somewhere, it's obviously in the if
-part: if dfA.loc[i, "EAN"] == dfB.loc[j, "EAN"]
.
dfA.loc[i, "EAN"]
happens to be a Series sometimes and/or dfB.loc[j, "EAN"]
too.
So the result of ... == ...
is a Series, but if
expects a boolean. You need to reduce this Series to a boolean using the appropriate method (they propose empty
, bool()
, any()
, etc.)
I assume that you expect dfA.loc[i, "EAN"]
to give you a scalar and not a Series but it's not always the case (idem for dfB.loc[j, "EAN"]
). So check your data. How are your i
and j
s?
Remark
I recommend you to use df.at[i, col]
rather than df.loc[i, col]
, when you know that you should get a scalar. It's faster and will send errors if you can't get a scalar, so it will be easier to debug.