I have two dataframes, one is populated with barcodes and their product's prices across several supermarkets during the COVID pandemic, and the other one contains the actual product names and their barcodes along with a few category_id's etc, which are known (so I know which category_id to target in order to get a specific set of barcodes.)
the df_markets:
BARCODE AC BFRESH LIDL SUPERM
Date
2020-01-03 5201263086618 6.36 7.97 0 0 8.31
2020-01-03 5201263086625 7.58 9.53 0 0 9.91
2020-01-03 7322540574852 18.11 18.34 0 0 8.86
2020-01-03 7322540647136 18.8 18.95 0 0 18.9
2020-01-03 7322540587555 18.22 18.98 0 0 9.21
and the df_bar:
product_id category_id subcategory_name BARCODE name
0 1.0 37.0 Adult diapers 5.201263e 12 Sani Pants N2 Medium 14
1 2.0 37.0 Adult diapers 5.201263e 12 Sani Pants N3 Large 14
2 3.0 37.0 Adult diapers 7.322541e 12 Tena Pants Plus Large
3 4.0 37.0 Adult diapers 7.322541e 12 Tena Slip Super Large N4
4 5.0 37.0 Adult diapers 7.322541e 12 Tena Pants Plus Extra Large
What I've tried doing is first locate inside the df_bar all of the barcodes that have a category_id of 114.0 (which is of detergent) like that:
df_bar.loc[df_bar['category_id']==114.0]
and that produced a two-rows dataframe, so I know that this category_id exists, and then tried locating inside the df_markets dataframe all of the barcodes that much this search query:
df_markets.loc[df_markets['BARCODE'].isin(df_bar.loc[df_bar['category_id']==114.0])]
like that, but that produces an empty dataframe like that:
BARCODE AC BFRESH LIDL SUPERM
Date
so I am guessing the problem is that the df_markets has the barcodes in this format "5201263086618" while the df_bar has the barcodes in this format "5.201263e 12". Do you know what needs to be done to be able to search within the df_markets dataframe effectively? Or let me know if there's something wrong with my code.
A minor edit that helped me after Danila's suggestion was to turn my "query" into this :
df_bar['BARCODE'] = df_bar['BARCODE'].apply(lambda x: str(x)[:-2])
#above line changed the format
df_markets.loc[df_markets['BARCODE'].isin(df_bar.loc[df_bar['category_id']==114.0]['BARCODE'])]
so that it can take into consideration only the BARCODE column.
CodePudding user response:
just convert BARCODE column to string. example:
df_bar['BARCODE'] = df_bar['BARCODE'].apply(lambda x: str(x)[:-2])
CodePudding user response:
You can try going for a parse!
file = pandas.read_excel("your_file.xlsx", 0) # is the sheet index
for col, row in file.iterrows():
if row.get('category_id') == 114.0:
# do something
Note that this solution may not be optimized for a big dataframe, since it has to parse the entire dataset until it finds what you need
CodePudding user response:
Convert Barcode to similar format.
df_bar['BARCODE'] = int(float(df_bar['BARCODE']))
CodePudding user response:
Provided that the df_bar['BARCODE'] is a number in scientific notation you can use the following command before loading your dataset.
pd.set_option('display.float_format', lambda x: '%.3f' % x)