I have a dataframe like as shown below
ID,Region,Supplier,year,output
1,ANZ,AB,2021,1
2,ANZ,ABC,2022,1
3,ANZ,ABC,2022,1
4,ANZ,ABE,2021,0
5,ANZ,ABE,2021,1
6,ANZ,ABQ,2021,1
7,ANZ,ABW,2021,1
8,AUS,ABO,2020,1
9,KOREA,ABR,2019,0
data = pd.read_clipboard(sep=',')
My obejctive is to
a) Filter the dataframe by year>=2021
and output==1
b) Generate multiple csv files for each unique combination of region
and supplier
. For example, data for ANZ
and AB
should be stored in seperate file. Similarly, KOREA and ABR data should be stored in seperate file.. This has to be done for each unique combination of region and supplier
So, I tried the below
column_name = "region"
col_name = "supplier"
region_values = data[column_name].unique()
supplier_values = data[col_name].unique()
for i in itertools.zip_longest(region_values,supplier_values,fillvalue="ANZ"):
data_output = data.query(f"{column_name} == i[0] & Year>=2021 & output == 1 & {col_name} == i[1]")
output_path = ATTACHMENT_DIR / f"{i}_ge_2021.csv"
data_output.to_csv(output_path, index=False)
However, this results in error as shown below
KeyError: 'i'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last) ~\Anaconda3\lib\site-packages\pandas\core\computation\scope.py in resolve(self, key, is_local) 205 # e.g., df[df > 0] --> 206 return self.temps[key] 207 except KeyError as err:
KeyError: 'i'
The above exception was the direct cause of the following exception:
UndefinedVariableError Traceback (most recent call last) C:\Users\aksha~1\AppData\Local\Temp/ipykernel_31264/2689222803.py in 1 for i in itertools.zip_longest(subregion_values,disti_values,fillvalue="ANZ"): ----> 2 data_output = data.query(f"{column_name} == i[0] & Year>=2021 & output == 1 & {col_name} == i
CodePudding user response:
Use
@
for pass variables toquery
, for columns names are correctf-string
s:#i, j are same like i[0], i[1] for i, j in itertools.zip_longest(region_values,supplier_values,fillvalue="ANZ"): data_output = data.query(f"{column_name} == @i & year>=2021 & output == 1 & {col_name} == @j")
Your solution also workign with
@
:for i in itertools.zip_longest(region_values,supplier_values,fillvalue="ANZ"): data_output = data.query(f"{column_name} == @i[0] & year>=2021 & output == 1 & {col_name} == @i[1]")
Also is possible use f-strings for variables, but need pass
repr
for representation ofi
variables:for i in itertools.zip_longest(region_values,supplier_values,fillvalue="ANZ"): data_output = data.query(f"{column_name} == {repr(i[0])} & year>=2021 & output == 1 & {col_name} == {repr(i[1])}")