When I run the code for one particular state name (see Last_residence in the code):
andhrapradesh.query('Duration_of_residence=="All durations of residence" & Last_residence_R_or_U=="Urban" & Last_residence=="Jammu & Kashmir"',inplace=True)
print(andhrapradesh['Total_migrants'].sum())
It gives the desired sum of the outflow value for that state from the CSV loaded with pandas. But when I try to calculate it for all possible state names, I get the error "UndefinedVariableError: name 'Jammu & Kashmir' is not defined":
states = ["Jammu & Kashmir","Punjab",'Himachal Pradesh']
for name in states:
    andhrapradesh.query(f'Duration_of_residence=="All durations of residence" & Last_residence_R_or_U=="Urban" & Last_residence=={name}',inplace=True)
    print(andhrapradesh['Total_migrants'].sum())
Can you please figure out why it shows this error, and how I can do this for all values in the list states?
CodePudding user response:
You need quotes around {name}, like you had in your manual code:
df.query(f"... Last_residence == \"{name}\"")
That way the value is seen as a string to compare against, e.g. "Punjab", rather than a bare Punjab, which is looked up as a name (e.g. a column name), hence the error. (You can use single quotes too.)
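To see the difference, it can help to print the string that the f-string actually produces before pandas ever parses it. This is only a small sketch using plain Python; the variable names unquoted and quoted are made up for illustration:

```python
name = "Punjab"

# Without quotes, the state name is pasted into the expression as a bare word,
# so the query parser treats it as an identifier to look up.
unquoted = f"Last_residence == {name}"

# With quotes, the state name appears as a string literal inside the expression.
quoted = f'Last_residence == "{name}"'

print(unquoted)
print(quoted)
```

Note that the quoted form still breaks if a value itself contains a double quote, since it is spliced directly into the expression.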
However, there's a better way: you can prefix a variable name with the @ symbol, so you don't even need the f-string:
df.query("... Last_residence == @name")
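Putting it together for the whole list, a minimal sketch might look like the following. The small DataFrame is invented stand-in data (the real one comes from your CSV), and the totals dict is just a hypothetical place to collect results. One more thing to watch for: with inplace=True inside the loop, the first iteration throws away every row that is not "Jammu & Kashmir", so the later states would find nothing left to match. The sketch therefore assigns the filtered result to a new variable instead.

```python
import pandas as pd

# Invented sample data standing in for the CSV from the question.
andhrapradesh = pd.DataFrame({
    "Duration_of_residence": ["All durations of residence"] * 4 + ["Less than 1 year"],
    "Last_residence_R_or_U": ["Urban", "Urban", "Rural", "Urban", "Urban"],
    "Last_residence": ["Jammu & Kashmir", "Punjab", "Punjab", "Himachal Pradesh", "Punjab"],
    "Total_migrants": [10, 20, 5, 7, 3],
})

states = ["Jammu & Kashmir", "Punjab", "Himachal Pradesh"]

totals = {}  # hypothetical container: state name -> summed Total_migrants
for name in states:
    # @name refers to the Python variable, so no manual quoting is needed.
    # No inplace=True: the original frame stays intact for the next state.
    subset = andhrapradesh.query(
        'Duration_of_residence == "All durations of residence"'
        ' & Last_residence_R_or_U == "Urban"'
        ' & Last_residence == @name'
    )
    totals[name] = subset["Total_migrants"].sum()
    print(name, totals[name])
```

Because the state name is passed as a variable rather than spliced into the string, values containing spaces, ampersands, or quotes need no special handling.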