I am pretty new to python programming and I want to use Pandas .loc to filter data using a certain column according to the length of a products' list.
I have the (condensed) list:
products_selected = ["Apple", "Banana", "Strawberry"]
I produced the following very poor code (condensed) to reach my goal for the column PRODUCT in the Pandas DataFrame:
if len(products_selected) == 1:
data = data.loc[(data.PRODUCT == products_selected[0])]
elif len(products_selected) == 2:
data = data.loc[(data.PRODUCT == products_selected[0]) | (data.PRODUCT == products_selected[1])]
elif len(products_selected) == 3:
data = data.loc[(data.PRODUCT == products_selected[0]) | (data.PRODUCT == products_selected[1]) | (data.PRODUCT == products_selected[2])]
How can I do this the pythonic way?
And what's more - independent to the length of the list without having to expand my poor coding manually?
I can't use reduce() or something like that, it should be done without additional functions except pandas or numpy.
It looks so easy but due to my limited coding experience I didn't manage it.
CodePudding user response:
Slicing may help.
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-slicing-with-labels
[in]:
import pandas
data = pandas.DataFrame({'PRODUCT': ["Apple", "Banana", "Strawberry"]})
products_selected = ["Apple"]
print(data[:len(products_selected)])
products_selected = ["Apple", "Banana"]
print(data[:len(products_selected)])
products_selected = ["Apple", "Banana", "Strawberry"]
print(data[:len(products_selected)])
[out]:
PRODUCT
0 Apple
PRODUCT
0 Apple
1 Banana
PRODUCT
0 Apple
1 Banana
2 Strawberry
... but it makes no assumption about returning the same values from data
that are in products_selected
.
For example:
[in]:
import pandas
data = pandas.DataFrame({'PRODUCT': ["Apple", "Banana", "Strawberry"]})
products_selected = ["Apple", "Strawberry"]
data[:len(products_selected)]
[out]:
PRODUCT
0 Apple
1 Banana
There is also a reference in your example to countries_selected
. I made the assumption that it was a typo and should have also been products_selected
.