Home > Blockchain >  Build expression for pandas loc to filter data using a certain column according to length of a list
Build expression for pandas loc to filter data using a certain column according to length of a list

Time:12-18

I am pretty new to python programming and I want to use Pandas .loc to filter data using a certain column according to the length of a products' list.

I have the (condensed) list:

products_selected = ["Apple", "Banana", "Strawberry"]

I produced the following very poor code (condensed) to reach my goal for the column PRODUCT in the Pandas DataFrame:

if len(products_selected) == 1:
    data = data.loc[(data.PRODUCT == products_selected[0])]
   
elif len(products_selected) == 2:
    data = data.loc[(data.PRODUCT == products_selected[0]) | (data.PRODUCT == products_selected[1])]

elif len(products_selected) == 3:
    data = data.loc[(data.PRODUCT == products_selected[0]) | (data.PRODUCT == products_selected[1]) | (data.PRODUCT == products_selected[2])]

How can I do this the pythonic way?

And what's more - independent to the length of the list without having to expand my poor coding manually?

I can't use reduce() or something like that, it should be done without additional functions except pandas or numpy.

It looks so easy but due to my limited coding experience I didn't manage it.

CodePudding user response:

Slicing may help.

https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-slicing-with-labels

[in]:
import pandas
data = pandas.DataFrame({'PRODUCT': ["Apple", "Banana", "Strawberry"]})

products_selected = ["Apple"]
print(data[:len(products_selected)])

products_selected = ["Apple", "Banana"]
print(data[:len(products_selected)])

products_selected = ["Apple", "Banana", "Strawberry"]
print(data[:len(products_selected)])

[out]:
  PRODUCT
0   Apple
  PRODUCT
0   Apple
1  Banana
      PRODUCT
0       Apple
1      Banana
2  Strawberry

... but it makes no assumption about returning the same values from data that are in products_selected.

For example:

[in]:
import pandas
data = pandas.DataFrame({'PRODUCT': ["Apple", "Banana", "Strawberry"]})

products_selected = ["Apple", "Strawberry"]
data[:len(products_selected)]

[out]:
    PRODUCT
0   Apple
1   Banana

There is also a reference in your example to countries_selected. I made the assumption that it was a typo and should have also been products_selected.

  • Related