Home > Software design >  Conditional filtering inside of for loop, gives empty Series/dataframe?
Conditional filtering inside of for loop, gives empty Series/dataframe?

Time:09-13

I have a 100x15 DF, comparison_df.

enter image description here

I have a list of tuples, (shortened version):

watchlist = [('Admiral Group', 820.0), ('3i', 4300.0), ('AstraZeneca', 7500.0), 
             ('Ashtead Group', 920.0)] 

I want to iterate through this list of tuples and find entries in my DF which meet some conditional filters:

for a, (company, target_price) in enumerate(watchlist):
    companyFilter = comparison_df["Company"][((comparison_df["Mid-price (p)"] <= target_price) 
                          | (comparison_df["Buy Ratio"] >= 0.5))
                          & (comparison_df["Company"] == "{}".format(company))]
    print(companyFilter)

This works, but if I give the conditional statements to comparison_df["Company"], so that I can just get the company name returned, I end up with empty Series:

95    TUI
Name: Company, dtype: object
Series([], Name: Company, dtype: object)
6    AstraZeneca
Name: Company, dtype: object
90    Standard Chartered
Name: Company, dtype: object
Series([], Name: Company, dtype: object)

How can I get just the Company names out of this data? Appending .values[0] doesn't work, because there are iterations of the loop that return junk, apparently, and this throws up an error:

IndexError: index 0 is out of bounds for axis 0 with size 0

CodePudding user response:

In general, using .loc or .iloc followed by the column will get you the single scalar value. e.g. comparison_df.iloc[95]["Company"] Of course this requires that your expression return one and only one index value. When your conditional expression returns a boolean mask (set of True/False values) then you'll get a series back.

Of course you could do a .values.tolist()[0] and get the first, but...

Let's step back for a second - what are we trying to do? Get the companies with a mid-price less than an enumerated target, or with a Buy Ratio > .5 - sounds like a job for Pandas... Like, the whole job is a job for Pandas.

  1. join your watch list to your dataframe, keying on the company name.
  2. apply your criteria to produce a list of companies that satisfy it.

So your query changes to something like this:

comparison_df[((comparison_df["Mid-price (p)"] <= comparison_df["target_price"]) 
                          | (comparison_df["Buy Ratio"] >= 0.5))]["Company"].values.tolist()

No looping required.

If you're not sure how to do the merge, you just create a dataframe from your watchlist:

watchlist = [('Admiral Group', 820.0), ('3i', 4300.0), ('AstraZeneca', 7500.0), 
             ('Ashtead Group', 920.0)] 
joinDF = pd.DataFrame(watchlist, columns=["Company", "target_price"])

then left-outer join it to your comparison_df on "Company"

  • Related