I have a 100x15 DF, comparison_df
.
I have a list of tuples, (shortened version):
watchlist = [('Admiral Group', 820.0), ('3i', 4300.0), ('AstraZeneca', 7500.0),
('Ashtead Group', 920.0)]
I want to iterate through this list of tuples and find entries in my DF which meet some conditional filters:
for a, (company, target_price) in enumerate(watchlist):
companyFilter = comparison_df["Company"][((comparison_df["Mid-price (p)"] <= target_price)
| (comparison_df["Buy Ratio"] >= 0.5))
& (comparison_df["Company"] == "{}".format(company))]
print(companyFilter)
This works, but if I give the conditional statements to comparison_df["Company"]
, so that I can just get the company name returned, I end up with empty Series:
95 TUI
Name: Company, dtype: object
Series([], Name: Company, dtype: object)
6 AstraZeneca
Name: Company, dtype: object
90 Standard Chartered
Name: Company, dtype: object
Series([], Name: Company, dtype: object)
How can I get just the Company names out of this data? Appending .values[0]
doesn't work, because there are iterations of the loop that return junk, apparently, and this throws up an error:
IndexError: index 0 is out of bounds for axis 0 with size 0
CodePudding user response:
In general, using .loc or .iloc followed by the column will get you the single scalar value. e.g. comparison_df.iloc[95]["Company"] Of course this requires that your expression return one and only one index value. When your conditional expression returns a boolean mask (set of True/False values) then you'll get a series back.
Of course you could do a .values.tolist()[0] and get the first, but...
Let's step back for a second - what are we trying to do? Get the companies with a mid-price less than an enumerated target, or with a Buy Ratio > .5 - sounds like a job for Pandas... Like, the whole job is a job for Pandas.
- join your watch list to your dataframe, keying on the company name.
- apply your criteria to produce a list of companies that satisfy it.
So your query changes to something like this:
comparison_df[((comparison_df["Mid-price (p)"] <= comparison_df["target_price"])
| (comparison_df["Buy Ratio"] >= 0.5))]["Company"].values.tolist()
No looping required.
If you're not sure how to do the merge, you just create a dataframe from your watchlist:
watchlist = [('Admiral Group', 820.0), ('3i', 4300.0), ('AstraZeneca', 7500.0),
('Ashtead Group', 920.0)]
joinDF = pd.DataFrame(watchlist, columns=["Company", "target_price"])
then left-outer join it to your comparison_df on "Company"