Suppose you have the following list called currency_trading_pairs
with the following elements:
currency_trading_pairs = ['USD/CAD', 'EUR/USD', 'GBP/USD', 'NZD/USD', 'AUD/USD',
'USD/JPY', 'EUR/CAD', 'EUR/AUD', 'EUR/JPY', 'EUR/GBP',
'AUD/CAD', 'GBP/JPY', 'CHF/JPY', 'AUD/JPY', 'AUD/NZD']
Now, suppose that you have the following dataframe called filtered_df
with the following data:
Time Currency Volatility expected Event
24 04:30 GBP Low Volatility Expected Inflation Expectations
25 05:00 EUR High Volatility Expected EU Leaders Summit
26 05:10 EUR Low Volatility Expected Italian 15-Year BTP Auction
27 05:10 EUR Low Volatility Expected Italian 3-Year BTP Auction
28 05:10 EUR Low Volatility Expected Italian 7-Year BTP Auction
29 06:00 EUR Low Volatility Expected Spanish Consumer Confidence
30 06:30 INR Low Volatility Expected Bank Loan Growth
31 06:30 INR Low Volatility Expected Deposit Growth
32 06:30 INR Low Volatility Expected FX Reserves, USD
33 07:00 INR Low Volatility Expected Cumulative Industrial Production (Jan)
34 07:00 INR Low Volatility Expected Industrial Production (YoY) (Jan)
35 07:00 INR Low Volatility Expected Manufacturing Output (MoM) (Jan)
36 07:00 BRL Moderate Volatility Expected CPI (YoY) (Feb)
37 07:00 BRL Moderate Volatility Expected CPI (MoM) (Feb)
38 08:06 BRL Moderate Volatility Expected Brazilian IPCA Inflation Index SA (MoM)(Feb)
39 08:30 CAD Low Volatility Expected Capacity Utilization Rate (Q4)
40 08:30 CAD High Volatility Expected Employment Change (Feb)
41 08:30 CAD Low Volatility Expected Full Employment Change (Feb)
42 08:30 CAD Low Volatility Expected Part Time Employment Change (Feb)
43 08:30 CAD Low Volatility Expected Participation Rate (Feb)
44 08:30 CAD Moderate Volatility Expected Unemployment Rate (Feb)
How could you find which currency pairs (elements) from currency_trading_pairs
(list) have both currencies missing in all of the cells in the column Currency
from the filtered_df
so that you can get the following output in a variable called the_missing_pairs
:
the_missing_pairs = ['NZD/USD', 'AUD/USD', 'USD/JPY', 'CHF/JPY', 'AUD/JPY', 'AUD/NZD']
Further explanation: Basically, assuring that all of the currency names in the_missing_pairs
list do not appear in any of the cells in the column Currency
from the filtered_df
.
CodePudding user response:
I would actually convert currency_trading_pairs
to a Series object. Then you can split by /
and explode
, then use isin
, and finally groupby(level=0)
any
to produce the perfect mask:
ctp = pd.Series(currency_trading_pairs)
the_missing_pairs = ctp[~ctp.str.split('/').explode().isin(df['Currency']).groupby(level=0).any()].tolist()
Output:
>>> the_missing_pairs
['NZD/USD', 'AUD/USD', 'USD/JPY', 'CHF/JPY', 'AUD/JPY', 'AUD/NZD']