AttributeError: 'tuple' object has no attribute 'loc' when filtering on pandas DataFrame

Given the following DataFrame -

json_path	Reporting Group	Entity/Grouping	Entity ID	Adjusted Value (Today, No Div, USD)	Adjusted TWR (Current Quarter, No Div, USD)	Adjusted TWR (YTD, No Div, USD)	Annualized Adjusted TWR (Since Inception, No Div, USD)	Adjusted Value (No Div, USD)
data.attributes.total.children.[0].children.[0].children.[0]	Barrack Family	William and Rupert Trust	9957007	-1.44				-1.44
data.attributes.total.children.[0].children.[0].children.[0].children.[0]	Barrack Family	Cash	-	-1.44				-1.44
data.attributes.total.children.[0].children.[0].children.[1]	Barrack Family	Gratia Holdings No. 2 LLC	8413655	55491732.66	-0.971018847	-0.971018847	11.52490309	55491732.66
data.attributes.total.children.[0].children.[0].children.[1].children.[0]	Barrack Family	Investment Grade Fixed Income	-	18469768.6				18469768.6
data.attributes.total.children.[0].children.[0].children.[1].children.[1]	Barrack Family	High Yield Fixed Income	-	3668982.44	-0.205356545	-0.205356545	4.441190127	3668982.44

The following code should select the rows whose Entity/Grouping value is not 'Cash' and that have a blank value in any of the Adjusted TWR (Current Quarter, No Div, USD), Adjusted TWR (YTD, No Div, USD) or Annualized Adjusted TWR (Since Inception, No Div, USD) columns.

Code: The following code is expected to achieve this -

def twr_exceptions_logic():
    perf_asset_class_df = databases_creation()

    m1 = perf_asset_class_df.loc[(perf_asset_class_df['Entity/Grouping']!= 'Cash')]
    m2 = perf_asset_class_df[['Adjusted TWR (Current Quarter, No Div, USD)',
                              'Adjusted TWR (YTD, No Div, USD)',
                              'Annualized Adjusted TWR (Since Inception, No Div, USD)']].eq('').any(1)
    perf_asset_class_df.loc[m1&m2]
    
    return perf_asset_class_df

Error: Being still relatively new to Python, I am unsure why this AttributeError is being raised -

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
C:\Users\WILLIA~1.FOR\AppData\Local\Temp/ipykernel_18756/2689024934.py in <module>
     48     writer.save()
     49 
---> 50 xlsx_writer()

C:\Users\WILLIA~1.FOR\AppData\Local\Temp/ipykernel_18756/2689024934.py in xlsx_writer()
      1 # Function that writes Exceptions Report and API Response as a consolidated .xlsx file.
      2 def xlsx_writer():
----> 3     reporting_group_df, unknown_df, perf_asset_class_df, perf_entity_df, perf_entity_group_df = twr_exceptions_logic()
      4 
      5 #   Creating and defining filename for exceptions report

C:\Users\WILLIA~1.FOR\AppData\Local\Temp/ipykernel_18756/2834095962.py in twr_exceptions_logic()
      2     perf_asset_class_df = databases_creation()
      3 
----> 4     m1 = perf_asset_class_df.loc[(perf_asset_class_df['Entity/Grouping']!= 'Cash')]
      5     m2 = perf_asset_class_df[['Adjusted TWR (Current Quarter, No Div, USD)',
      6                               'Adjusted TWR (YTD, No Div, USD)',

AttributeError: 'tuple' object has no attribute 'loc'

Help: I have done some research on this AttributeError and am finding conflicting information about how it relates to my particular issue. It looks as if perf_asset_class_df is being returned as a tuple from the databases_creation() function. However, it is definitely a pandas dataframe, and the only thing databases_creation() does is take a dataframe named df and apply .loc in order to create a pandas dataframe called perf_asset_class_df. Or am I missing something?

perf_asset_class_df = df[df['json_path'].str.contains(r'(?:\.children\.\[\d+\]){4}')]

databases_creation() function -

def databases_creation():
    df = data_cleansing()

    unknown_df = df[df['Entity/Grouping'].str.contains('Unknown')==True]

    perf_asset_class_df = df[df['json_path'].str.contains(r'(?:\.children\.\[\d+\]){4}')]
    perf_asset_class_df = pd.DataFrame(perf_asset_class_df)
    
    perf_entity_df = df[df['json_path'].str.count(r'\.children').eq(3)]
    perf_entity_group_df = df[df['json_path'].str.count(r'\.children').eq(2)]

    return reporting_group_df, unknown_df, perf_asset_class_df, perf_entity_df, perf_entity_group_df

Does anyone have any suggestions?

Answer:

return reporting_group_df, unknown_df, perf_asset_class_df, perf_entity_df, perf_entity_group_df

This line returns a tuple of data frames. When your code calls databases_creation() it saves this entire tuple as perf_asset_class_df, which is why .loc fails: a tuple has no such attribute. If you only want that one data frame, you'll need to unpack the tuple when you call the function:

_, _, perf_asset_class_df, _, _ = databases_creation()

This unpacks the tuple, assigning each element to the corresponding variable. By convention, _ is used for the elements we don't care about, but any other variable name would work.
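
As a minimal sketch of the point above, the snippet below uses a hypothetical stand-in, databases_creation_sketch(), with made-up two-frame data (it is not the asker's real function). It shows that a comma-separated return produces a tuple, and that unpacking recovers the data frame. It also assumes, going by the question, that blank cells are empty strings, and that the two conditions are meant to be boolean masks combined with & rather than a .loc result.

```python
import pandas as pd


def databases_creation_sketch():
    """Hypothetical stand-in for databases_creation(); returns two frames."""
    df = pd.DataFrame(
        {
            "Entity/Grouping": [
                "Cash",
                "Investment Grade Fixed Income",
                "High Yield Fixed Income",
            ],
            # Assumption: blank cells are empty strings, as .eq('') in the question implies.
            "Adjusted TWR (YTD, No Div, USD)": ["", "", -0.205356545],
        }
    )
    unknown_df = df[df["Entity/Grouping"].str.contains("Unknown")]
    perf_asset_class_df = df.copy()
    # Comma-separated return values are packed into a single tuple.
    return unknown_df, perf_asset_class_df


# Without unpacking, the name is bound to the whole tuple, not a DataFrame.
result = databases_creation_sketch()
print(type(result).__name__)   # tuple
print(hasattr(result, "loc"))  # False

# Unpacking binds each element to its own name; _ marks the one we ignore.
_, perf_asset_class_df = databases_creation_sketch()

# Both conditions are boolean Series, so they can be combined with & and
# passed to .loc; the filtered result is kept in a new variable.
not_cash = perf_asset_class_df["Entity/Grouping"] != "Cash"
blank_twr = perf_asset_class_df[["Adjusted TWR (YTD, No Div, USD)"]].eq("").any(axis=1)
exceptions_df = perf_asset_class_df.loc[not_cash & blank_twr]
print(exceptions_df["Entity/Grouping"].tolist())
```

Indexing the tuple by position (databases_creation()[2] in the original code) would work as well, though unpacking tends to stay readable if the order of the returned frames changes later.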