Finding the common data from a table-CodePudding

I am using Jupyter and am trying to find the common data and have it tell me how many times the Bull has been in the top 3 or bottom three

LowWeaningData = {}
LowWeightData = {}
LowP8Data = {}
LowRibData = {}
LowEmaData = {}
LowImfData = {}
LowGrowthData = {}

LowWeaningData = df_bulls.nsmallest(3,['Weaning Weight(kg)'])

LowWeightData = df_bulls.nsmallest(3,['Weight (kg)'])

LowP8Data = df_bulls.nsmallest(3,['P8'])

LowRibData = df_bulls.nsmallest(3,['RIB'])

LowEmaData = df_bulls.nsmallest(3,['EMA(cm)'])

LowImfData = df_bulls.nsmallest(3,['IMF'])

LowGrowthData = df_bulls.nsmallest(3,['Growth %'])

Which prints this

This list is of the Lowest Growth % Data 
   Bull         Sire  Dam  Weaning Weight(kg)  Weight (kg)  P8  RIB  EMA(cm)  \
5  S10  Black Magic  L16               522.0          818   7    6      124   
1  S24          P42  L11               469.0          774   7    6      116   
2  S32          P41   M6               401.0          662   6    5      105   

   IMF   Growth %  
5  6.3  56.704981  
1  5.6  65.031983  
2  4.3  65.087282

The next part is trying to find how many times the bull appeared in the list and which ones so i use this code

lower_elements_in_all = 
list(set.intersection(*map(set[LowWeaningData, LowWeightData, 
LowP8Data, LowRibData, LowEmaData, LowImfData, LowGrowthData])))

I keep getting this instead of the actual bull names

['Bull', 'Sire', 'P8', 'EMA(cm)', 'RIB', 'Growth %', 'Dam', 'Weaning Weight(kg)', 'IMF', 'Weight (kg)']

How i would like my data to return is as follows

S10 has appeared 3 times in category Low Weaning Data, Low P8 Data and Low Rib Data
S24 has appeared 3 times in etc
S32 has appeared 1 times in etc

in descending order so its easy to see

CodePudding user response：

I would start by adding a new column called "Category" to each of your dataframes with a string for the appropriate category. For example:

LowWeaningData = df_bulls.nsmallest(3,['Weaning Weight(kg)']).assign(category="Low Weaning Data")
LowWeightData = df_bulls.nsmallest(3,['Weight (kg)']).assign(category="Low Weight Data")
LowP8Data = df_bulls.nsmallest(3,['P8']).assign(category="Low P8 Data")
LowRibData = df_bulls.nsmallest(3,['RIB']).assign(category="Low Rib Data")
LowEmaData = df_bulls.nsmallest(3,['EMA(cm)']).assign(category="Low EMA Data")
LowImfData = df_bulls.nsmallest(3,['IMF']).assign(category="Low IMF Data")
LowGrowthData = df_bulls.nsmallest(3,['Growth %']).assign(category="Low Growth Data")

Then use pd.concat to combine all of these dataframes together:

low_bulls_by_category_df = pd.concat([
    LowWeaningData,
    LowWeightData,
    LowP8Data,
    LowRibData,
    LowEmaData,
    LowImfData,
    LowGrowthData
])[['Bull','Category']]

Using some data I made up, low_bulls_by_category_df should look something like the following (with some of the categories removed to simplify the example):

  Bull          Category
0  S10  Low Weaning Data
1  S11  Low Weaning Data
2  S12  Low Weaning Data
0  S10   Low Weight Data
1  S13   Low Weight Data
2  S14   Low Weight Data
0  S12       Low P8 Data
1  S11       Low P8 Data
2  S10       Low P8 Data

The reason we added a "Category" column to each of your smallest value DataFrames is so that when you combine all of these dataframes together, you still know which low category each bull came from.

Then we can loop through the portions of this dataframe for each unique bull using a groupby, and populate a dictionary with information about how many times each bull appears and what categories they appear in.

low_bull_info = {}
for bull,df_group in low_bulls_by_category_df.groupby("Bull"):
    low_bull_info[bull] = {'count':len(df_group), 'categories':df_group['Category'].tolist()}

Inside the dictionary we have all the information we need. And I believe pandas automatically groups by the number of occurrences, so the bulls appearing in largest number of categories will come first in your dictionary – which will be convenient when we want to print in descending order of occurrences later.

{
    'S10': 
        {'count': 3, 
         'categories': ['Low Weaning Data', 'Low Weight Data', 'Low P8 Data']}, 
     'S11': 
        {'count': 2, 
         'categories': ['Low Weaning Data', 'Low P8 Data']}, 
     'S12': 
        {'count': 2, 
         'categories': ['Low Weaning Data', 'Low P8 Data']}, 
     'S13': 
        {'count': 1, 
         'categories': ['Low Weight Data']}, 
     'S14': {'count': 1, 'categories': ['Low Weight Data']}
}

Then we can loop through this dictionary, and print out the information in a formatted string for each iteration of the loop:

for bull,bull_info in low_bull_info.items():
    count = bull_info['count']
    categories_str = ', '.join(bull_info['categories'])
    print(f"{bull} has appeared {count} times in category {categories_str}")

This results in the following output:

S10 has appeared 3 times in category Low Weaning Data, Low Weight Data, Low P8 Data
S11 has appeared 2 times in category Low Weaning Data, Low P8 Data
S12 has appeared 2 times in category Low Weaning Data, Low P8 Data
S13 has appeared 1 times in category Low Weight Data
S14 has appeared 1 times in category Low Weight Data