I am using Jupyter and am trying to find the common data and have it tell me how many times the Bull has been in the top 3 or bottom three
LowWeaningData = {}
LowWeightData = {}
LowP8Data = {}
LowRibData = {}
LowEmaData = {}
LowImfData = {}
LowGrowthData = {}
LowWeaningData = df_bulls.nsmallest(3,['Weaning Weight(kg)'])
LowWeightData = df_bulls.nsmallest(3,['Weight (kg)'])
LowP8Data = df_bulls.nsmallest(3,['P8'])
LowRibData = df_bulls.nsmallest(3,['RIB'])
LowEmaData = df_bulls.nsmallest(3,['EMA(cm)'])
LowImfData = df_bulls.nsmallest(3,['IMF'])
LowGrowthData = df_bulls.nsmallest(3,['Growth %'])
Which prints this
This list is of the Lowest Growth % Data
Bull Sire Dam Weaning Weight(kg) Weight (kg) P8 RIB EMA(cm) \
5 S10 Black Magic L16 522.0 818 7 6 124
1 S24 P42 L11 469.0 774 7 6 116
2 S32 P41 M6 401.0 662 6 5 105
IMF Growth %
5 6.3 56.704981
1 5.6 65.031983
2 4.3 65.087282
The next part is trying to find how many times the bull appeared in the list and which ones so i use this code
lower_elements_in_all =
list(set.intersection(*map(set[LowWeaningData, LowWeightData,
LowP8Data, LowRibData, LowEmaData, LowImfData, LowGrowthData])))
I keep getting this instead of the actual bull names
['Bull', 'Sire', 'P8', 'EMA(cm)', 'RIB', 'Growth %', 'Dam', 'Weaning Weight(kg)', 'IMF', 'Weight (kg)']
How i would like my data to return is as follows
S10 has appeared 3 times in category Low Weaning Data, Low P8 Data and Low Rib Data
S24 has appeared 3 times in etc
S32 has appeared 1 times in etc
in descending order so its easy to see
CodePudding user response:
I would start by adding a new column called "Category"
to each of your dataframes with a string for the appropriate category. For example:
LowWeaningData = df_bulls.nsmallest(3,['Weaning Weight(kg)']).assign(category="Low Weaning Data")
LowWeightData = df_bulls.nsmallest(3,['Weight (kg)']).assign(category="Low Weight Data")
LowP8Data = df_bulls.nsmallest(3,['P8']).assign(category="Low P8 Data")
LowRibData = df_bulls.nsmallest(3,['RIB']).assign(category="Low Rib Data")
LowEmaData = df_bulls.nsmallest(3,['EMA(cm)']).assign(category="Low EMA Data")
LowImfData = df_bulls.nsmallest(3,['IMF']).assign(category="Low IMF Data")
LowGrowthData = df_bulls.nsmallest(3,['Growth %']).assign(category="Low Growth Data")
Then use pd.concat
to combine all of these dataframes together:
low_bulls_by_category_df = pd.concat([
LowWeaningData,
LowWeightData,
LowP8Data,
LowRibData,
LowEmaData,
LowImfData,
LowGrowthData
])[['Bull','Category']]
Using some data I made up, low_bulls_by_category_df
should look something like the following (with some of the categories removed to simplify the example):
Bull Category
0 S10 Low Weaning Data
1 S11 Low Weaning Data
2 S12 Low Weaning Data
0 S10 Low Weight Data
1 S13 Low Weight Data
2 S14 Low Weight Data
0 S12 Low P8 Data
1 S11 Low P8 Data
2 S10 Low P8 Data
The reason we added a "Category"
column to each of your smallest value DataFrames is so that when you combine all of these dataframes together, you still know which low category each bull came from.
Then we can loop through the portions of this dataframe for each unique bull using a groupby, and populate a dictionary with information about how many times each bull appears and what categories they appear in.
low_bull_info = {}
for bull,df_group in low_bulls_by_category_df.groupby("Bull"):
low_bull_info[bull] = {'count':len(df_group), 'categories':df_group['Category'].tolist()}
Inside the dictionary we have all the information we need. And I believe pandas automatically groups by the number of occurrences, so the bulls appearing in largest number of categories will come first in your dictionary – which will be convenient when we want to print in descending order of occurrences later.
{
'S10':
{'count': 3,
'categories': ['Low Weaning Data', 'Low Weight Data', 'Low P8 Data']},
'S11':
{'count': 2,
'categories': ['Low Weaning Data', 'Low P8 Data']},
'S12':
{'count': 2,
'categories': ['Low Weaning Data', 'Low P8 Data']},
'S13':
{'count': 1,
'categories': ['Low Weight Data']},
'S14': {'count': 1, 'categories': ['Low Weight Data']}
}
Then we can loop through this dictionary, and print out the information in a formatted string for each iteration of the loop:
for bull,bull_info in low_bull_info.items():
count = bull_info['count']
categories_str = ', '.join(bull_info['categories'])
print(f"{bull} has appeared {count} times in category {categories_str}")
This results in the following output:
S10 has appeared 3 times in category Low Weaning Data, Low Weight Data, Low P8 Data
S11 has appeared 2 times in category Low Weaning Data, Low P8 Data
S12 has appeared 2 times in category Low Weaning Data, Low P8 Data
S13 has appeared 1 times in category Low Weight Data
S14 has appeared 1 times in category Low Weight Data