My dataframe can be simplified like this:
Dataframe :
df = pd.DataFrame({'Customer_ID': range(1, 9), 'Category': ['Cat', 'Cat', 'Dog', 'Unknown', 'Cat', 'Dog', 'Dog', 'Unknown'], 'Age': [29, 3, 1, 8, 25, 28, 13, 10],
'Weight' : [4, 1, 8, 6, 5, 2, 7, 3]})
{'Customer_ID': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8},
'Category': {0: 'Cat',
1: 'Cat',
2: 'Dog',
3: 'Unknown',
4: 'Cat',
5: 'Dog',
6: 'Dog',
7: 'Unknown'},
'Age': {0: 29, 1: 3, 2: 1, 3: 8, 4: 25, 5: 28, 6: 13, 7: 10},
'Weight': {0: 4, 1: 1, 2: 8, 3: 6, 4: 5, 5: 2, 6: 7, 7: 3}}
How can I add a column with some different strings based on some conditions: For example:
- If largest age within category, then add string: 'Top 1 oldest within category', 'Top 2 oldest within category',
- Secondly: If heaviest weight within category, then add string: 'Top 1 heaviest within category', 'Top 2 heaviest within category', 'Top 3 heaviest within category'
Desired output:
{'Customer_ID': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8},
'Category': {0: 'Cat',
1: 'Cat',
2: 'Dog',
3: 'Unknown',
4: 'Cat',
5: 'Dog',
6: 'Dog',
7: 'Unknown'},
'Age': {0: 29, 1: 3, 2: 1, 3: 8, 4: 25, 5: 28, 6: 13, 7: 10},
'Weight': {0: 4, 1: 1, 2: 8, 3: 6, 4: 5, 5: 2, 6: 7, 7: 3},
'Result': {0: 'Oldest within category',
1: '-',
2: 'Heaviest within category',
3: 'Second oldest within category',
4: 'Second oldest within category',
5: 'Oldest within category',
6: 'Second oldest within category',
7: 'Oldest within category'}}
My own approach would be something like:
for category in df.category.unique():
resulting_df_category = df.loc[df['category'] == category]
if i in range(1,3):
But I want to know if there is a better way.
CodePudding user response:
You can use groupby
and cumcount
as well as sort_values
to get the desired results.
df['Top_Age'] = 'Top ' (df.sort_values('Age', ascending = False).groupby('Category')['Age'].cumcount() 1).astype(str)
df['Top_Weight'] = 'Top ' (df.sort_values('Weight', ascending = False).groupby('Category')['Weight'].cumcount() 1).astype(str)
CodePudding user response:
I think this is what your want: show the topest characteristic of animals.
df[" oldest"] = df.groupby('Category')["Age"].rank(ascending=False)
df[" heaviest"] = df.groupby('Category')["Weight"].rank(ascending=False)
df["Top"] = df[[" oldest", " heaviest"]].idxmin(axis=1)
df["Rank"] = df[[" oldest", " heaviest"]].min(axis=1).astype("int")
df["Description"] = df.apply(lambda x: "Top " str(x["Rank"]) x["Top"], axis=1)