Home > Back-end >  How to add column with conditions to dataframe
How to add column with conditions to dataframe

Time:08-03

My dataframe can be simplified like this:

Dataframe :

df = pd.DataFrame({'Customer_ID': range(1, 9), 'Category': ['Cat', 'Cat', 'Dog', 'Unknown', 'Cat', 'Dog', 'Dog', 'Unknown'],  'Age': [29, 3, 1, 8, 25, 28, 13, 10],
'Weight' : [4, 1, 8, 6, 5, 2, 7, 3]})
{'Customer_ID': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8},
 'Category': {0: 'Cat',
  1: 'Cat',
  2: 'Dog',
  3: 'Unknown',
  4: 'Cat',
  5: 'Dog',
  6: 'Dog',
  7: 'Unknown'},
 'Age': {0: 29, 1: 3, 2: 1, 3: 8, 4: 25, 5: 28, 6: 13, 7: 10},
 'Weight': {0: 4, 1: 1, 2: 8, 3: 6, 4: 5, 5: 2, 6: 7, 7: 3}}

How can I add a column with some different strings based on some conditions: For example:

  • If largest age within category, then add string: 'Top 1 oldest within category', 'Top 2 oldest within category',
  • Secondly: If heaviest weight within category, then add string: 'Top 1 heaviest within category', 'Top 2 heaviest within category', 'Top 3 heaviest within category'

Desired output:

{'Customer_ID': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8},
 'Category': {0: 'Cat',
  1: 'Cat',
  2: 'Dog',
  3: 'Unknown',
  4: 'Cat',
  5: 'Dog',
  6: 'Dog',
  7: 'Unknown'},
 'Age': {0: 29, 1: 3, 2: 1, 3: 8, 4: 25, 5: 28, 6: 13, 7: 10},
 'Weight': {0: 4, 1: 1, 2: 8, 3: 6, 4: 5, 5: 2, 6: 7, 7: 3},
 'Result': {0: 'Oldest within category',
  1: '-',
  2: 'Heaviest within category',
  3: 'Second oldest within category',
  4: 'Second oldest within category',
  5: 'Oldest within category',
  6: 'Second oldest within category',
  7: 'Oldest within category'}}

My own approach would be something like:

for category in df.category.unique():
    resulting_df_category = df.loc[df['category'] == category] 
    if i in range(1,3):

But I want to know if there is a better way.

CodePudding user response:

You can use groupby and cumcount as well as sort_values to get the desired results.

df['Top_Age'] = 'Top '   (df.sort_values('Age', ascending = False).groupby('Category')['Age'].cumcount()   1).astype(str)
df['Top_Weight'] = 'Top '   (df.sort_values('Weight', ascending = False).groupby('Category')['Weight'].cumcount()   1).astype(str)

CodePudding user response:

I think this is what your want: show the topest characteristic of animals.

df[" oldest"] = df.groupby('Category')["Age"].rank(ascending=False)
df[" heaviest"] = df.groupby('Category')["Weight"].rank(ascending=False)
df["Top"] = df[[" oldest", " heaviest"]].idxmin(axis=1)
df["Rank"] = df[[" oldest", " heaviest"]].min(axis=1).astype("int")
df["Description"] = df.apply(lambda x: "Top "   str(x["Rank"])   x["Top"], axis=1)
  • Related