Home > database >  Sum values of each columns with some condition in dataframe to create another dataframe
Sum values of each columns with some condition in dataframe to create another dataframe

Time:04-25

I have one dataframe with

data = {'Clothes':['Shirt-cotton', 'Shirt-silk', 'Trouser', 'Trouser', 'skirt-short', 'skirt-long'],
        'Color':['Black', 'White', 'Blue', 'Grey', 'White', 'Black'],
        'January':[68768,7898,97987,98799, 878, 536],
        'February':[7099,7970,78680,9697, 567,234],
        'March':[9080,8979,8689,9890, 980,765]}

df = pd.DataFrame(data)

df = Clothes      Color   January  February  March
    Shirt-cotton  Black    68768    7099      9080
    Shirt-silk    White    7898     7970      8979
     Trouser      Blue     97987    78680     8689
     Trouser      Grey     98799    9697      9890
    skirt-short   White    878      567       980
    skirt-long    Black    536      234       765

I need output as mentioned below

df1 = Clothes  January  February  March
       Shirt    76666    15069    18059
      Trouser   196786   88377    18579
       skirt    1414      801      1745

df2 = Color  January  February   March
      Black   69304    7333       9845
      White   8776     8204       9959

I am new to pandas, kindly help me to get this output

CodePudding user response:

df1 aggregates the monthly values by Clothes types. So you can use groupby sum:

df1 = df.groupby(df['Clothes'].str.split('-').str[0].str.capitalize()).sum().reset_index()

Output:

   Clothes  January  February  March
0    Shirt    76666     15069  18059
1    Skirt     1414       801   1745
2  Trouser   196786     88377  18579

df2 is filtering df for Shirt entries. One way is to use boolean indexing:

df2 = df[df['Clothes'].str.contains('shir', case=False)]

Output:

  Clothes  Color  January  February  March
0   Shirt  Black    68768      7099   9080
1   Shirt  White     7898      7970   8979
  • Related