I have one dataframe with
data = {'Clothes':['Shirt-cotton', 'Shirt-silk', 'Trouser', 'Trouser', 'skirt-short', 'skirt-long'],
'Color':['Black', 'White', 'Blue', 'Grey', 'White', 'Black'],
'January':[68768,7898,97987,98799, 878, 536],
'February':[7099,7970,78680,9697, 567,234],
'March':[9080,8979,8689,9890, 980,765]}
df = pd.DataFrame(data)
df = Clothes Color January February March
Shirt-cotton Black 68768 7099 9080
Shirt-silk White 7898 7970 8979
Trouser Blue 97987 78680 8689
Trouser Grey 98799 9697 9890
skirt-short White 878 567 980
skirt-long Black 536 234 765
I need output as mentioned below
df1 = Clothes January February March
Shirt 76666 15069 18059
Trouser 196786 88377 18579
skirt 1414 801 1745
df2 = Color January February March
Black 69304 7333 9845
White 8776 8204 9959
I am new to pandas, kindly help me to get this output
CodePudding user response:
df1
aggregates the monthly values by Clothes
types. So you can use groupby
sum
:
df1 = df.groupby(df['Clothes'].str.split('-').str[0].str.capitalize()).sum().reset_index()
Output:
Clothes January February March
0 Shirt 76666 15069 18059
1 Skirt 1414 801 1745
2 Trouser 196786 88377 18579
df2
is filtering df
for Shirt entries. One way is to use boolean indexing:
df2 = df[df['Clothes'].str.contains('shir', case=False)]
Output:
Clothes Color January February March
0 Shirt Black 68768 7099 9080
1 Shirt White 7898 7970 8979