I have data in a dataframe that looks like this, where each column is a KEYWORD and each row is an observation of how many times each ID said the word:
id | bagel | pizza |
---|---|---|
ABC | 2 | 3 |
DEF | 1 | 3 |
GHI | 7 | 9 |
TOTAL | 10 | 15 |
I am trying to get it to a form where I can see what the most popular word is overall, something where the columns themselves are new columns and the TOTAL row transforms to a column that can be sorted:
Column | Total |
---|---|
bagel | 10 |
pizza | 15 |
I have tried melt
and stack
but I dont think I am using either one correctly. Any help is appreciated.
CodePudding user response:
Select the column then T
out = df[df.id.eq('TOTAL')].set_index('id').T.reset_index()
Out[433]:
id index TOTAL
0 bagel 10
1 pizza 15
CodePudding user response:
You can use df.sum()
data = df.sum(numeric_only=True, axis=0)
The code above will return a series, you need to convert it into a DataFrame using the syntax below and set the column names.
df = pd.DataFrame({'Column':data.index, 'Total':data.values})
print(df)
That gives me,
Column Total
0 bagel 10
1 pizza 15
You can also do the following to set the Column
column as the index removing the (0, 1, etc.) index.
df = df.set_index('Column')
print(df)
Which gives me,
Total
Column
bagel 10
pizza 15