Home > Software design >  Pandas value_counts with added column
Pandas value_counts with added column

Time:01-25

I got data frame as follow:

sentence   userid  topic
hello        1001  smalltalk
hi           1002  smalltalk
hello        1002  smalltalk
how are you? 1003  question
hello        1004  smalltalk
what is new? 1005  question
hi there     1006  smalltalk
hello        1007  smalltalk

With the help of Pandas value_counts

Input:

df['sentence'].value_counts()

Output:

hello 4
hi 2
how are you? 1
what is new? 1

What I really wish to get is the same value count with the a specific column added next to it:

hello 4 smalltalk
hi 2 smalltalk
how are you? 1 question
what is new? 1 question

CodePudding user response:

df[['sentence', 'topic']].value_counts()

CodePudding user response:

IIUC, OP needs to keep interested results of df[['sentence', 'topic']].value_counts() in the form of DataFrame using module for further manipulation\visulaizataions. So this can be achieved by groupby() and aggregating the counts of interested multiple variables\columns under a new column count within DataFrame:

import pandas as pd

#Generate dataframe
df = pd.DataFrame({'userid':    [1001, 1002, 1002, 1003, 1004, 1005, 1006, 1007],    
                    'sentence': ['hello', "hi", 'hello', "how are you?", 'hello', "what is new?", "hi", 'hello'],    
                    'topic':    ["smalltalk", "smalltalk", "smalltalk", "question", "smalltalk", "question", "smalltalk", "smalltalk"],
                    })

#Aggregate counts with respect to interested columns in df
df2 = df.groupby(["sentence","topic"])["topic"].agg(["count"]) \
        .reset_index() \
        .drop_duplicates() #remove duplicates

print(df2) 
#       sentence      topic  count
#0         hello  smalltalk      4
#1            hi  smalltalk      2
#2  how are you?   question      1
#3  what is new?   question      1
  • Related