Dataframe group by with counts of values of a column

Time:03-20

I have a csv like this:

url, sub_url, label
first.com, first.com/1, label1
first.com, first.com/2, label1
first.com, first.com/3, label2
first.com, first.com/4, label2
first.com, first.com/5, label1
second.com, second.com/1, label2
second.com, second.com/2, label2
second.com, second.com/3, label1
third.com, third.com/1, label2
third.com, third.com/2, label2

I don't care about sub_url; I would like output like this:

url, count_label1, count_label2
first.com, 3, 2
second.com, 1, 2
third.com, 0, 2

I tried this:

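import pandas as pd
# skipinitialspace=True drops the space after each comma, so the column is "label", not " label"
df = pd.read_csv(path_to_csv, sep=",", skipinitialspace=True)
df.groupby(["url"])["label"].value_counts()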

But this gives me a separate row for each distinct label value within each url, whereas I want each label value as its own column. How can I do it?
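One way to get the wide layout from the attempt above is to unstack the inner label level of the value_counts() result; fill_value=0 fills url/label pairs that never occur (third.com has no label1 rows). This is a minimal sketch, not the accepted answer; it builds the sample data directly as a DataFrame (sub_url left out) so it runs on its own:

```python
import pandas as pd

# The question's url/label columns; sub_url is omitted since it is not needed.
df = pd.DataFrame({
    "url": ["first.com"] * 5 + ["second.com"] * 3 + ["third.com"] * 2,
    "label": ["label1", "label1", "label2", "label2", "label1",
              "label2", "label2", "label1",
              "label2", "label2"],
})

# value_counts() returns a Series indexed by (url, label);
# unstack() moves the inner "label" level into columns, filling gaps with 0.
out = (
    df.groupby("url")["label"]
    .value_counts()
    .unstack(fill_value=0)
    .add_prefix("count_")
    .reset_index()
)
print(out)
```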

Answer:

You can just use pd.crosstab, which counts each (url, label) pair and puts one column per label:

out = pd.crosstab(df['url'], df['label']).add_prefix('count_').reset_index()
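A self-contained run of the crosstab answer against the question's sample CSV, inlined via io.StringIO. The skipinitialspace=True argument is an addition not in the answer; it is needed here because the sample CSV has a space after every comma, which would otherwise turn the column name into " label":

```python
import io

import pandas as pd

# Sample CSV from the question, inlined so the example runs standalone.
CSV_TEXT = """url, sub_url, label
first.com, first.com/1, label1
first.com, first.com/2, label1
first.com, first.com/3, label2
first.com, first.com/4, label2
first.com, first.com/5, label1
second.com, second.com/1, label2
second.com, second.com/2, label2
second.com, second.com/3, label1
third.com, third.com/1, label2
third.com, third.com/2, label2
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)

# crosstab counts each (url, label) pair; pairs that never occur get 0.
out = pd.crosstab(df["url"], df["label"]).add_prefix("count_").reset_index()

# crosstab names the column axis "label"; clear it for a cleaner printout.
out.columns.name = None
print(out)
```

Each url appears once, with one count column per label; for example third.com gets 0 for count_label1 and 2 for count_label2.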