I have a CSV like this:
url, sub_url, label
first.com, first.com/1, label1
first.com, first.com/2, label1
first.com, first.com/3, label2
first.com, first.com/4, label2
first.com, first.com/5, label1
second.com, second.com/1, label2
second.com, second.com/2, label2
second.com, second.com/3, label1
third.com, third.com/1, label2
third.com, third.com/2, label2
I don't care about sub_url. I would like the output to look something like this:
url, count_label1, count_label2
first.com, 3, 2
second.com, 1, 2
third.com, 0, 2
I tried this:
df = pd.read_csv(path_to_csv, sep=",")
df.groupby(["url"])["label"].value_counts()
But this gives me a separate row for each distinct value of label, whereas I want each label to be its own column. How can I do that?
CodePudding user response:
You can use pd.crosstab, which counts how often each label occurs for each url:
out = pd.crosstab(df['url'], df['label']).add_prefix('count_').reset_index()
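For reference, here is a minimal self-contained sketch that runs the crosstab approach on the sample data from the question. The inline `csv_text` string and the `skipinitialspace=True` option are assumptions added here: they stand in for the real file and strip the spaces after the commas so the column names come out as `url` and `label`. The `alt` variant, which unstacks the question's own `value_counts()` result, is one possible alternative, not part of the original answer. Note that third.com has two label2 rows in the sample, so its count_label2 comes out as 2.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the example is runnable
# (assumption: the real file has the same layout, with spaces after commas).
csv_text = """url, sub_url, label
first.com, first.com/1, label1
first.com, first.com/2, label1
first.com, first.com/3, label2
first.com, first.com/4, label2
first.com, first.com/5, label1
second.com, second.com/1, label2
second.com, second.com/2, label2
second.com, second.com/3, label1
third.com, third.com/1, label2
third.com, third.com/2, label2
"""

# skipinitialspace=True drops the space after each comma, so the columns
# are named "url", "sub_url", "label" rather than " sub_url", " label".
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Approach from the answer: one row per url, one column per label.
out = pd.crosstab(df["url"], df["label"]).add_prefix("count_").reset_index()
out.columns.name = None  # drop the leftover "label" name on the column axis

# Alternative sketch: pivot the value_counts() result from the question.
# fill_value=0 supplies the zero for url/label pairs that never occur.
alt = (
    df.groupby("url")["label"]
    .value_counts()
    .unstack(fill_value=0)
    .add_prefix("count_")
    .reset_index()
)
alt.columns.name = None

print(out)
#           url  count_label1  count_label2
# 0   first.com             3             2
# 1  second.com             1             2
# 2   third.com             0             2
```

Both variants should give the same table; crosstab is shorter, while the unstack form may be handier if you already have the grouped counts.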