I have a dataframe like below:
id | name | colA | colB |
---|---|---|---|
One | Ana | abc | xyz |
One | Ana | abc | xyz |
One | Ana | bde | xyz |
One | Ana | bde | xyz |
One | Ana | bde | yhn |
One | Ana | bde | yhn |
One | Ana | bde | qwe |
One | Ana | teh | qwe |
Two | Bob | abc | qwe |
Two | Bob | teh | qwe |
Two | Bob | pop | omg |
I need to transform my dataframe as
id | name | abc | bde | teh | pop | xyz | yhn | qwe | omg |
---|---|---|---|---|---|---|---|---|---|
One | Ana | 2 | 5 | 1 | 0 | 4 | 2 | 2 | 0 |
Two | Bob | 1 | 0 | 1 | 1 | 0 | 0 | 2 | 1 |
I wrote below code to achieve this but it do not gives me expected output and also I have no idea how to perform it for multiple columns. Please help. df = df.groupby(['id','colA']).size().reset_index(name='colA_counts')
CodePudding user response:
you could try this:
a = df[["id", "name", "A"]].copy()
b = df[["id", "name", "B"]].copy()
b.columns = ["id", "name", "A"]
result = pd.concat([a, b]).value_counts().reset_index()
result.columns = ["id", "name", "A", "count"]
pd.pivot_table(result, index=["id", "name"], columns=["A"], values="count").fillna(0)
What I'm basically doing is, I first create a new DataFrame, concatenated of the original one where both A and B are just called "A" then as you don't make any difference of whether it is A or B. In this I let it count the different values, i.e. combinations of id, name and value.
Then I pivot the table so, that the values in "A" are considered the columns, finally I fill up all those combinations that don't appear in the data with zeros by calling fillna(0)
.
What I btw don't understand is, where the 1 in the "one ana pop" in your example comes from. Your data is not providing this.