How to count columns by row in python pandas?

I am not sure how to describe this situation. Suppose I have the following table in a pandas DataFrame:

            0     1      2      3      4      5  ... 2949 2950 2951 2952 2953 2954
0.txt    html  head   meta   meta   meta   meta  ...                              
107.txt  html  head  title   meta   meta   meta  ...                              
125.txt  html  head  title  style   body    div  ...                              
190.txt  html  head   meta  title  style   body  ...                              
202.txt  html  head   meta  title   link  style

I want to reshape this table so that each column is a unique HTML tag and each value is the number of times that tag occurs in the given row:

         html  head   meta  style   link   body  ... 
0.txt       1     1      4      2      1      2  ...                              
107.txt     1     2      3      0      0      1  ...

Something like the above. I have counted 88 distinct HTML tags in the table, so the result should have 88 columns. If this works, I will then apply pandas' describe() and value_counts() functions to learn more about the tags' statistics. However, I am stuck on the step above. Please give me some ideas for tackling this. Thank you.

CodePudding user response：

IIUC, you can first stack, then use groupby.value_counts to get the counts per original row, then unstack to get the expected result. With the data provided, for the first 3 rows and 6 columns, you get:

res = (
    df.stack()
      .groupby(level=0).value_counts()
      .unstack(fill_value=0)
)
print(res)
#         body  div  head  html  meta  style  title
# 0.txt       0    0     1     1     4      0      0
# 107.txt     0    0     1     1     3      0      1
# 125.txt     1    1     1     1     0      1      1
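
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the first 3 rows and 6 columns of the question, and count_tags is just an illustrative helper name. It also assumes the blank padding cells might be empty strings rather than NaN, so it masks them before stacking; if your blanks are already NaN, that line changes nothing.

```python
import pandas as pd

# Sample data rebuilt by hand from the first 3 rows / 6 columns of the question.
df = pd.DataFrame(
    [
        ["html", "head", "meta", "meta", "meta", "meta"],
        ["html", "head", "title", "meta", "meta", "meta"],
        ["html", "head", "title", "style", "body", "div"],
    ],
    index=["0.txt", "107.txt", "125.txt"],
)


def count_tags(frame):
    """Return one row per file and one column per tag, holding occurrence counts."""
    # Assumption: padding cells may be empty strings; turn them into NaN
    # so they are ignored instead of being counted as a tag named "".
    cleaned = frame.where(frame != "")
    return (
        cleaned.stack()                 # long Series indexed by (file, position)
        .groupby(level=0)               # group by file name
        .value_counts()                 # count each tag within a file
        .unstack(fill_value=0)          # tags become columns, missing tags get 0
    )


res = count_tags(df)
print(res)
```

The result has the files as the index and the tags as alphabetically sorted columns, so res["meta"] gives the meta count for every file.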

CodePudding user response：

The pandas method pivot_table() can also produce the table you described, but not directly. It expects long-format data with one row per (file, tag) pair, while your dataframe is wide, with one column per tag position. Its default aggregation is also the mean, so you have to ask for a count explicitly. Reshape the dataframe with melt() first, then pivot it.

Here is an example of how you might use melt() and pivot_table() on your dataframe:

# Import the pandas library
import pandas as pd

# Load your dataframe (assumes the first CSV column holds the file names)
df = pd.read_csv("your_data.csv", index_col=0)

# Reshape to long format: one row per (file, tag) pair, dropping empty cells.
# "file" and "tag" are just illustrative column names.
long_df = df.rename_axis("file").reset_index().melt(id_vars="file", value_name="tag").dropna(subset=["tag"])

# Count how often each tag occurs in each file; absent tags get 0
pivoted_df = long_df.pivot_table(index="file", columns="tag", aggfunc="size", fill_value=0)

This creates a new dataframe with the rows representing the original row names, the columns representing the unique html tags, and the values representing the count of each tag in each row. After that you can use the describe() and value_counts() methods you mentioned on this new dataframe to get statistics about the html tags.
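
As a rough sketch of that last step, the snippet below builds the count table from a small in-memory frame and then summarises it. The sample rows are copied by hand from the question, and the names "file", "tag", total_per_tag and files_per_tag are only illustrative.

```python
import pandas as pd

# Sample data copied by hand from the first 3 rows / 6 columns of the question.
df = pd.DataFrame(
    [
        ["html", "head", "meta", "meta", "meta", "meta"],
        ["html", "head", "title", "meta", "meta", "meta"],
        ["html", "head", "title", "style", "body", "div"],
    ],
    index=["0.txt", "107.txt", "125.txt"],
)

# Wide -> long: one row per (file, tag) pair; empty cells are dropped.
long_df = (
    df.rename_axis("file")
    .reset_index()
    .melt(id_vars="file", value_name="tag")
    .dropna(subset=["tag"])
)

# Long -> count table: one row per file, one column per tag.
counts = long_df.pivot_table(index="file", columns="tag", aggfunc="size", fill_value=0)

# Per-tag summary statistics across files (count, mean, std, min, quartiles, max).
summary = counts.describe()

# Total occurrences of each tag over all files, most frequent first.
total_per_tag = counts.sum().sort_values(ascending=False)

# Number of files in which each tag appears at least once.
files_per_tag = (counts > 0).sum()

print(summary)
print(total_per_tag)
print(files_per_tag)
```

Here describe() works column by column, so each tag gets its own mean, min and max over the files, while the two sums answer "how common is this tag overall" and "how many files use it".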