I have a DataFrame with more than 200 columns, and I want to run value_counts() on each column. The code below works when printing, but when I write the result to a CSV file, only the value counts of the last column are saved. I want all of them.
import pandas as pd
df = pd.read_csv("hcp.csv")
for col in df:
    df2 = df[col].value_counts()
    print(df2)
df2.to_csv("new_hcp.csv")
print(df2) shows the value counts for every column, but the CSV file does not contain them. I would be grateful for any help.
CodePudding user response:
You can use apply with the value_counts method to get the value counts for every column:
import pandas as pd
df = pd.read_csv("hcp.csv")
df2 = df.apply(pd.Series.value_counts).unstack().to_frame().dropna().reset_index().rename(columns={'level_0': 'col_name', 'level_1': 'value_name', 0: 'count'})
df2.to_csv("new_hcp.csv", index=False)
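To see what that chain produces, here is a small sketch on a made-up two-column frame (the toy data and the names wide and long_counts are only for illustration, not from the question). It names the axes explicitly with rename_axis instead of relying on the default level_0 / level_1 labels, which is a slight variation on the one-liner above.

```python
import pandas as pd

# Hypothetical toy data standing in for hcp.csv
df = pd.DataFrame({"a": ["x", "x", "y"], "b": ["y", "z", "z"]})

# One row per distinct value, one column per original column;
# values that never occur in a column show up as NaN.
wide = df.apply(pd.Series.value_counts)

# Name both axes, then reshape to one row per (column, value) pair.
long_counts = (
    wide.rename_axis(index="value_name", columns="col_name")
        .unstack()          # Series indexed by (col_name, value_name)
        .dropna()           # drop values that do not occur in a column
        .astype(int)        # counts became floats because of the NaN
        .rename("count")
        .reset_index()
)

print(long_counts)
```

The result has the three columns col_name, value_name and count, so it can be written with to_csv(..., index=False) just like df2 above.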
CodePudding user response:
You are overwriting df2 on each iteration, so only the last column's counts remain when the file is written.
Create an empty list outside the loop, append each column's value_counts() result to it, then build a DataFrame from that list and write it out.
import pandas as pd
df = pd.read_csv("hcp.csv")
value_counts_list = []
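for col in df:
    # rename(col) labels each result with its column name (pandas 2.x names it "count" by default)
    value_counts_list.append(df[col].value_counts().rename(col))
# Each row of the resulting frame holds the counts for one column of df
pd.DataFrame(value_counts_list).to_csv("new_hcp.csv")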
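With 200+ columns, the frame built from the list can get very wide, since every distinct value becomes its own CSV column. A possible alternative, sketched here on made-up data (the toy frame and the names per_column and combined are assumptions for illustration), is to collect the counts in a dict and let pd.concat stack them into one long table:

```python
import pandas as pd

# Hypothetical toy data standing in for hcp.csv
df = pd.DataFrame({"a": ["x", "x", "y"], "b": ["y", "z", "z"]})

# Collect one value_counts() Series per column, keyed by column name
per_column = {col: df[col].value_counts() for col in df}

# The dict keys become the outer index level; names= labels both levels
combined = (
    pd.concat(per_column, names=["col_name", "value_name"])
      .rename("count")
      .reset_index()
)

# to_csv() without a path returns the CSV text;
# pass "new_hcp.csv" as the first argument to write the file instead
print(combined.to_csv(index=False))
```

This gives one row per (column, value) pair, which is usually easier to filter or sort afterwards than one CSV column per distinct value.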