Haven't found a post that does this specific thing.
I have a table like this:
col1 col2 col3 col4
   1    a   as   12
   1    f   as   13
   2    d   af   13
   3    f   as   12
   4    f   at   14
   4    d   at   14
   5    h   am   15
A plain df.col1.value_counts() for each column followed by a concat would work - but it's ugly. I'm trying to build a function that takes the value_counts() of a list of columns and merges them all into one df as the output.
So far I have something like this - it gets the value counts of each column, but it puts each result into a dataframe with the same name.
import pandas as pd

def get_value_counts(df, col):
    res = df[col].value_counts().reset_index()
    res.columns = [col, 'value_counts']
    return res
aggs_df = df1_test.iloc[:, 4:41]
aggs_df.drop(['col1', 'col2'], inplace=True, axis=1)
num_cols = aggs_df.shape[1]
for i in aggs_df.columns:
    a = get_value_counts(aggs_df, i)
    a['column'] = i
    a.columns = ['values', 'value_counts', 'column']
    a.columns = a.iloc[0]
    a = a[1:]
    print(a)
# for frame in a:
#     df = pd.DataFrame()
#     df = a.append(a, ignore_index=True)
Rather than manually write each df out like:
col1 = df['col1'].value_counts().reset_index()
col1.columns = ['value', 'value_count']
col1['column'] = 'col1'
col2 = df['col2'].value_counts().reset_index()
col2.columns = ['value', 'value_count']
col2['column'] = 'col2'
col3 = df['col3'].value_counts().reset_index()
col3.columns = ['value', 'value_count']
col3['column'] = 'col3'
col4 = df['col4'].value_counts().reset_index()
col4.columns = ['value', 'value_count']
col4['column'] = 'col4'
df1 = pd.concat([col1, col2, col3, col4], ignore_index=True)
df1
The output should look like this:
   value  value_count  column
0      1            2    col1
1      4            2    col1
2      2            1    col1
3      3            1    col1
4      5            1    col1
5      f            3    col2
6      d            2    col2
7      a            1    col2
8      h            1    col2
9     as            3    col3
10    at            2    col3
11    af            1    col3
12    am            1    col3
13    12            2    col4
14    13            2    col4
15    14            2    col4
16    15            1    col4
CodePudding user response:
Try melting the data to long form and counting with groupby:
df.melt().groupby(['variable','value']).size().reset_index(name='count')
Output:
   variable value  count
0      col1     1      2
1      col1     2      1
2      col1     3      1
3      col1     4      2
4      col1     5      1
5      col2     a      1
6      col2     d      2
7      col2     f      3
8      col2     h      1
9      col3    af      1
10     col3    am      1
11     col3    as      3
12     col3    at      2
13     col4    12      2
14     col4    13      2
15     col4    14      2
16     col4    15      1
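For reference, a self-contained sketch of the melt/groupby approach on the question's sample table. The astype(str) cast is an addition here, not part of the answer above: it keeps the melted value column a single dtype, since sorting a column that mixes ints and strings during the groupby can raise a TypeError.

```python
import pandas as pd

# Sample table from the question
df = pd.DataFrame({
    'col1': [1, 1, 2, 3, 4, 4, 5],
    'col2': ['a', 'f', 'd', 'f', 'f', 'd', 'h'],
    'col3': ['as', 'as', 'af', 'as', 'at', 'at', 'am'],
    'col4': [12, 13, 13, 12, 14, 14, 15],
})

# Wide -> long, then count each (column, value) pair.
# astype(str) keeps the melted 'value' column a single dtype so the
# groupby sort never compares ints with strings.
out = (df.astype(str)
         .melt()
         .groupby(['variable', 'value'])
         .size()
         .reset_index(name='count'))
print(out)
```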
Note: melting copies the data with a lot of extra overhead (the column name is repeated for every row), which can be expensive. You can instead count each column, create a data frame per column, then concat the counts:
def count_values(s):
    return s.value_counts().reset_index(name='count').assign(col=s.name)

pd.concat([count_values(df[c]) for c in df])
Output:
  index  count   col
0     4      2  col1
1     1      2  col1
2     5      1  col1
3     3      1  col1
4     2      1  col1
0     f      3  col2
1     d      2  col2
2     a      1  col2
3     h      1  col2
0    as      3  col3
1    at      2  col3
2    af      1  col3
3    am      1  col3
0    14      2  col4
1    13      2  col4
2    12      2  col4
3    15      1  col4
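One caveat: the column names that value_counts().reset_index() produces changed in pandas 2.0 (the counts are now named 'count' and the index column takes the series name, where older versions used 'index'). A version-robust sketch builds the result frame explicitly; the count_values below is a hypothetical rewrite of the answer's helper, not its exact code.

```python
import pandas as pd

# Sample table from the question
df = pd.DataFrame({
    'col1': [1, 1, 2, 3, 4, 4, 5],
    'col2': ['a', 'f', 'd', 'f', 'f', 'd', 'h'],
    'col3': ['as', 'as', 'af', 'as', 'at', 'at', 'am'],
    'col4': [12, 13, 13, 12, 14, 14, 15],
})

def count_values(s):
    # Build the result frame explicitly so the column names do not
    # depend on what reset_index() calls them in a given pandas version.
    vc = s.value_counts()
    return pd.DataFrame({'value': vc.index, 'count': vc.to_numpy(), 'column': s.name})

out = pd.concat([count_values(df[c]) for c in df], ignore_index=True)
print(out)
```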