Create a function to get multiple value counts and concat into one dataframe


Haven't found a post that does this specific thing.

I have a table like this:

col1    col2    col3    col4
1       a       as      12
1       f       as      13
2       d       af      13
3       f       as      12
4       f       at      14
4       d       at      14
5       h       am      15

A plain df.col1.value_counts() repeated for each column and then concatenated would work, but it's ugly. I'm trying to build a function that takes the value_counts() of a list of columns and merges them all into one df as the output.

So far I have something like this, which gets all the value counts of the columns, but it puts each into a dataframe with the same name:

import pandas as pd

def get_value_counts(df, col):
    res = df[col].value_counts().reset_index()
    res.columns = [col, 'value_counts']
    return res

aggs_df = df1_test.iloc[:, 4:41]
aggs_df.drop(['col1', 'col2'], inplace=True, axis=1)
num_cols = aggs_df.shape[1]
for i in aggs_df.columns:
    a = get_value_counts(aggs_df, i)
    a['column'] = i
    a.columns = ['values', 'value_counts', 'column']
    print(a)
#     for frame in a:
#         df = a.append(a, ignore_index=True)

Rather than manually writing each df out like:

col1 = df['col1'].value_counts().reset_index()
col1.columns = ['value', 'value_count']
col1['column'] = 'col1'
col2 = df['col2'].value_counts().reset_index()
col2.columns = ['value', 'value_count']
col2['column'] = 'col2'
col3 = df['col3'].value_counts().reset_index()
col3.columns = ['value', 'value_count']
col3['column'] = 'col3'
col4 = df['col4'].value_counts().reset_index()
col4.columns = ['value', 'value_count']
col4['column'] = 'col4'

df1 = pd.concat([col1, col2, col3, col4], ignore_index=True)
df1

The output should look like this:

value   value_count column
0   1   2   col1
1   4   2   col1
2   2   1   col1
3   3   1   col1
4   5   1   col1
5   f   3   col2
6   d   2   col2
7   a   1   col2
8   h   1   col2
9   as  3   col3
10  at  2   col3
11  af  1   col3
12  am  1   col3
13  12  2   col4
14  13  2   col4
15  14  2   col4
16  15  1   col4

CodePudding user response:

Try stacking/melting the data and counting with groupby:

df.melt().groupby(['variable','value']).size().reset_index(name='count')

Output:

   variable value  count
0      col1     1      2
1      col1     2      1
2      col1     3      1
3      col1     4      2
4      col1     5      1
5      col2     a      1
6      col2     d      2
7      col2     f      3
8      col2     h      1
9      col3    af      1
10     col3    am      1
11     col3    as      3
12     col3    at      2
13     col4    12      2
14     col4    13      2
15     col4    14      2
16     col4    15      1
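As a sanity check, the melt/groupby one-liner can be reproduced end to end on the sample table from the question:

```python
import pandas as pd

# The sample table from the question.
df = pd.DataFrame({
    'col1': [1, 1, 2, 3, 4, 4, 5],
    'col2': ['a', 'f', 'd', 'f', 'f', 'd', 'h'],
    'col3': ['as', 'as', 'af', 'as', 'at', 'at', 'am'],
    'col4': [12, 13, 13, 12, 14, 14, 15],
})

# melt() stacks every column into (variable, value) pairs, so a single
# groupby/size call counts every column's values in one pass.
counts = (df.melt()
            .groupby(['variable', 'value'])
            .size()
            .reset_index(name='count'))
print(counts)
```

Note the counts come back grouped by column and sorted by value, not by descending count the way value_counts() orders them; add a sort_values(['variable', 'count'], ascending=[True, False]) if that ordering matters.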

Note: melting copies the data, with a lot of extra overhead from the repeated column names, which can be expensive. You can instead count each column on its own, build a frame per column, then concat the counts:

def count_values(s):
    return s.value_counts().reset_index(name='count').assign(col=s.name)

pd.concat([count_values(df[c]) for c in df])

Output:

  index  count   col
0     4      2  col1
1     1      2  col1
2     5      1  col1
3     3      1  col1
4     2      1  col1
0     f      3  col2
1     d      2  col2
2     a      1  col2
3     h      1  col2
0    as      3  col3
1    at      2  col3
2    af      1  col3
3    am      1  col3
0    14      2  col4
1    13      2  col4
2    12      2  col4
3    15      1  col4
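One caveat: the column naming that reset_index produces on a value_counts result has shifted across pandas versions (the value column comes out as index on older releases and takes other names on pandas 2.x). A version-proof sketch of the same per-column approach, with the value/count/column names from the question chosen explicitly, could look like:

```python
import pandas as pd

# The sample table from the question.
df = pd.DataFrame({
    'col1': [1, 1, 2, 3, 4, 4, 5],
    'col2': ['a', 'f', 'd', 'f', 'f', 'd', 'h'],
    'col3': ['as', 'as', 'af', 'as', 'at', 'at', 'am'],
    'col4': [12, 13, 13, 12, 14, 14, 15],
})

def count_values(s):
    # Count one column, then assign the column names ourselves so every
    # per-column frame lines up for the concat regardless of what
    # reset_index called them on this pandas version.
    out = s.value_counts().reset_index()
    out.columns = ['value', 'value_count']
    return out.assign(column=s.name)

counts = pd.concat([count_values(df[c]) for c in df], ignore_index=True)
print(counts)
```

Passing ignore_index=True to concat also replaces the repeating 0..n index visible in the output above with a clean running index, matching the desired output in the question.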