count occurrence of a value in multiple columns of a dataframe Pandas-CodePudding

I want to count occurrence of a string value in multiple columns in the entire dataframe, for example :

df = pd.DataFrame({'type1':['A11','B11','A11','A'],
                     'type2':['A12','C','A12','D11'],
                     'type3':['C','D11','B','D12'],
                     'type4':[np.nan,'E12','C','E']
                    })
df

       type1    type2   type3   type4 
    0   A11      A12      C      NaN     
    1   B11       C      D11     E12     
    2   A11      A12      B       C      
    3    A       D11     D12      E

I want to be able to count each distinct value in those 4 columns, like :

value count
 A11    2
 A12    2
 C      3
 B11    1
 D11    2
 D12    1
 E12    3
 B      1
 A      1
 E      1

CodePudding user response：

df.stack().value_counts()

C      3
A11    2
A12    2
D11    2
B11    1
E12    1
B      1
A      1
D12    1
E      1

if you need the names:

df.stack().value_counts().reset_index(name='count').rename({'index':'value'}, axis = 1)
 
  value  count
0     C      3
1   A11      2
2   A12      2
3   D11      2
4   B11      1
5   E12      1
6     B      1
7     A      1
8   D12      1
9     E      1

CodePudding user response：

You can flatten your values using numpy.reshape, then create a new dataframe and use pd.DataFrame.value_counts to get your output:

pd.DataFrame(df.values.reshape(1,-1)[0], columns = ['type']).value_counts()

Output:

CodePudding user response：

You could do something like this:

df1 = df.apply(pd.Series.value_counts).dropna(how='all').sum(1).reset_index()
df1.rename(columns = {'index':'value', 0:'count'}, inplace = True)
df1

Output:

index	value	count
0	A	1.0
1	A11	2.0
2	A12	2.0
3	B	1.0
4	B11	1.0
5	C	3.0
6	D11	2.0
7	D12	1.0
8	E	1.0
9	E12	1.0

CodePudding user response：

Another way: The advantage of this method is, if one wants to keep understand the data based on type1, type2, etc one just need to do reset_index() in step 2.

Original Data

df = pd.DataFrame({'type1':['A11','B11','A11','A'],
                     'type2':['A12','C','A12','D11'],
                     'type3':['C','D11','B','D12'],
                     'type4':[np.nan,'E12','C','E']
                    })

Stack the data:

df1 = pd.DataFrame(df.stack(), columns = ['colnam'])

Then groupby based on required column:

print(df1.groupby('colnam')['colnam'].count())

index	value	count
0	A	1.0
1	A11	2.0
2	A12	2.0
3	B	1.0
4	B11	1.0
5	C	3.0
6	D11	2.0
7	D12	1.0
8	E	1.0
9	E12	1.0

index	value	count
0	A	1.0
1	A11	2.0
2	A12	2.0
3	B	1.0
4	B11	1.0
5	C	3.0
6	D11	2.0
7	D12	1.0
8	E	1.0
9	E12	1.0

index	value	count
0	A	1.0
1	A11	2.0
2	A12	2.0
3	B	1.0
4	B11	1.0
5	C	3.0
6	D11	2.0
7	D12	1.0
8	E	1.0
9	E12	1.0