I want to count occurrence of a string value in multiple columns in the entire dataframe, for example :
df = pd.DataFrame({'type1':['A11','B11','A11','A'],
'type2':['A12','C','A12','D11'],
'type3':['C','D11','B','D12'],
'type4':[np.nan,'E12','C','E']
})
df
type1 type2 type3 type4
0 A11 A12 C NaN
1 B11 C D11 E12
2 A11 A12 B C
3 A D11 D12 E
I want to be able to count each distinct value in those 4 columns, like :
value count
A11 2
A12 2
C 3
B11 1
D11 2
D12 1
E12 3
B 1
A 1
E 1
CodePudding user response:
df.stack().value_counts()
C 3
A11 2
A12 2
D11 2
B11 1
E12 1
B 1
A 1
D12 1
E 1
if you need the names:
df.stack().value_counts().reset_index(name='count').rename({'index':'value'}, axis = 1)
value count
0 C 3
1 A11 2
2 A12 2
3 D11 2
4 B11 1
5 E12 1
6 B 1
7 A 1
8 D12 1
9 E 1
CodePudding user response:
You can flatten your values using numpy.reshape
, then create a new dataframe and use pd.DataFrame.value_counts
to get your output:
pd.DataFrame(df.values.reshape(1,-1)[0], columns = ['type']).value_counts()
Output:
type
C 3
A11 2
A12 2
D11 2
A 1
B 1
B11 1
D12 1
E 1
E12 1
CodePudding user response:
You could do something like this:
df1 = df.apply(pd.Series.value_counts).dropna(how='all').sum(1).reset_index()
df1.rename(columns = {'index':'value', 0:'count'}, inplace = True)
df1
Output:
index | value | count |
---|---|---|
0 | A | 1.0 |
1 | A11 | 2.0 |
2 | A12 | 2.0 |
3 | B | 1.0 |
4 | B11 | 1.0 |
5 | C | 3.0 |
6 | D11 | 2.0 |
7 | D12 | 1.0 |
8 | E | 1.0 |
9 | E12 | 1.0 |
CodePudding user response:
Another way: The advantage of this method is, if one wants to keep understand the data based on type1
, type2
, etc one just need to do reset_index()
in step 2.
Original Data
df = pd.DataFrame({'type1':['A11','B11','A11','A'],
'type2':['A12','C','A12','D11'],
'type3':['C','D11','B','D12'],
'type4':[np.nan,'E12','C','E']
})
Stack the data:
df1 = pd.DataFrame(df.stack(), columns = ['colnam'])
Then groupby based on required column:
print(df1.groupby('colnam')['colnam'].count())