I learned that when I type df = pd.read_csv("file.csv"), I can load data from a CSV file like below:
round date first second third fourth fifth sixth bonus
0 959 2021.04.17 1 14 15 24 40 41 35
1 958 2021.04.10 2 9 10 16 35 37 1
2 957 2021.04.03 4 15 24 35 36 40 1
3 956 2021.03.27 10 11 20 21 25 41 40
4 955 2021.03.20 4 9 23 26 29 33 8
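For reference, a minimal sketch of how I load and inspect the file (the name file.csv is just the placeholder from above):

import pandas as pd

# Read the draw history from the CSV file.
df = pd.read_csv("file.csv")

# Show the first five rows, like the sample above.
print(df.head())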
Also I learned that I can use df["column_name"].value_counts() to count how frequently each value appears. For example, df["first"].value_counts() shows the counts for the 'first' column, like:
1 134
2 107
3 94 ...
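Written out as a small runnable snippet (assuming df is the DataFrame loaded above):

# Count how often each number appears in the "first" column.
counts = df["first"].value_counts()

# value_counts() returns a Series: the distinct values are the index,
# their frequencies are the data, sorted from most to least common.
print(counts.head())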
But I want to count all the values in multiple columns at once. How can I do that? Can someone tell me which method to use?
CodePudding user response:
If you want a matrix of the counts of every value that appears in the number columns, use value_counts as suggested by @jezrael:
>>> df.iloc[:, 2:].apply(pd.value_counts).fillna(0).astype(int)
first second third fourth fifth sixth bonus
1 1 0 0 0 0 0 2
2 1 0 0 0 0 0 0
4 2 0 0 0 0 0 0
8 0 0 0 0 0 0 1
9 0 2 0 0 0 0 0
10 1 0 1 0 0 0 0
11 0 1 0 0 0 0 0
14 0 1 0 0 0 0 0
15 0 1 1 0 0 0 0
16 0 0 0 1 0 0 0
20 0 0 1 0 0 0 0
21 0 0 0 1 0 0 0
23 0 0 1 0 0 0 0
24 0 0 1 1 0 0 0
25 0 0 0 0 1 0 0
26 0 0 0 1 0 0 0
29 0 0 0 0 1 0 0
33 0 0 0 0 0 1 0
35 0 0 0 1 1 0 1
36 0 0 0 0 1 0 0
37 0 0 0 0 0 1 0
40 0 0 0 0 1 1 1
41 0 0 0 0 0 2 0
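A self-contained sketch that reproduces this on the five sample rows from the question (the one-liner is the answer's code unchanged; the small DataFrame is built inline so the snippet runs without the CSV file):

import pandas as pd

# Rebuild the five sample rows from the question.
df = pd.DataFrame({
    "round": [959, 958, 957, 956, 955],
    "date": ["2021.04.17", "2021.04.10", "2021.04.03", "2021.03.27", "2021.03.20"],
    "first": [1, 2, 4, 10, 4],
    "second": [14, 9, 15, 11, 9],
    "third": [15, 10, 24, 20, 23],
    "fourth": [24, 16, 35, 21, 26],
    "fifth": [40, 35, 36, 25, 29],
    "sixth": [41, 37, 40, 41, 33],
    "bonus": [35, 1, 1, 40, 8],
})

# Skip the non-number columns ("round" and "date"), count values per column,
# and turn the missing combinations (NaN) into 0.
# On newer pandas, pd.value_counts is deprecated; lambda s: s.value_counts()
# does the same thing.
result = df.iloc[:, 2:].apply(pd.value_counts).fillna(0).astype(int)
print(result)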
CodePudding user response:
You can select all columns except the first 2 and count the values, then replace the NaNs and convert to integers:
df.iloc[:, 2:].apply(pd.value_counts).fillna(0).astype(int)
Alternative with a different layout, counting the values per row instead of per column (the result is indexed by row, with the drawn numbers as the columns):
df.iloc[:, 2:].apply(pd.value_counts, axis=1).fillna(0).astype(int)
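A short sketch of the difference in orientation, assuming df holds the data from the question: the per-column version is indexed by the drawn numbers, while the axis=1 version counts within each row and is indexed by the draws themselves.

# Per column: index = drawn numbers, columns = original columns.
per_column = df.iloc[:, 2:].apply(pd.value_counts).fillna(0).astype(int)

# Per row: index = original rows, columns = drawn numbers.
per_row = df.iloc[:, 2:].apply(pd.value_counts, axis=1).fillna(0).astype(int)

print(per_column.shape)  # (number of distinct drawn numbers, 7 number columns)
print(per_row.shape)     # (number of draws, number of distinct drawn numbers)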