I want to write code that outputs how many times each distinct value occurs in a.
Then I want to build a pandas DataFrame to print the result. The sums
code below does not work. How can I make it work and get the Expected Output?
import numpy as np
import pandas as pd
a = np.array([12,12,12,3,43,43,43,22,1,3,3,43])
uniques = np.unique(a)
sums = np.sum(uniques[:-1]==a[:-1])
Expected Output:
Value Repetition Count
1 1
3 3
12 3
22 1
43 4
CodePudding user response:
You can use groupby:
>>> pd.Series(a).groupby(a).count()
1 1
3 3
12 3
22 1
43 4
dtype: int64
Or value_counts():
>>> pd.Series(a).value_counts().sort_index()
1 1
3 3
12 3
22 1
43 4
dtype: int64
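If you would rather stay close to the np.unique call from the question, here is a minimal sketch. The original sums line compares uniques[:-1] (4 elements) with a[:-1] (11 elements), so it cannot count occurrences per value. The sketch assumes that passing return_counts=True to np.unique is acceptable for your case; the names values, counts, and result are only illustrative.

```python
import numpy as np
import pandas as pd

a = np.array([12, 12, 12, 3, 43, 43, 43, 22, 1, 3, 3, 43])

# return_counts=True makes np.unique return the sorted unique values
# together with the number of times each one occurs in a.
values, counts = np.unique(a, return_counts=True)

# Build the two-column table asked for in the question.
result = pd.DataFrame({"Value": values, "Repetition Count": counts})
print(result.to_string(index=False))
```

Because np.unique returns the values already sorted, no extra sort step is needed here.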
CodePudding user response:
The easiest way is to build a pandas DataFrame from the NumPy array and then use value_counts().
df = pd.DataFrame(data=a, columns=['col1'])
print(df.col1.value_counts())
43 4
12 3
3 3
22 1
1 1
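Note that value_counts() orders the rows by count, not by value, and returns a Series rather than a two-column table. A possible way to get closer to the Expected Output is sketched below; the name table is only illustrative, and the column names are taken from the question.

```python
import numpy as np
import pandas as pd

a = np.array([12, 12, 12, 3, 43, 43, 43, 22, 1, 3, 3, 43])
df = pd.DataFrame(data=a, columns=["col1"])

table = (
    df["col1"]
    .value_counts()        # counts per value, most frequent first
    .sort_index()          # reorder the rows by the value itself
    .rename_axis("Value")  # name the index so it becomes a column
    .reset_index(name="Repetition Count")
)
print(table.to_string(index=False))
```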
CodePudding user response:
Define a dataframe df based on the array a. Then use .groupby() followed by .size() to get the count of each unique value, as follows:
a = np.array([12,12,12,3,43,43,43,22,1,3,3,43])
df = pd.DataFrame({'Value': a})
df.groupby('Value').size().reset_index(name='Repetition Count')
Result:
Value Repetition Count
0 1 1
1 3 3
2 12 3
3 22 1
4 43 4
Edit
If you also want each count as a percentage of the total, you can use:
(df.groupby('Value', as_index=False)
   .agg(**{'Repetition Count': ('Value', 'size'),
           'Percent': ('Value', lambda x: round(x.size/len(a) *100, 2))})
)
Result:
Value Repetition Count Percent
0 1 1 8.33
1 3 3 25.00
2 12 3 25.00
3 22 1 8.33
4 43 4 33.33
Or use .value_counts() with normalize=True:
pd.Series(a).value_counts(normalize=True).mul(100)
Result:
43 33.333333
12 25.000000
3 25.000000
22 8.333333
1 8.333333
dtype: float64
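The normalize=True result above is sorted by frequency and leaves out the raw counts. If you want counts and percentages side by side, ordered by value, one possible sketch follows. The names s, counts, percent, and summary are illustrative, and rounding to two decimals is an assumption that mirrors the groupby version.

```python
import numpy as np
import pandas as pd

a = np.array([12, 12, 12, 3, 43, 43, 43, 22, 1, 3, 3, 43])
s = pd.Series(a)

# Raw counts and relative frequencies, both ordered by value.
counts = s.value_counts().sort_index()
percent = s.value_counts(normalize=True).sort_index().mul(100).round(2)

# Both Series share the same sorted index, so their values line up row by row.
summary = pd.DataFrame({
    "Value": counts.index,
    "Repetition Count": counts.to_numpy(),
    "Percent": percent.to_numpy(),
})
print(summary.to_string(index=False))
```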