I am attempting to create a data summary table that has three columns. One of which shows the attribute's name, one of which shows the most frequently occurring attribute-value, and the other shows the count of that most frequently occurring value.
I have the following data:
data = {"Sex":["M", "M", "F", "F", "F", "F", "M", "F", "F", "M"],
"Product": ["X", "Y", "Y", "Z","Y", "Y", "Y", "X", "Z", "Y"],
"Answer":["Yes", "Yes", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "Yes"]}
df = pd.DataFrame(data)
I then do the following:
stat = dict([(column, [])
for column in df
if df[column].dtype == object])
for column in df:
stat[column] = df[column].value_counts()[
df[column].value_counts() == df[column].value_counts().max()]
print(stat)
I get the following output
{'Sex': F 6
Name: Sex, dtype: int64, 'Product': Y 6
Name: Product, dtype: int64, 'Answer': Yes 7
Name: Answer, dtype: int64}
How can I create a table that looks something like this?
Attribute | Most Frequently Occurring Value | Count |
---|---|---|
Sex | F | 6 |
Product | Y | 6 |
Answer | Yes | 7 |
Could I do it using a library like tabulate
or plotly
?
CodePudding user response:
If you're looping through a DataFrame, you're probably doing it wrong.
# Use `mode` to get the most frequent of each column.
most_frequent = df.mode().squeeze().rename('Most Frequently Occurring Value')
# Get the counts of those with a mask.
counts = df[df.eq(most_frequent)].count().rename('Count')
# Combine the results with `concat`
out = pd.concat([most_frequent, counts], axis=1)
print(out)
Output:
Most Frequently Occurring Value Count
Sex F 6
Product Y 6
Answer Yes 7
Make it pretty:
out.index.name = 'Attribute'
print(out.to_markdown())
# Output:
| Attribute | Most Frequently Occurring Value | Count |
|:------------|:----------------------------------|--------:|
| Sex | F | 6 |
| Product | Y | 6 |
| Answer | Yes | 7 |
Which looks like this in markdown:
Attribute | Most Frequently Occurring Value | Count |
---|---|---|
Sex | F | 6 |
Product | Y | 6 |
Answer | Yes | 7 |
CodePudding user response:
For better understanding I am gonna chop your question into pieces.
First, to get The attribute and count, columns
Groupby level = 0, axis=1 and select the most frequent value either using value_counts indexed, or mode
df2 = df.groupby(level=0,axis=1).apply(lambda x : x.value_counts()[0])
Unfortunately, for you pandas doesn't like group by axis = 1. So I cant do this with a single line of code using aggreagate, so for the Most Frequently Occurring Value column. Well a simple
df.mode(axis=0) # Will return your wanted series
Nonetheless good question congrats