I have the following DataFrame:
ID | event_1_positive | event_2_negative | event_3_neutral | event_4_negative | event_5_positive |
---|---|---|---|---|---|
1 | 1 | 1 | 0 | 1 | 0 |
2 | 1 | 0 | 0 | 0 | 1 |
3 | 0 | 1 | 1 | 1 | 0 |
4 | 1 | 1 | 0 | 0 | 0 |
5 | 0 | 0 | 1 | 1 | 1 |
6 | 0 | 1 | 1 | 0 | 1 |
7 | 1 | 1 | 1 | 0 | 0 |
8 | 0 | 1 | 0 | 0 | 0 |
9 | 1 | 1 | 1 | 1 | 0 |
10 | 0 | 0 | 0 | 0 | 1 |
What I'd like to do is order the events by their effect (positive, negative or neutral) and return the count for each effect.
In this case, my expected output would be:
0 negative 11
1 positive 9
2 neutral 5
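For reference, the frame can be reconstructed for testing like this (a minimal sketch, assuming pandas; the values are copied from the table above):
import pandas as pd

df = pd.DataFrame({
    "ID": range(1, 11),
    "event_1_positive": [1, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "event_2_negative": [1, 0, 1, 1, 0, 1, 1, 1, 1, 0],
    "event_3_neutral":  [0, 0, 1, 0, 1, 1, 1, 0, 1, 0],
    "event_4_negative": [1, 0, 1, 0, 1, 0, 0, 0, 1, 0],
    "event_5_positive": [0, 1, 0, 0, 1, 1, 0, 0, 0, 1],
})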
CodePudding user response:
Use filter and groupby:
>>> df.filter(like='event') \
...    .groupby(lambda x: x.rsplit('_', 1)[1], axis=1) \
...    .sum().sum()
negative 11
neutral 5
positive 9
dtype: int64
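Note that axis=1 in groupby is deprecated in newer pandas releases; transposing first avoids it, and sort_values can reproduce the ordering from the question (a sketch of the same grouping logic):
>>> df.filter(like='event') \
...    .T \
...    .groupby(lambda x: x.rsplit('_', 1)[1]) \
...    .sum().sum(axis=1) \
...    .sort_values(ascending=False)
negative    11
positive     9
neutral      5
dtype: int64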
CodePudding user response:
You could create new columns where you add up all the columns with the same effect, e.g.
df["negative"] = df["event_2_negative"] + df["event_4_negative"]
then drop the original event columns and call df["negative"].sum() (or df["negative"].value_counts() if you want the distribution of per-row totals instead).
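Spelled out for all three effects, that approach might look like this (a sketch that hard-codes the column names from the example):
# map each effect to its event columns (names taken from the question)
effect_cols = {
    "positive": ["event_1_positive", "event_5_positive"],
    "negative": ["event_2_negative", "event_4_negative"],
    "neutral": ["event_3_neutral"],
}
for effect, cols in effect_cols.items():
    df[effect] = df[cols].sum(axis=1)  # per-row total per effect
print(df[["negative", "positive", "neutral"]].sum())
# negative    11
# positive     9
# neutral      5
# dtype: int64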
CodePudding user response:
This is a rare case where accounting in a loop seems practical, as there's a manageable number of columns:
# start a dictionary to pack with results
categories = {k: 0 for k in ("negative", "positive", "neutral")}
for colname in df:  # dataframes are iterable by column name!
    if "_" not in colname:
        continue  # or consider alerting on this condition
    for category in categories.keys():
        if category in colname:  # substring compare
            categories[category] += df[colname].sum()  # accumulate: two columns share each of positive/negative
            break  # successfully found category -> next colname
    else:  # no category matched (loop didn't break)
        raise ValueError(f"no category for {colname}")
# create a collection of results
df_results = pd.Series(categories).to_frame().reset_index()
      index   0
0  negative  11
1  positive   9
2   neutral   5
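The same accounting also fits in a few lines with collections.Counter, deriving the category from the column suffix instead of a substring scan (a variant sketch, assuming every event column ends in _<category>):
from collections import Counter

counts = Counter()
for colname in df.filter(like='event'):  # iterates the event column names
    counts[colname.rsplit('_', 1)[1]] += int(df[colname].sum())
print(counts)
# Counter({'negative': 11, 'positive': 9, 'neutral': 5})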
CodePudding user response:
One option is to reshape the data so that all the negatives land in one column, and likewise for the positives and neutrals. Your column names share a pattern (some end in positive, some in negative, some in neutral).
We can use that pattern in pivot_longer from pyjanitor to reshape the data, then aggregate (reduce):
# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import pandas as pd
import janitor
(df.pivot_longer(index='ID',
                 names_to=('positive', 'negative', 'neutral'),
                 names_pattern=('positive', 'negative', 'neutral'))
   .iloc[:, 1:]
   .sum()
)
positive 9.0
negative 11.0
neutral 5.0
dtype: float64
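The same reshape-then-aggregate idea also works in plain pandas, without the pyjanitor dependency (an equivalent sketch using melt, not the code above):
long = df.melt(id_vars='ID', var_name='event', value_name='flag')
long['effect'] = long['event'].str.rsplit('_', n=1).str[-1]  # column suffix
print(long.groupby('effect')['flag'].sum())
# effect
# negative    11
# neutral      5
# positive     9
# Name: flag, dtype: int64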
CodePudding user response:
df. \
    drop("ID", axis=1). \
    rename(columns=lambda x: x.split("_")[2]). \
    unstack(). \
    droplevel(1). \
    reset_index(). \
    groupby("index"). \
    sum()
#            0
# index
# negative  11
# neutral    5
# positive   9
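The same chain can also be wrapped in parentheses instead of trailing backslashes, which is often considered more readable (a purely stylistic variant, same result):
(df
    .drop("ID", axis=1)
    .rename(columns=lambda x: x.split("_")[2])
    .unstack()
    .droplevel(1)
    .reset_index()
    .groupby("index")
    .sum()
)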