I have a Pandas data frame (df) with 3 columns: "Drug" , "Affect" , "Journal" Every row is a scientific paper and each paper states what effect the drug has on the disease. Example:
Drug Affect Journal aspirin downregulate paper1 aspririn downregulate paper2 aspirin upregulate paper3 aspirin neutral paper4 aspirin downregulate paper5 aspririn upregulate paper6 iboruprofen upregulate paper7 iboruprofen upregulate paper8 iboruprofen downregulate paper9 other_drug ... ...
The output should look like this:
Drug
aspirin
I want to make a filter of the df to get only the drugs that in the majority of the papers show to downregulate the disease, so if aspirin for example has 3 papers that tell that it downregulates and 2 papers that say that it upregulates and 1 that says that it has a neutral effect then we assume that the majority of the evidence show that the effect is to downregulate. So I want to have aspirin in my filtered df. Therefore, iboruprofen should not make it to my filtered df.
How do I make the group by? Thanks for your help!
CodePudding user response:
Group by drug and the affect to count the number of records.
df_grouped = df.groupby(["Drug","Affect"], as_index=False).count()
The output:
Drug Affect Journal
0 aspirin downregulate 3
1 aspirin neutral 1
2 aspirin upregulate 2
3 iboruprofen downregulate 1
4 iboruprofen upregulate 2
Loop over the unique drug list and check the numbers. If downregulate > upregulate, add this drug to result list.
result_drugs = []
for drug in np.unique(df["Drug"]:
numberOfdownregulate = df_grouped[(df_grouped["Drug"] == drug) & (df_grouped["Affect"] == "downregulate")].Journal.values[0]
numberOfupregulate = df_grouped[(df_grouped["Drug"] == drug) & (df_grouped["Affect"] == "upregulate")].Journal.values[0]
result = numberOfdownregulate - numberOfupregulate
if result > 0:
result_drugs.append(drug)
Finally, filter your original dataframe.
df[df["Drug"].isin(result_drugs)]
If you only want to have dataframe with the desired drug values.
pd.DataFrame(result_drugs, columns=["Drug"])
Final output:
Drug
0 aspirin