I've the following dataframe -
df1
Location | Office | ROP |
---|---|---|
Barcelona | Head Office | 4.3% |
Bengaluru | Corporate Office | 9.6% |
Chicago | Head Office | 12.5% |
Luxembourg | Corporate Office | 14.1% |
Paris | Head Office | 12.7% |
Toronto | Head Office | 11.5% |
Berlin | Corporate Office | 14.3% |
Bengaluru | Head Office | 4.6% |
Luxembourg | Head Office | 7.1% |
Berlin | Head Office | 5.3% |
Luxembourg | Virtual Center | 10.1% |
Berlin | Virtual Center | 12.3% |
Paris | Virtual Center | 9.7% |
: | : | : |
: | : | : |
: | : | : |
a = ['Berlin','Paris','Luxembourg',...]
b = ['Head Office','Corporate Office',..]
Say there are multiple values in lists a and b, how do I find the sum and average of ROP based on the values in the lists and the given dataframe?
Example:
Say we have data from above mentioned dataframe in 'df2'.
df2 has just the visible 13 rows from dataframe 'df1'.
a = ['Berlin','Paris','Luxembourg']
b = ['Head Office','Corporate Office']
Expected output:
Sum: 14.3% 5.3% 12.7% 7.1% 14.1% = 53.5%
Average: (14.3% 5.3% 12.7% 7.1% 14.1%)/5 = 10.7%
CodePudding user response:
Try:
# convert ROP column to float:
df["ROP_int"] = df["ROP"].str.strip("%").astype(float)
a = ["Berlin", "Paris", "Luxembourg"]
b = ["Head Office", "Corporate Office"]
# create a mask
m = df["Location"].isin(a) & df["Office"].isin(b)
# compute sum and average from the mask and ROP_int column:
s = df.loc[m, "ROP_int"].sum()
avg = df.loc[m, "ROP_int"].mean()
print(s)
print(avg)
Prints:
53.49999999999999
10.7