Home > Enterprise >  Match Substring in a column Based on Some Multiple Filter Values in Another column and Aggregate the
Match Substring in a column Based on Some Multiple Filter Values in Another column and Aggregate the

Time:06-19

I am struggling to find the sum of count column based on subtring present in a column Name. The substring should co exist with the other multiple values present in another column that is Error Name. If substring (e.g. Ehsan) matches and another column i.e. Error Name has those multiple values (Device and Line Error) then i would some the count in Count Column. Remember I have to sum only those count which has substring Ehsan in Name and Device and Line Error in Error Name Below is my Raw Data:

enter image description here

And my output should look like this:

enter image description here

Please needful help is required, thanks

CodePudding user response:

After Editing The Question: You need to select rows that contain Ehsan and then use pandas.groupby on the result dataframe like below:

mask_name = df['Name'].str.contains(r'.*(?:Ehsan).*')
mask_err = df['Error Name'].str.contains(r'(?:\bLine Error\b|\bDevice\b)')
df = df[mask_name & mask_err]
df.groupby(['Name', 'Error Name'])['Count'].sum().reset_index()

Before Editing The Question: You can write a mask for columns Name and Error Name with regex then select rows that have True in two masks and sum Count for those rows with pandas.loc and pandas.sum like below:

mask_name = df['Name'].str.contains(r'.*(?:Ehsan).*')
mask_err = df['Error Name'].str.contains(r'(?:\bLine Error\b|\bDevice\b)')
df.loc[mask_name & mask_err, 'Count'].sum()

CodePudding user response:

We can use str.contains and sum here:

total = df[df["Name"].str.contains(r'\bEhsan\b', flags=re.I, na=False, regex=True)]["Count"].sum()
  • Related