get pandas rows with same specific word in two columns

Time:07-11

I have a pandas DataFrame that looks like this:

     data1                 data2
0   overall_phase1_b3     overall_phase1_b5
1   overall_phase2_b3     overall_phase5_b5
2   overall_phase3_b3     overall_phase3_b5

How can I get only the rows whose phase numbers match? That is, if data1 contains phase1, data2 should also contain phase1.

Desired output:

       data1                 data2
0   overall_phase1_b3     overall_phase1_b5
1   overall_phase3_b3     overall_phase3_b5

Answer:

You do not need regex to achieve this. You can use something like this instead:

df[df.data1.str.split("_", expand=True)[1] == df.data2.str.split("_", expand=True)[1]]


------------------------------------------
    data1               data2
0   overall_phase1_b3   overall_phase1_b5
2   overall_phase3_b3   overall_phase3_b5
------------------------------------------

This splits the data1 and data2 columns on '_' (expand=True returns one column per field) and compares the second field, the 'phaseN' token, of the two resulting DataFrames. The comparison yields a boolean mask that is used to filter the rows.
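For reference, here is a self-contained sketch of the same split-and-compare approach using the question's sample data. The intermediate names phase_a, phase_b and mask are only for illustration, and the final reset_index(drop=True) is an addition that renumbers the rows 0, 1 as in the desired output; leave it out if you want to keep the original index (0, 2).

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "data1": ["overall_phase1_b3", "overall_phase2_b3", "overall_phase3_b3"],
    "data2": ["overall_phase1_b5", "overall_phase5_b5", "overall_phase3_b5"],
})

# Split each value on '_' and keep the second field ('phase1', 'phase2', ...)
phase_a = df["data1"].str.split("_", expand=True)[1]
phase_b = df["data2"].str.split("_", expand=True)[1]

# Boolean mask: True where both columns carry the same phase token
mask = phase_a == phase_b

# Keep only the matching rows and renumber the index from 0
result = df[mask].reset_index(drop=True)
print(result)
```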

Answer:

Since we are dealing with pandas, here is a simple approach.

import pandas as pd

# Build the DataFrame from the question's data
df = pd.DataFrame({
    "data1": ['overall_phase1_b3', 'overall_phase2_b3', 'overall_phase3_b3'],
    "data2": ['overall_phase1_b5', 'overall_phase5_b5', 'overall_phase3_b5'],
})
df

The code above builds the pandas DataFrame from the question's data.

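# Collect the matching values, then build a new DataFrame from them
result_d1 = []
result_d2 = []
for _, row in df.iterrows():
    # Compare the whole 'phaseN' token rather than only its last character,
    # so that e.g. phase10 and phase0 are not treated as equal
    if row.data1.split('_')[1] == row.data2.split('_')[1]:
        result_d1.append(row.data1)
        result_d2.append(row.data2)
result = pd.DataFrame({"data1": result_d1, "data2": result_d2})
result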

Looking at your data, the phase number can be extracted with Python's str.split method: splitting each value on '_' gives the 'phaseN' token, and comparing those tokens row by row tells you which rows have matching phases. If you don't need the result as a DataFrame, you can simply print the matching rows instead of collecting them into one.
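A loop over iterrows is not strictly necessary here. As a sketch, the same per-row str.split comparison can be written as a list comprehension that produces a boolean list, which pandas accepts directly for row selection. The helper name phase is hypothetical, and reset_index(drop=True) is only there to renumber the rows.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "data1": ["overall_phase1_b3", "overall_phase2_b3", "overall_phase3_b3"],
    "data2": ["overall_phase1_b5", "overall_phase5_b5", "overall_phase3_b5"],
})

def phase(value):
    """Hypothetical helper: return the second '_'-separated field, e.g. 'phase1'."""
    return value.split("_")[1]

# One boolean per row: True when both columns have the same phase token
mask = [phase(a) == phase(b) for a, b in zip(df["data1"], df["data2"])]

# A plain list of booleans can be used to select rows
result = df[mask].reset_index(drop=True)
print(result)
```

This avoids building one Series per row as iterrows does, while still using only str.split.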

Nice question, happy coding!
