I have a pandas DataFrame that looks like this:
               data1              data2
0  overall_phase1_b3  overall_phase1_b5
1  overall_phase2_b3  overall_phase5_b5
2  overall_phase3_b3  overall_phase3_b5
My question is: how can I get the DataFrame rows with a matching phase number? If I have phase1 in the data1 column, I should have phase1 in the data2 column.
Desired output:
               data1              data2
0  overall_phase1_b3  overall_phase1_b5
1  overall_phase3_b3  overall_phase3_b5
CodePudding user response:
You do not need regex to achieve this. You can use something like this instead:
df[df.data1.str.split("_", expand=True)[1] == df.data2.str.split("_", expand=True)[1]]
------------------------------------------
               data1              data2
0  overall_phase1_b3  overall_phase1_b5
2  overall_phase3_b3  overall_phase3_b5
------------------------------------------
This splits the data1 and data2 columns on '_' and compares the second piece (the 'phaseN' token, which is column 1 of each expanded frame) between the two columns. The comparison yields a boolean mask that you can use to filter the rows of df. Note that the filtered rows keep their original index labels (0 and 2 here).
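A self-contained sketch of the same idea, using the data from the question. The names phase1, phase2, mask and result are only illustrative, and the reset_index call is an optional extra step for when you want the 0, 1 index shown in the desired output:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "data1": ["overall_phase1_b3", "overall_phase2_b3", "overall_phase3_b3"],
    "data2": ["overall_phase1_b5", "overall_phase5_b5", "overall_phase3_b5"],
})

# expand=True returns one column per "_"-separated piece; column 1 is the phase token.
phase1 = df["data1"].str.split("_", expand=True)[1]
phase2 = df["data2"].str.split("_", expand=True)[1]

# Boolean mask: True where both columns carry the same phase token.
mask = phase1 == phase2

# Filter, then renumber the index from 0 (optional).
result = df[mask].reset_index(drop=True)
print(result)
```

Without reset_index(drop=True), the rows keep their original labels 0 and 2.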
CodePudding user response:
Since we are dealing with pandas, here is a simple answer.
import pandas as pd
# Sample data taken from the question
data1 = ['overall_phase1_b3','overall_phase2_b3','overall_phase3_b3']
data2 = ['overall_phase1_b5','overall_phase5_b5','overall_phase3_b5']
df = pd.DataFrame({"data1": data1, "data2": data2})
df
The above code builds the pandas DataFrame for the given data.
result_d1 = []
result_d2 = []
for _, row in df.iterrows():
    # Compare the whole 'phaseN' token so that e.g. phase11 does not match phase1
    if row.data1.split('_')[1] == row.data2.split('_')[1]:
        result_d1.append(row.data1)
        result_d2.append(row.data2)
result = pd.DataFrame({"data1": result_d1, "data2": result_d2})
result
Looking at your data, we can use the string split method to compare the phase in data1 and data2 for each row, which tells you which rows have matching phases. If you don't want to store the result in a DataFrame, print the matching rows inside the loop instead of appending them to lists.
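One thing to watch when comparing phases: looking only at the last character of the token treats phase11 and phase1 as equal. A minimal sketch of the difference, using made-up rows (the phase11 row is not in the question's data, and the variable names are illustrative):

```python
import pandas as pd

# Hypothetical rows, including a multi-digit phase, to show the difference.
df = pd.DataFrame({
    "data1": ["overall_phase1_b3", "overall_phase11_b3", "overall_phase3_b3"],
    "data2": ["overall_phase1_b5", "overall_phase1_b5", "overall_phase3_b5"],
})

# Second "_"-separated token of each column, e.g. "phase11".
phase1 = df["data1"].str.split("_").str[1]
phase2 = df["data2"].str.split("_").str[1]

# Last-character comparison: "phase11" and "phase1" both end in "1".
last_char_mask = phase1.str[-1] == phase2.str[-1]

# Whole-token comparison: the tokens must be identical.
whole_token_mask = phase1 == phase2

matched = df[whole_token_mask].reset_index(drop=True)
print(matched)
```

Comparing the whole token keeps only the phase1 and phase3 rows here, while the last-character check would also keep the phase11/phase1 row.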
Nice question, happy coding!