I have two separate data frames that I want to compare:
f1
P53-Malat1
Neat1-Malat1
Gap1-Malat1
and f2:
intA,intB
P53-Malat1,Neat1-Malat1
Gap1-Malat1,Malat1-Pias3
I want to iterate over the rows of each column in f2 and check whether each value appears in f1. If yes, then print that row as "found", and if not, as "not_found", in a separate column.
The same for column two in f2.
I tried this approach but it doesn't work - am I missing something?
with open("f1.txt","r") as f1:
content = f1.read().splitlines()
#print(content)
f2 = pd.read_csv("f2.csv")
f2["col1_search"] = f2.apply(lambda x: x["intA"] + "_found" if x in content else x["intA"] + "_not_found", axis=1)
f2["col2_search"] = f2.apply(lambda x: x["intB"] + "_found" if x in content else x["intB"] + "_not_found", axis=1)
so the desired output should be f2 in this format:
col1_search,col2_search
P53-Malat1_found,Neat1-Malat1_found
Gap1-Malat1_found,Malat1-Pias3_not_found
Thank you.
CodePudding user response:
If I understand correctly, content is a list, not a dataframe. If that's the case you can use .isin, which returns True or False for each row, and that result can be mapped to whatever suffix you want.
import pandas as pd
content = ['P53-Malat1','Neat1-Malat1','Gap1-Malat1']
f2 = pd.DataFrame({'intA': {0: 'P53-Malat1', 1: 'Gap1-Malat1'},
'intB': {0: 'Neat1-Malat1', 1: 'Malat1-Pias3'}})
f2['col1_search'] = f2.intA + f2.intA.isin(content).map({True: '_found', False: '_not_found'})
f2['col2_search'] = f2.intB + f2.intB.isin(content).map({True: '_found', False: '_not_found'})
Output
intA intB col1_search col2_search
0 P53-Malat1 Neat1-Malat1 P53-Malat1_found Neat1-Malat1_found
1 Gap1-Malat1 Malat1-Pias3 Gap1-Malat1_found Malat1-Pias3_not_found
Or perhaps if you have many columns:
(f2 + f2.isin(content).replace({True: '_found', False: '_not_found'})).add_suffix('_search')
Output
intA_search intB_search
0 P53-Malat1_found Neat1-Malat1_found
1 Gap1-Malat1_found Malat1-Pias3_not_found
which could be merged back to the original data with
pd.concat([f2, (f2 + f2.isin(content).replace({True: '_found', False: '_not_found'})).add_suffix('_search')], axis=1)
Output
intA intB intA_search intB_search
0 P53-Malat1 Neat1-Malat1 P53-Malat1_found Neat1-Malat1_found
1 Gap1-Malat1 Malat1-Pias3 Gap1-Malat1_found Malat1-Pias3_not_found
CodePudding user response:
This is an example of how you would use np.where:
import numpy as np
import pandas as pd

data = {'Category' : ['First', 'Second', 'Third'],
'First_Numbers' : [10, 10, 10],
'Second_Numbers' : [20, 20, 20],
'Third_Numbers' : [9, 21, 15]
}
df = pd.DataFrame(data)
comp_column = np.where((df['Third_Numbers'] < df['Second_Numbers']) & (df['Third_Numbers'] > df['First_Numbers']), 'found', 'not found')
df['check'] = comp_column
df
I inserted some sample data which you should be able to replace with your own data. Now I see you want to compare between 2 different dfs, so I would recommend merging them so that you are only working on a single df. This is the best documentation for merging/joining/concatenating pandas dfs: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
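As a sketch of that merge-based route, using stand-in frames mirroring the question's data (the frame names and column names here are assumptions): merge with indicator=True adds a _merge column saying whether each row of f2 matched a row of f1, and np.where turns that marker into the found/not_found label.

```python
import numpy as np
import pandas as pd

# f1 as a one-column frame of known pairs; f2 as in the question
f1 = pd.DataFrame({'pair': ['P53-Malat1', 'Neat1-Malat1', 'Gap1-Malat1']})
f2 = pd.DataFrame({'intA': ['P53-Malat1', 'Gap1-Malat1'],
                   'intB': ['Neat1-Malat1', 'Malat1-Pias3']})

# Left-merge f2's intB column against f1; indicator=True adds a _merge
# column whose value is 'both' when the row matched, 'left_only' when not.
merged = f2.merge(f1, left_on='intB', right_on='pair',
                  how='left', indicator=True)
f2['col2_search'] = merged['intB'] + np.where(
    merged['_merge'] == 'both', '_found', '_not_found')
print(f2['col2_search'].tolist())
```

The same merge can be repeated with left_on='intA' for the first column.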
CodePudding user response:
f2 = pd.read_csv("f2.csv")

def transform(path: str, x):
    with open(path, "r") as f1:
        content = f1.read().splitlines()
    if x in content:
        return f"{x}_found"
    return f"{x}_not_found"

f2["col1_search"] = f2["intA"].apply(lambda x: transform("f1.txt", x))
f2["col2_search"] = f2["intB"].apply(lambda x: transform("f1.txt", x))
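Note that an approach like the one above reopens and re-reads the lookup file on every row. A variant that loads the contents once is usually preferable; here the file contents and f2 are inlined as stand-ins for f1.txt and f2.csv from the question.

```python
import pandas as pd

# Inline stand-ins for the contents of f1.txt and f2.csv
content = {'P53-Malat1', 'Neat1-Malat1', 'Gap1-Malat1'}  # set: O(1) lookups
f2 = pd.DataFrame({'intA': ['P53-Malat1', 'Gap1-Malat1'],
                   'intB': ['Neat1-Malat1', 'Malat1-Pias3']})

def label(x: str) -> str:
    # Tag each value with _found / _not_found based on set membership
    return f"{x}_found" if x in content else f"{x}_not_found"

f2['col1_search'] = f2['intA'].apply(label)
f2['col2_search'] = f2['intB'].apply(label)
print(f2[['col1_search', 'col2_search']].to_string(index=False))
```

Using a set instead of a list also makes each membership test constant-time, which matters once f1 grows large.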