Home > Mobile >  String matching from one data frame column to another data frame column
String matching from one data frame column to another data frame column

Time:02-14

i have two data frame , A and B . In A Data frame two column value and filed . and in B data frame have also value and filed column. i want to match the 'value' column of B to A of 'Filed' column, replace the Filed of A to value of B. A=

 Value                     Filed        
valid username              username
valid username           input_txtuserid
Password                 input_txtpassword
Password                 txtPassword
Login                     input_submit_log_in
LOG IN                    SIGNIN

B=

 Value                     Filed        
input_txtuserid               "JOHN"
input_txtpassword           "78945"
input_submit_log_in        "Sucessfully"
Password                 txtPassword
City                       "London"
PLACE                      "4-A avenue Street"

i want my dataframe C to look like this

C=

Value                     Filed        
valid username            "JOHN"
Password                   "78945"
Login                       "Sucessfully"

i written below code , but i getting KeyError: 'City',

_map = dict(zip(A.Filed.values, A.Value.values))

def get_correct_value(row, _map=_map):

 new_value = _map[row.Value]

 filed = row.Filed

return new_value, filed

C = B.apply(get_correct_value, axis=1, result_type='expand')

C.columns = ['Value','Filed']

i want to ignore the keyword which is not available in A dataframe. '

CodePudding user response:

I'm going to assume that in the DataFrames are strings as we typically don't use Dataframes to carry variables. With this I created a sample with your dataframe values.

data_a = {"Value": ["valid username", "valid username", "Password", "Password", "Login", "LOG IN"],
         "Filed": ["username", "input_txtuserid", "input_txtpassword", "txtPassword", "input_submit_log_in", "SIGNIN"]}

data_b = {"Value": ["input_txtuserid", "input_txtpassword", "input_submit_log_in", "Password", "City", "PLACE"],
          "Filed": ["JOHN", "78945", "Sucessfully", "txtPassword", "London", "4-A avenue Street"]}

A = pd.DataFrame(data_a)
B = pd.DataFrame(data_b)

A looks like: enter image description here

B looks like: enter image description here

Below the code to create C:

# Merging A and B, using a left join on the columns Filed for A and Value for B. Creatingg an indicator where exists
C = pd.merge(A, B, left_on=['Filed'], right_on=['Value'], how='left', indicator='Exist')

# If exists put true, otherwise false
C['Exist'] = np.where(C.Exist == 'both', True, False)
# Dropping all False so those that dont exist in both dataframes
C.drop(C[C['Exist'] == False].index, inplace=True)

# Making sure C has the right column and column names.
C = C[['Value_y', 'Filed_y']]
C.rename(columns = {"Value_y": "Value",
                    "Filed_y": "Filed"}, inplace = True)

Output of C enter image description here

Hope that helps! Please Mark this as answer if it does :)

  • Related