I am not able to loop through the whole row in excel using python-CodePudding

I am have two excels, I am trying to take second column values from exel1 and placing the values in second column by joining them with Underscore in excel2

Excel1:

Word	Variable
identifier	id
user	us
phone	ph
number	num
phone number	pn

Excel2:

Word	Variable Should be
user identifier	us_id
user phone number	us_pn
identifier number	id_num

I am not getting whole row while looping.

     import pandas as pd
     import os

     file1= 'C:/Users/madhu/Desktop/Excel1.xlsx'
     file2= 'C:/Users/madhu/Desktop/Book1.xlsx'

     df1 = pd.read_excel(file1)
     df2 = pd.read_excel(file2)

     #df1.to_dict()
     #df2.to_dict()
     var=[]
     print(df1)
     print(df2)
     for row,col in range(len(df1)):
     for row1,col in range(len(df2)):
     if row.isspace() == True:
         var.append(df1[row])
         return '_'.join(var)
     elif row == row1:
         var.append(df1[row])
         return '_'.join(var)
     else:
         pass

Can anyone please help me? Thanks.

CodePudding user response：

I am assuming that for "user phone number" you need "us_pn" as a variable. I am also assuming that the code does not need to return any values.

import pandas as pd
import os

file1= 'C:/Users/madhu/Desktop/Excel1.xlsx'
file2= 'C:/Users/madhu/Desktop/Book1.xlsx'

df1 = pd.read_excel(file1)
df2 = pd.read_excel(file2)

# Uncomment the following piece of code if the excel files have nan.
# df2.fillna('0', inplace=True)

print(df2)

for row2 in df2.values:
    word_list = list(row2[0].split(' '))
    
    # This is to handle the special case of 'user phone number'
    # with output of 'us_pn'. 
    # If the desired output is otherwise
    # 'us_ph_num', then this piece of code is not needed.
    if 'phone number' in row2[0]:
        word_list[word_list.index('phone')] = 'phone number'
        word_list[word_list.index('number')] = ''
    
    var_list = []
    for word in word_list:
        for row1 in df1.values:
            if word == row1[0]:
                var_list.append(row1[1])
    row2[1] = "_".join(var_list)

If there is anything wrong with my assumptions, then do let me know and I will fix the code accordingly.

CodePudding user response：

IIUC! Create a dict out of df1 and map for split items(S1 and S2) from df2. Refer below code

df1 = pd.read_excel(file1)
df2 = pd.read_excel(file2)

Map = dict(zip(df1.Word, df1.Variable))

pat='(' '|'.join(Map.keys()) ')'
df2['S1']= df2['Word'].str.extract(pat=pat,expand=False).fillna('')
df2['S2'] = df2.apply(lambda x: x.Word.replace(x['S1'],''), axis =1)

df2['S2'] = df2['S2'].apply(lambda x: x.strip())
cols = ['S1', 'S2']

for col in cols:
    df2[col] = df2[col].replace(Map)

df2['Variable Should be'] = df2['S1']  '_'  df2['S2']
df2.drop(columns = ['S1', 'S2'], inplace = True)