I am have two excels, I am trying to take second column values from exel1 and placing the values in second column by joining them with Underscore in excel2
Excel1:
Word | Variable |
---|---|
identifier | id |
user | us |
phone | ph |
number | num |
phone number | pn |
Excel2:
Word | Variable Should be |
---|---|
user identifier | us_id |
user phone number | us_pn |
identifier number | id_num |
I am not getting whole row while looping.
import pandas as pd
import os
file1= 'C:/Users/madhu/Desktop/Excel1.xlsx'
file2= 'C:/Users/madhu/Desktop/Book1.xlsx'
df1 = pd.read_excel(file1)
df2 = pd.read_excel(file2)
#df1.to_dict()
#df2.to_dict()
var=[]
print(df1)
print(df2)
for row,col in range(len(df1)):
for row1,col in range(len(df2)):
if row.isspace() == True:
var.append(df1[row])
return '_'.join(var)
elif row == row1:
var.append(df1[row])
return '_'.join(var)
else:
pass
Can anyone please help me? Thanks.
CodePudding user response:
I am assuming that for "user phone number" you need "us_pn" as a variable. I am also assuming that the code does not need to return any values.
import pandas as pd
import os
file1= 'C:/Users/madhu/Desktop/Excel1.xlsx'
file2= 'C:/Users/madhu/Desktop/Book1.xlsx'
df1 = pd.read_excel(file1)
df2 = pd.read_excel(file2)
# Uncomment the following piece of code if the excel files have nan.
# df2.fillna('0', inplace=True)
print(df2)
for row2 in df2.values:
word_list = list(row2[0].split(' '))
# This is to handle the special case of 'user phone number'
# with output of 'us_pn'.
# If the desired output is otherwise
# 'us_ph_num', then this piece of code is not needed.
if 'phone number' in row2[0]:
word_list[word_list.index('phone')] = 'phone number'
word_list[word_list.index('number')] = ''
var_list = []
for word in word_list:
for row1 in df1.values:
if word == row1[0]:
var_list.append(row1[1])
row2[1] = "_".join(var_list)
If there is anything wrong with my assumptions, then do let me know and I will fix the code accordingly.
CodePudding user response:
IIUC! Create a dict out of df1 and map for split items(S1 and S2) from df2. Refer below code
df1 = pd.read_excel(file1)
df2 = pd.read_excel(file2)
Map = dict(zip(df1.Word, df1.Variable))
pat='(' '|'.join(Map.keys()) ')'
df2['S1']= df2['Word'].str.extract(pat=pat,expand=False).fillna('')
df2['S2'] = df2.apply(lambda x: x.Word.replace(x['S1'],''), axis =1)
df2['S2'] = df2['S2'].apply(lambda x: x.strip())
cols = ['S1', 'S2']
for col in cols:
df2[col] = df2[col].replace(Map)
df2['Variable Should be'] = df2['S1'] '_' df2['S2']
df2.drop(columns = ['S1', 'S2'], inplace = True)