I have a pandas dataframe like so:
id1 id2 name
11 23 abc xyz 2
22 24 def pqr 5
33 22 qaz wsx 55
44 55 qwe rty 43
55 29 asd zxc 7
Now I want to create a new column which will contain name[0]
of ith row, where id1[i] == id2[j]
. So in the end I will have:
id1 id2 name identifier
11 23 abc xyz 2
22 24 def pqr 5 def
33 22 qaz wsx 55 def
44 55 qwe rty 43 asd
55 29 asd zxc 7 asd
So far, I tried this to get the required id:
df['identifier'] = ''
for index, row in df.iterrows():
j = df.index[df['id1'] == row['id2']]
Not sure how to proceed further.
CodePudding user response:
IIUC, you could do:
# extract first word and set id1 as name
s = df.set_index('id1')['name'].str.extract('^(\S )', expand=False)
# compute the identifiers
s2 = df['id2'].map(s).fillna(df['id1'].map(s))
# mask the unique identifiers
mask = s2.groupby(s2).transform('size').gt(1)
df2 = df.assign(identifier=s2[mask])
output:
id1 id2 name identifier
0 11 23 abc xyz 2 NaN
1 22 24 def pqr 5 def
2 33 22 qaz wsx 55 def
3 44 55 qwe rty 43 asd
4 55 29 asd zxc 7 asd
CodePudding user response:
I was able to solve it like so:
df['identifier'] = ''
for index, row in df.iterrows():
try:
j = df.index[df['id1'] == row['id2']].to_list()[0]
df.at[j, 'identifier'] = df.iloc[j]['FamilyName'].split(" ")[0]
except:
pass
CodePudding user response:
I create a new column by using loc()
and use this conditional statement df['id1'] == df['id2']
on "name"
column, and create a new called 'identifier
' and invoke pandas.Series.str.split
method to separate strings (by each whitespace):
df['identifier']=df.loc[(df['id1']==df['id2']),'name'].str.split()
Now, is probably that will appear some NaN
values on this new column, then I replace to ' '
, this allows to indexing first value on the string list
df['identifier'].fillna(' ', inplace=True)
df['identifier']=df['identifier'].apply(lambda x: x[0])