Sorry for the very wordy title. I have a df which looks like this:
df:
username user_id subreddit_id subr_fav_by
0 'John69' 1 1 '5illycat'
1 'John69' 1 2 'adsgd'
2 'Harry12' 2 3 '5illycat'
3 'adsgd' 3 4 'John69'
4 '5illycat' 4 5 'John69'
I want the df to look like this:
df:
username user_id subreddit_id subr_fav_by subr_fav_by_id
0 'John69' 1 1 '5illycat' 4
1 'John69' 1 2 'adsgd' 3
2 'Harry12' 2 3 '5illycat' 4
3 'adsgd' 3 4 'John69' 1
4 '5illycat' 4 5 'John69' 1
So I want to create a fifth column which is the 'user_id' from the row that contains the 'author' which matches the string in the 'subr_faved_by' column.
I originally tried:
for i in range(len(df['user_id'])):
for j in range(len(df['subr_fav_by'])):
df.loc[j,'subr_fav_by'] = df.loc[i,'user_id'].where(df.loc[j,'subr_fav_by'].str.contains(df.loc[i,'username']))
but got the error:
TypeError: unhashable type: 'Series'
which I tried looking up, but couldn't find what would help me in this specific situation.
So I then tried:
df['subr_fav_by_id'] = df['subr_fav_by'].map(df.drop_duplicates('username').set_index('username')['user_id'])
But I am not sure why this only returns NaN in the 'subr_fav_by_id' column.
CodePudding user response:
You could use merge
and drop_duplicates()
with iloc[:,-1]
to just add on the last column.
df['subr_fav_by_id'] = df.merge(df[['username', 'user_id']].drop_duplicates(),
how='left',
left_on='subr_fav_by',
right_on='username').iloc[:,-1]
df
Out[1]:
username user_id subreddit_id subr_fav_by subr_fav_by_id
0 'John69' 1 1 '5illycat' 4
1 'John69' 1 2 'adsgd' 3
2 'Harry12' 2 3 '5illycat' 4
3 'adsgd' 3 4 'John69' 1
4 '5illycat' 4 5 'John69' 1
CodePudding user response:
You could try pd.DataFrame.merge
:
df.merge(df[['username', 'user_id']].rename(
columns={'username':'subr_fav_by', 'user_id':'subr_fav_by_id'}), how='left')