Home > Mobile >  Python Pandas: Create new column by matching one column value to a different row [i] and column if a
Python Pandas: Create new column by matching one column value to a different row [i] and column if a

Time:05-12

Sorry for the very wordy title. I have a df which looks like this:

df:
         username     user_id    subreddit_id   subr_fav_by
0       'John69'      1          1              '5illycat'           
1       'John69'      1          2              'adsgd'           
2       'Harry12'     2          3              '5illycat'           
3       'adsgd'       3          4              'John69'           
4       '5illycat'    4          5              'John69'          

I want the df to look like this:

df:
         username     user_id    subreddit_id   subr_fav_by   subr_fav_by_id 
0       'John69'      1          1              '5illycat'       4     
1       'John69'      1          2              'adsgd'          3 
2       'Harry12'     2          3              '5illycat'       4    
3       'adsgd'       3          4              'John69'         1  
4       '5illycat'    4          5              'John69'         1 

So I want to create a fifth column which is the 'user_id' from the row that contains the 'author' which matches the string in the 'subr_faved_by' column.

I originally tried:

for i in range(len(df['user_id'])):
    for j in range(len(df['subr_fav_by'])):
        df.loc[j,'subr_fav_by'] = df.loc[i,'user_id'].where(df.loc[j,'subr_fav_by'].str.contains(df.loc[i,'username']))

but got the error:

TypeError: unhashable type: 'Series'

which I tried looking up, but couldn't find what would help me in this specific situation.

So I then tried:

df['subr_fav_by_id'] = df['subr_fav_by'].map(df.drop_duplicates('username').set_index('username')['user_id'])

But I am not sure why this only returns NaN in the 'subr_fav_by_id' column.

CodePudding user response:

You could use merge and drop_duplicates() with iloc[:,-1] to just add on the last column.

df['subr_fav_by_id'] = df.merge(df[['username', 'user_id']].drop_duplicates(), 
                                how='left', 
                                left_on='subr_fav_by', 
                                right_on='username').iloc[:,-1]
df
Out[1]: 
     username  user_id  subreddit_id subr_fav_by  subr_fav_by_id
0    'John69'        1             1  '5illycat'               4
1    'John69'        1             2     'adsgd'               3
2   'Harry12'        2             3  '5illycat'               4
3     'adsgd'        3             4    'John69'               1
4  '5illycat'        4             5    'John69'               1

CodePudding user response:

You could try pd.DataFrame.merge:

df.merge(df[['username', 'user_id']].rename(
    columns={'username':'subr_fav_by', 'user_id':'subr_fav_by_id'}), how='left')
  • Related