How to put a specific values in a column in a dataframe when two elements matches-CodePudding

I am trying to put "inhibitory" or "excitatory" in both columns "postsyn_type" and "presyn_type" of df1, every time that a value in "post_pt_root_id" and "pre_pt_root_id" columns (respectively), matches the value in df2 in the column "pt_root_id".

Examples of dataframes I have:

df1 = pd.DataFrame({'pre_pt_root_id': [1,1,1,2,2], 'post_pt_root_id': [5,1,5,6,7]})

   pre_pt_root_id  post_pt_root_id
0               1                5
1               1                1
2               1                5
3               2                6
4               2                7

df2 = pd.DataFrame({'pt_root_id': [1,2,3,4,5,6,7], 'type': ['inhib','excit','inhib','inhib','excit','excit','inhib']})

   pt_root_id   type
0           1  inhib
1           2  excit
2           3  inhib
3           4  inhib
4           5  excit
5           6  excit
6           7  inhib

Example of result:

df1 = pd.DataFrame({'pre_pt_root_id': [1,1,1,2,2], 'post_pt_root_id': [5,1,5,6,7], 'presyn_type': ['inhib','inhib','inhib','excit','excit'], 'postsyn_type': ['excit','inhib','excit','excit','inhib']})

   pre_pt_root_id  post_pt_root_id presyn_type postsyn_type
0               1                5       inhib        excit
1               1                1       inhib        inhib
2               1                5       inhib        excit
3               2                6       excit        excit
4               2                7       excit        inhib

I have already tried with merge but it doesn't seem to work well with what I want to do. As you might have noticed, in df1 a value in the column "pre_pt_root_id" can be repeated several times, thus the value i have to put in "presyn_type" must be the same for every repetition. Can anyone help?

CodePudding user response：

Create a mapping series d, then use Series.map to substitute the values in pre_pt_root_id and post_pt_root_id columns

d = df2.set_index('pt_root_id')['type']
df1['presyn_type'] = df1['pre_pt_root_id'].map(d)
df1['postsyn_type'] = df1['post_pt_root_id'].map(d)

   pre_pt_root_id  post_pt_root_id presyn_type postsyn_type
0               1                5       inhib        excit
1               1                1       inhib        inhib
2               1                5       inhib        excit
3               2                6       excit        excit
4               2                7       excit        inhib

CodePudding user response：

df2['pre_pt_root_id'] = df2['pt_root_id']
df2['post_pt_root_id'] = df2['pt_root_id']

df3 = df1
df3 = pd.merge(df3, df2, on='pre_pt_root_id', ).drop(columns=['pt_root_id','post_pt_root_id_y']).rename(columns={'cell_type':'presyn_type','post_pt_root_id_x':'post_pt_root_id'})
df3 = pd.merge(df3, df2, on='post_pt_root_id', ).drop(columns=['pt_root_id','pre_pt_root_id_y']).rename(columns={'cell_type':'postsyn_type'})

CodePudding user response：

import pandas as pd
import random

###############################
# creating a reproducible MWE #
###############################
random.seed(22)

cell_type = ['inhibitory', 'excitatory']

pt_root_id = [random.getrandbits(32) for _ in range(1, 10)]
cell_types = [random.choice(cell_type) for _ in range(1, 10)]

# create a sample df2.
df2 = pd.DataFrame()
df2['pt_root_id'] = pt_root_id
df2['cell_types'] = cell_types
# assuming that pt_root_id column is unique. set it as index
df2.set_index('pt_root_id', inplace=True)

# create a sample df1
df1 = pd.DataFrame()
df1['pre_pt_root_id'] = [random.choice(pt_root_id) for _ in range(1, 10)]
df1['post_pt_root_id'] = [random.choice(pt_root_id) for _ in range(1, 10)]

#############################
# solution to your problem. #
#############################
df1['presyn_type'] = df2.loc[df1['pre_pt_root_id']]['cell_types'].tolist()
df1['postsyn_type'] = df2.loc[df1['post_pt_root_id']]['cell_types'].tolist()

print(df2)
print(df1)

This is perhaps not the best way to do this. But I think it does what you want.

Good luck!!