I am trying to put "inhibitory" or "excitatory" in both columns "postsyn_type" and "presyn_type" of df1
, every time that a value in "post_pt_root_id" and "pre_pt_root_id" columns (respectively), matches the value in df2
in the column "pt_root_id".
Examples of dataframes I have:
df1 = pd.DataFrame({'pre_pt_root_id': [1,1,1,2,2], 'post_pt_root_id': [5,1,5,6,7]})
pre_pt_root_id post_pt_root_id
0 1 5
1 1 1
2 1 5
3 2 6
4 2 7
df2 = pd.DataFrame({'pt_root_id': [1,2,3,4,5,6,7], 'type': ['inhib','excit','inhib','inhib','excit','excit','inhib']})
pt_root_id type
0 1 inhib
1 2 excit
2 3 inhib
3 4 inhib
4 5 excit
5 6 excit
6 7 inhib
Example of result:
df1 = pd.DataFrame({'pre_pt_root_id': [1,1,1,2,2], 'post_pt_root_id': [5,1,5,6,7], 'presyn_type': ['inhib','inhib','inhib','excit','excit'], 'postsyn_type': ['excit','inhib','excit','excit','inhib']})
pre_pt_root_id post_pt_root_id presyn_type postsyn_type
0 1 5 inhib excit
1 1 1 inhib inhib
2 1 5 inhib excit
3 2 6 excit excit
4 2 7 excit inhib
I have already tried with merge but it doesn't seem to work well with what I want to do. As you might have noticed, in df1
a value in the column "pre_pt_root_id" can be repeated several times, thus the value i have to put in "presyn_type" must be the same for every repetition. Can anyone help?
CodePudding user response:
Create a mapping series d
, then use Series.map
to substitute the values in pre_pt_root_id
and post_pt_root_id
columns
d = df2.set_index('pt_root_id')['type']
df1['presyn_type'] = df1['pre_pt_root_id'].map(d)
df1['postsyn_type'] = df1['post_pt_root_id'].map(d)
pre_pt_root_id post_pt_root_id presyn_type postsyn_type
0 1 5 inhib excit
1 1 1 inhib inhib
2 1 5 inhib excit
3 2 6 excit excit
4 2 7 excit inhib
CodePudding user response:
df2['pre_pt_root_id'] = df2['pt_root_id']
df2['post_pt_root_id'] = df2['pt_root_id']
df3 = df1
df3 = pd.merge(df3, df2, on='pre_pt_root_id', ).drop(columns=['pt_root_id','post_pt_root_id_y']).rename(columns={'cell_type':'presyn_type','post_pt_root_id_x':'post_pt_root_id'})
df3 = pd.merge(df3, df2, on='post_pt_root_id', ).drop(columns=['pt_root_id','pre_pt_root_id_y']).rename(columns={'cell_type':'postsyn_type'})
CodePudding user response:
import pandas as pd
import random
###############################
# creating a reproducible MWE #
###############################
random.seed(22)
cell_type = ['inhibitory', 'excitatory']
pt_root_id = [random.getrandbits(32) for _ in range(1, 10)]
cell_types = [random.choice(cell_type) for _ in range(1, 10)]
# create a sample df2.
df2 = pd.DataFrame()
df2['pt_root_id'] = pt_root_id
df2['cell_types'] = cell_types
# assuming that pt_root_id column is unique. set it as index
df2.set_index('pt_root_id', inplace=True)
# create a sample df1
df1 = pd.DataFrame()
df1['pre_pt_root_id'] = [random.choice(pt_root_id) for _ in range(1, 10)]
df1['post_pt_root_id'] = [random.choice(pt_root_id) for _ in range(1, 10)]
#############################
# solution to your problem. #
#############################
df1['presyn_type'] = df2.loc[df1['pre_pt_root_id']]['cell_types'].tolist()
df1['postsyn_type'] = df2.loc[df1['post_pt_root_id']]['cell_types'].tolist()
print(df2)
print(df1)
This is perhaps not the best way to do this. But I think it does what you want.
Good luck!!