Issue merging DataFrame/Series in Python

Time:08-30

I have two DataFrames and I am trying to merge them, but I am getting an error. My code is below:

source input

import datetime
import pandas as pd

def source_input():
    global df1
    try:
        src_input = pd.DataFrame().assign(table_name=myconn.df_input['src_table_name'],
                                          column_name=myconn.df_input['src_column_name'],
                                          business_key_name=myconn.df_input['src_business_key_name'],
                                          select_column_names=myconn.df_input['src_select_column_names'],
                                          where_condition=myconn.df_input['src_where_condition'])
        src_input["schema_name"] = lsrcdbschemaname
        src_input["schema_table"] = src_input["schema_name"] + "." + src_input["table_name"]
        loging(datetime.datetime.now(), '-1', src_input)
        # Build one query string per input row
        sql = 'select ' + src_input["column_name"] + ' as src_pk_clmn from ' + src_input['schema_table']

        # Run each query; apply on a Series returns a Series here, not a DataFrame
        df1 = sql[sql.notna()].apply(lambda i: pd.read_sql_query(i, db_conn))
        return df1
    except Exception:
        # (the except clause was not shown in the original post)
        raise

source output

0      src_pk_clmn
0      sonali
1      monika
2   ...
dtype: object

target input:

def target_input():
    global df2
    tgt_input = pd.DataFrame().assign(table_name=myconn.df_input['tgt_table_name'],
                                      column_name=myconn.df_input['tgt_column_name'],
                                      business_key_name=myconn.df_input['tgt_business_key_name'],
                                      select_column_names=myconn.df_input['tgt_select_column_names'],
                                      where_condition=myconn.df_input['tgt_where_condition'])
    tgt_input["schema_name"] = ltgtdbschemaname
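    tgt_input["schema_table"] = tgt_input["schema_name"] + "." + tgt_input["table_name"]
    #print(tgt_input)
    try:
        pd.set_option('display.max_columns', None)
        # Build one query string per input row
        sql = 'select ' + tgt_input["column_name"] + ' from ' + tgt_input['schema_table']
        # Run each query; the result is again a Series, not a DataFrame
        df2 = sql[sql.notna()].apply(lambda i: pd.read_sql_query(i, db_conn))
        return df2
    except Exception:
        # (the except clause was not shown in the original post)
        raise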

target output

1      FIRSTNAME
0    sonali
1    monika
2       SI...

merge:

def merge():
    # df1 and df2 are the globals set by source_input() and target_input()
    df3 = pd.merge(df1, df2,
                   left_on=["src_pk_clmn"],
                   right_on=["FIRSTNAME"],
                   how='outer')
    return df3

When I merge these two outputs I get this error: ValueError: Cannot merge a Series without a name

I expect the output to look like the table below. Please suggest a better way to achieve this.

0      src_pk_clmn   FIRSTNAME
0      sonali        sonali
1      monika        monika
2      sid           null
3      rahul         Rahul

CodePudding user response:

Please check whether the two objects you pass to merge are really DataFrames, or whether they are actually Series. If they are Series, the variable names are a bit misleading. In the latter case the fix might be simple: just give each Series a name before merging.
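
To make the suggestion concrete, here is a minimal sketch of naming two Series before merging. The Series s1 and s2 are hypothetical stand-ins built from the sample values in the question, not the real query results, and the names left and right are only for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the two results, as unnamed Series
s1 = pd.Series(["sonali", "monika", "sid", "rahul"])
s2 = pd.Series(["sonali", "monika", "Rahul"])

# Passing s1 and s2 to pd.merge directly fails because neither has a name.
# Give each Series a name and turn it into a one-column DataFrame first.
left = s1.rename("src_pk_clmn").to_frame()
right = s2.rename("FIRSTNAME").to_frame()

# Outer merge keeps keys that appear on only one side
df3 = pd.merge(left, right,
               left_on="src_pk_clmn",
               right_on="FIRSTNAME",
               how="outer")
print(df3)
```

Note that the match is case-sensitive, so "rahul" and "Rahul" end up on separate rows. If, as the printed output in the question suggests, each element of the Series is itself a DataFrame, you would probably need to pull that DataFrame out first (for example with df1.iloc[0]); that is a guess from the output shown, not something I can confirm.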

CodePudding user response:

I think you just want to use concat, not merge.

df3 = pd.concat([df1,df2],axis='columns')
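
A small sketch of what that does, assuming df1 and df2 are plain one-column DataFrames. The data here is made up from the sample values in the question.

```python
import pandas as pd

# Hypothetical stand-ins for the two query results
df1 = pd.DataFrame({"src_pk_clmn": ["sonali", "monika", "sid", "rahul"]})
df2 = pd.DataFrame({"FIRSTNAME": ["sonali", "monika", "Rahul"]})

# Place the two frames side by side, aligned on the row index
df3 = pd.concat([df1, df2], axis="columns")
print(df3)
```

Keep in mind that concat lines rows up by index, not by value, so in this sample "Rahul" lands next to "sid" and the last row gets a missing value. If the rows need to be matched on the names themselves, merge is the tool for that.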