Concatenating DataFrames where DataFrame1 contains the missing values of DataFrame2 (Column Specific)

I need to concatenate two DataFrames that both have a column named 'sample ids'. The first DataFrame has all the relevant information, but its sample ids column is missing every sample id that appears in the second DataFrame. Is there a way to insert the 'missing' sample ids (IN SEQUENTIAL ORDER) into the first dataframe using the second dataframe?

I have tried the following:

pd.concat([DF1,DF2],axis=1)

This retained all information from both DataFrames, but the sample ids from the two DataFrames ended up in separate columns.
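For context, axis=1 aligns rows by index label and places the frames side by side, which is why the ids end up in two columns. The sketch below uses made-up stand-ins for DF1 and DF2 (the column names samp_id and value, and all values, are assumptions, since the real frames were only shown as images). It also shows that stacking along the default axis=0 keeps the ids in a single column.

```python
import pandas as pd

# Hypothetical stand-ins for DF1 and DF2; column names and values are assumed.
DF1 = pd.DataFrame({"samp_id": [1, 2, 4, 6], "value": [10.0, 20.0, 40.0, 60.0]})
DF2 = pd.DataFrame({"samp_id": [3, 5]})

# axis=1 aligns rows by index label and puts the frames side by side,
# so the result has two separate samp_id columns.
wide = pd.concat([DF1, DF2], axis=1)
print(wide.columns.tolist())  # ['samp_id', 'value', 'samp_id']

# axis=0 (the default) stacks rows instead, so the ids share one column.
stacked = pd.concat([DF1, DF2]).sort_values("samp_id", ignore_index=True)
print(stacked["samp_id"].tolist())  # [1, 2, 3, 4, 5, 6]
```

Rows that come only from DF2 get NaN in the columns DF2 does not have.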

pd.merge(DF1,DF2,how='outer')  # also tried how='inner', 'left' and 'right'

This did not produce the desired outcome in the least...

I have shown the templates of the two DataFrames below. Please help, my brain is exploding!!!

[Images: DataFrame 2, DataFrame 1]

CodePudding user response：

Try this:

df = df1.merge(df2, on="samp_id")
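One caveat worth checking: .merge() defaults to how='inner', which keeps only the samp_id values present in both frames. A minimal sketch with made-up data (the value column and all values are assumptions) shows what that means when df1 lacks every id found in df2:

```python
import pandas as pd

# Hypothetical data: df1 has none of the ids that appear in df2.
df1 = pd.DataFrame({"samp_id": [1, 2, 4], "value": [10.0, 20.0, 40.0]})
df2 = pd.DataFrame({"samp_id": [3, 5]})

# Default inner join: only ids found in BOTH frames survive.
inner = df1.merge(df2, on="samp_id")
print(len(inner))  # 0

# Outer join: the union of ids from both frames is kept.
outer = df1.merge(df2, on="samp_id", how="outer")
print(sorted(outer["samp_id"].tolist()))  # [1, 2, 3, 4, 5]
```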

CodePudding user response：

If you want to:

insert the 'missing' sample ids (IN SEQUENTIAL ORDER) into the first dataframe using the second dataframe

you can do an outer join using .merge() with how='outer', as follows:

df_out = df1.merge(df2, on="samp_id", how='outer')

To make sure the samp_id values are IN SEQUENTIAL ORDER, you can also sort on samp_id using .sort_values(), as follows:

df_out = df1.merge(df2, on="samp_id", how='outer').sort_values('samp_id', ignore_index=True)
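
As a rough end-to-end check, here is the same approach run on small made-up frames. The value column and all the data are assumptions for illustration, since the original frames were only posted as images.

```python
import pandas as pd

# Hypothetical data: df1 carries the information, df2 holds the missing ids.
df1 = pd.DataFrame({"samp_id": [1, 2, 4, 6], "value": [10.0, 20.0, 40.0, 60.0]})
df2 = pd.DataFrame({"samp_id": [3, 5]})

# Outer join keeps ids from both frames; sorting puts them in sequence and
# ignore_index=True renumbers the rows 0..n-1.
df_out = (
    df1.merge(df2, on="samp_id", how="outer")
       .sort_values("samp_id", ignore_index=True)
)
print(df_out)
```

The ids that came only from df2 (3 and 5 here) appear in sequence with NaN in the columns that only df1 provides.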