Home > Mobile >  How to merge two dataframes that have increasing number based column headings?
How to merge two dataframes that have increasing number based column headings?

Time:04-04

I have two data frames at the moment that look a bit like this but a lot larger. Dataframe 1:

Q1 Q2 Q3 Q4 Q5 Q6
a b c d e f

Dataframe 2:

Q1 Q2 Q3 Q4
a b d f

So,dataframe 2 is the same as dataframe 1 but it is missing Q3 and Q5 from dataframe 1. Q3 in dataframe 2 is the equivalent of Q4 in dataframe 1 and Q4 in dataframe 2 is the same as Q6 in dataframe 1. I want to merge these two tables together to look something like this:

Q1 Q2 Q3 Q4 Q5 Q6
a b c d e f
a b NULL d NULL f

My tables in reality are a lot larger and have a lot more columns and a couple more missing questions in the second table than shown in this example. So I was just wondering if anyone has a way of doing this so that I do not have to manually rename and fill all the columns. Thank you.

CodePudding user response:

I think the operation you are describing is pd.concat as @sammywemmy said.

import pandas as pd

df1 = pd.DataFrame({'Q1': ['a'], 'Q2': ['b'], 'Q3': ['c'], 'Q4': ['d'], 'Q5': ['e'], 'Q6': ['f']})
df2 = pd.DataFrame({'Q1': ['a'], 'Q2': ['b'], 'Q3': ['d'], 'Q4': ['f']})

print(pd.concat([df1, df2]))

Output:

  Q1 Q2 Q3 Q4   Q5   Q6
0  a  b  c  d    e    f
0  a  b  d  f  NaN  NaN

CodePudding user response:

Use:

nind1 = df2.columns.values.copy()
nind2 = np.unique(np.append(nind1, qindf1notdf2))
deff = len(nind2)-len(nind1)
for i, x in enumerate(nind2):
    if x in qindf1notdf2:
        nind2[nind2>=x] = [re.sub('\d', lambda s: str(int(s.group(0))   1), y) for y in nind2[nind2>=x]]

ncols = nind2[:-deff]
df2.columns = ncols
pd.concat([df1, df2])

Output:

enter image description here

CodePudding user response:

You can write a helper function to renaming the second dataframe automatically. All you need to know is the list of missing questions. I used here df1 as a source of names but you could also use a generator of successive 'Q1', 'Q2'... 'Qn'.

def rename(df, skip=[], ref=df1):
    skip = set(skip)
    cols = (c for c in ref.columns if c not in skip)
    return df.rename(columns=dict(zip(df.columns, cols)))

pd.concat([df1, rename(df2, skip=['Q3', 'Q5'])], ignore_index=True)

Output:

  Q1 Q2   Q3 Q4   Q5 Q6
0  a  b    c  d    e  f
1  a  b  NaN  d  NaN  f
helper function with Qx generator

This one relies purely on position, so you can use it if the names are imperfect even in df1

def rename(df, skip=[]):
    skip = set(skip)
    cols = (f'Q{i}' for i in range(1, len(df.columns) len(skip) 1)
            if i not in skip)
    return df.set_axis(cols, axis=1)

pd.concat([rename(df1), rename(df2, skip=[3,5])], ignore_index=True)
  • Related