Home > database >  How to join a dataframe with packed column and row names to an unpacked dataframe
How to join a dataframe with packed column and row names to an unpacked dataframe

Time:11-17

Suppose the next packed df, where column_name represents the name of the columns and row_name the name of the rows in an unpacked dataframe:

  column_name row_name  value
0        col1        a      1
1        col1        b      2
2        col1        c      3
3        col2        d      4
4        col2        e      5
5        col2        f      6
6        col2        g      7
7        col2        h      8

And the unpacked dataframe looks like this:

   individual col1 col2
0           1    a    e
1           2    b    f
2           3    c    g
3           4    a    h
4           5    b    e
5           6    c    f
6           7    a    g
7           8    b    h

Desired output could look like this:

individual  col1  col2   value_col1   value_col2
1             a     e       1              5
2             b     f       2              6
3             c     g       3              7
4             a     h       1              8
5             b     e       2              5
6             c     f       3              6
7             a     g       1              7
8             b     h       2              8

Is there any way to make a join in order to unpack value column into two distinct columns in the unpacked dataframe?

Reproducible data:

unpacked_df = pd.DataFrame({
    'individual': [1,2,3,4,5,6,7,8],
    'col1':['a','b','c','a','b','c','a','b'],
    'col2':['e','f','g','h','e','f','g','h']
})

packed_df = pd.DataFrame({
    'column_name': ['col1','col1','col1','col2','col2','col2','col2','col2'],
     'row_name': ['a','b','c','d','e','f','g','h'],
    'value':[1,2,3,4,5,6,7,8]
})

CodePudding user response:

You can do a double merge, first for col1, then for col2:

unpacked_df.merge(packed_df[packed_df.column_name.eq("col1")], left_on="col1", right_on="row_name").merge(packed_df[packed_df.column_name.eq("col2")], left_on="col2", right_on="row_name", suffixes=("_col1","_col2")).sort_values("individual")[["individual", "col1", "col2", "value_col1", "value_col2"]]

Result:

   individual col1 col2  value_col1  value_col2
0           1    a    e           1           5
6           2    b    f           2           6
5           3    c    g           3           7
2           4    a    h           1           8
1           5    b    e           2           5
7           6    c    f           3           6
4           7    a    g           1           7
3           8    b    h           2           8

CodePudding user response:

This works:

cols = packed_df['column_name'].unique()
dct = {k: [] for k in cols}
for row in unpacked_df.iloc:
    for col in cols:
        dct[col].append(packed_df[(packed_df['column_name'] == col) & (packed_df['row_name'] == row[col])]['value'].iloc[0])
  • Related