Home > front end >  How to restructure a complex dataframe
How to restructure a complex dataframe

Time:09-17

I have this Pandas DataFrame:

data4 = {'g_pairs':['an_jk', 'tf_ha', 'ab_rh', 'et_x2','yr_po'],
         'g_a':['en762','en72b','en925','en980','en009'],
         'g_b':['en361','en231','en666','en771','en909'],
         'epi|ap':[0.020,1,0.05,0.7,0.001],
         'ap|epi':[1,1,0.1,0.0001,1],
         'fib|mac':[0.001,0.002,0.0021,0.3,0.005],
         'mac|fib':[0.0002,0.0043,0.0067,0.0123,0.0110]}
 
df4 = pd.DataFrame(data4)
   g_pairs  g_a     g_b    epi|ap   ap|epi  fib|mac mac|fib
0   an_jk   en762   en361   0.020   1.0000  0.0010  0.0002
1   tf_ha   en72b   en231   1.000   1.0000  0.0020  0.0043
2   ab_rh   en925   en666   0.050   0.1000  0.0021  0.0067
3   et_x2   en980   en771   0.700   0.0001  0.3000  0.0123
4   yr_po   en009   en909   0.001   1.0000  0.0050  0.0110

I would like to restructure the table by repositioning column name opposites. For example, an_jk has a value of 1 under ap|epi, but if I flip the name to jk_an I can add that new item under the epi|ap column and get rid of the ap|epi column.

Output should look like this:

   g_pairs  g_a     g_b     epi|ap  fib|mac
0   an_jk   en762   en361   0.0200  0.0010
1   jk_an   en762   en361   1.0000  0.0002
2   tf_ha   en72b   en231   1.0000  0.0020
3   ha_tf   en72b   en231   1.0000  0.0043
4   ab_rh   en925   en666   0.0500  0.0021
5   rh_ab   en925   en666   0.1000  0.0067
6   et_x2   en980   en771   0.7000  0.3000
7   x2_et   en980   en771   0.0001  0.0123
8   yr_po   en009   en909   0.0010  0.0050
9   po_yr   en009   en909   1.0000  0.0110

Thank you very much in advance.

CodePudding user response:

Solution

# Copy the pertinent columns
new_df = df4.iloc[:, [0, 1, 2, 4, 6]].copy()

# Reverse the g_pairs
new_df["g_pairs"] = new_df["g_pairs"].apply(lambda s: "_".join(s.split("_")[::-1]))

# Rename the "ap|epi" and "mac|fib" columns
new_df.rename(columns={"ap|epi": "epi|ap", "mac|fib": "fib|mac"}, inplace=True)

# Interleave the two DataFrames
new_df = pd.concat([df4.iloc[:, [0, 1, 2, 3, 5]], new_df]).sort_index().reset_index(drop=True)

Steps

First, make a new DataFrame with the columns that need to be transformed or preserved:

>>> new_df = df4.iloc[:, [0, 1, 2, 4, 6]].copy()
>>> new_df
  g_pairs    g_a    g_b  ap|epi  mac|fib
0   an_jk  en762  en361  1.0000   0.0002
1   tf_ha  en72b  en231  1.0000   0.0043
2   ab_rh  en925  en666  0.1000   0.0067
3   et_x2  en980  en771  0.0001   0.0123
4   yr_po  en009  en909  1.0000   0.0110

Now reverse the g_pairs:

>>> new_df["g_pairs"] = new_df["g_pairs"].apply(lambda s: "_".join(s.split("_")[::-1]))
>>> new_df
  g_pairs    g_a    g_b  ap|epi  mac|fib
0   jk_an  en762  en361  1.0000   0.0002
1   ha_tf  en72b  en231  1.0000   0.0043
2   rh_ab  en925  en666  0.1000   0.0067
3   x2_et  en980  en771  0.0001   0.0123
4   po_yr  en009  en909  1.0000   0.0110

Now, simply rename the "ap|epi" and "mac|fib" columns and you'll be ready to merge:

>>> new_df.rename(columns={"ap|epi": "epi|ap", "mac|fib": "fib|mac"}, inplace=True)
>>> new_df
  g_pairs    g_a    g_b  epi|ap  fib|mac
0   jk_an  en762  en361  1.0000   0.0002
1   ha_tf  en72b  en231  1.0000   0.0043
2   rh_ab  en925  en666  0.1000   0.0067
3   x2_et  en980  en771  0.0001   0.0123
4   po_yr  en009  en909  1.0000   0.0110

Finally, use pd.concat() to interleave the two DataFrames:

>>> new_df = pd.concat([df4.iloc[:, [0, 1, 2, 3, 5]], new_df]).sort_index().reset_index(drop=True)
>>> new_df
  g_pairs    g_a    g_b  epi|ap  fib|mac
0   an_jk  en762  en361  0.0200   0.0010
1   jk_an  en762  en361  1.0000   0.0002
2   tf_ha  en72b  en231  1.0000   0.0020
3   ha_tf  en72b  en231  1.0000   0.0043
4   ab_rh  en925  en666  0.0500   0.0021
5   rh_ab  en925  en666  0.1000   0.0067
6   et_x2  en980  en771  0.7000   0.3000
7   x2_et  en980  en771  0.0001   0.0123
8   yr_po  en009  en909  0.0010   0.0050
9   po_yr  en009  en909  1.0000   0.0110

CodePudding user response:

You can create a new df and concat:

s = (pd.DataFrame({"g_pairs": df["g_pairs"].str.split("_").str[::-1].str.join("_"),
                   "g_a": df["g_b"],
                   "g_b": df["g_a"],
                   "epi|ap": df["ap|epi"],
                   "fib|mac": df["mac|fib"]}))

print (pd.concat([df[['g_pairs', 'g_a', 'g_b', 'epi|ap', 'fib|mac']], s]).sort_index())

  g_pairs    g_a    g_b  epi|ap  fib|mac
0   an_jk  en762  en361  0.0200   0.0010
0   jk_an  en361  en762  1.0000   0.0002
1   tf_ha  en72b  en231  1.0000   0.0020
1   ha_tf  en231  en72b  1.0000   0.0043
2   ab_rh  en925  en666  0.0500   0.0021
2   rh_ab  en666  en925  0.1000   0.0067
3   et_x2  en980  en771  0.7000   0.3000
3   x2_et  en771  en980  0.0001   0.0123
4   yr_po  en009  en909  0.0010   0.0050
4   po_yr  en909  en009  1.0000   0.0110
  • Related