I have this Pandas DataFrame:
data4 = {'g_pairs':['an_jk', 'tf_ha', 'ab_rh', 'et_x2','yr_po'],
'g_a':['en762','en72b','en925','en980','en009'],
'g_b':['en361','en231','en666','en771','en909'],
'epi|ap':[0.020,1,0.05,0.7,0.001],
'ap|epi':[1,1,0.1,0.0001,1],
'fib|mac':[0.001,0.002,0.0021,0.3,0.005],
'mac|fib':[0.0002,0.0043,0.0067,0.0123,0.0110]}
df4 = pd.DataFrame(data4)
g_pairs g_a g_b epi|ap ap|epi fib|mac mac|fib
0 an_jk en762 en361 0.020 1.0000 0.0010 0.0002
1 tf_ha en72b en231 1.000 1.0000 0.0020 0.0043
2 ab_rh en925 en666 0.050 0.1000 0.0021 0.0067
3 et_x2 en980 en771 0.700 0.0001 0.3000 0.0123
4 yr_po en009 en909 0.001 1.0000 0.0050 0.0110
I would like to restructure the table by repositioning column name opposites. For example, an_jk
has a value of 1
under ap|epi
, but if I flip the name to jk_an
I can add that new item under the epi|ap
column and get rid of the ap|epi
column.
Output should look like this:
g_pairs g_a g_b epi|ap fib|mac
0 an_jk en762 en361 0.0200 0.0010
1 jk_an en762 en361 1.0000 0.0002
2 tf_ha en72b en231 1.0000 0.0020
3 ha_tf en72b en231 1.0000 0.0043
4 ab_rh en925 en666 0.0500 0.0021
5 rh_ab en925 en666 0.1000 0.0067
6 et_x2 en980 en771 0.7000 0.3000
7 x2_et en980 en771 0.0001 0.0123
8 yr_po en009 en909 0.0010 0.0050
9 po_yr en009 en909 1.0000 0.0110
Thank you very much in advance.
CodePudding user response:
Solution
# Copy the pertinent columns
new_df = df4.iloc[:, [0, 1, 2, 4, 6]].copy()
# Reverse the g_pairs
new_df["g_pairs"] = new_df["g_pairs"].apply(lambda s: "_".join(s.split("_")[::-1]))
# Rename the "ap|epi" and "mac|fib" columns
new_df.rename(columns={"ap|epi": "epi|ap", "mac|fib": "fib|mac"}, inplace=True)
# Interleave the two DataFrames
new_df = pd.concat([df4.iloc[:, [0, 1, 2, 3, 5]], new_df]).sort_index().reset_index(drop=True)
Steps
First, make a new DataFrame with the columns that need to be transformed or preserved:
>>> new_df = df4.iloc[:, [0, 1, 2, 4, 6]].copy()
>>> new_df
g_pairs g_a g_b ap|epi mac|fib
0 an_jk en762 en361 1.0000 0.0002
1 tf_ha en72b en231 1.0000 0.0043
2 ab_rh en925 en666 0.1000 0.0067
3 et_x2 en980 en771 0.0001 0.0123
4 yr_po en009 en909 1.0000 0.0110
Now reverse the g_pairs
:
>>> new_df["g_pairs"] = new_df["g_pairs"].apply(lambda s: "_".join(s.split("_")[::-1]))
>>> new_df
g_pairs g_a g_b ap|epi mac|fib
0 jk_an en762 en361 1.0000 0.0002
1 ha_tf en72b en231 1.0000 0.0043
2 rh_ab en925 en666 0.1000 0.0067
3 x2_et en980 en771 0.0001 0.0123
4 po_yr en009 en909 1.0000 0.0110
Now, simply rename the "ap|epi"
and "mac|fib"
columns and you'll be ready to merge:
>>> new_df.rename(columns={"ap|epi": "epi|ap", "mac|fib": "fib|mac"}, inplace=True)
>>> new_df
g_pairs g_a g_b epi|ap fib|mac
0 jk_an en762 en361 1.0000 0.0002
1 ha_tf en72b en231 1.0000 0.0043
2 rh_ab en925 en666 0.1000 0.0067
3 x2_et en980 en771 0.0001 0.0123
4 po_yr en009 en909 1.0000 0.0110
Finally, use pd.concat()
to interleave the two DataFrames:
>>> new_df = pd.concat([df4.iloc[:, [0, 1, 2, 3, 5]], new_df]).sort_index().reset_index(drop=True)
>>> new_df
g_pairs g_a g_b epi|ap fib|mac
0 an_jk en762 en361 0.0200 0.0010
1 jk_an en762 en361 1.0000 0.0002
2 tf_ha en72b en231 1.0000 0.0020
3 ha_tf en72b en231 1.0000 0.0043
4 ab_rh en925 en666 0.0500 0.0021
5 rh_ab en925 en666 0.1000 0.0067
6 et_x2 en980 en771 0.7000 0.3000
7 x2_et en980 en771 0.0001 0.0123
8 yr_po en009 en909 0.0010 0.0050
9 po_yr en009 en909 1.0000 0.0110
CodePudding user response:
You can create a new df and concat
:
s = (pd.DataFrame({"g_pairs": df["g_pairs"].str.split("_").str[::-1].str.join("_"),
"g_a": df["g_b"],
"g_b": df["g_a"],
"epi|ap": df["ap|epi"],
"fib|mac": df["mac|fib"]}))
print (pd.concat([df[['g_pairs', 'g_a', 'g_b', 'epi|ap', 'fib|mac']], s]).sort_index())
g_pairs g_a g_b epi|ap fib|mac
0 an_jk en762 en361 0.0200 0.0010
0 jk_an en361 en762 1.0000 0.0002
1 tf_ha en72b en231 1.0000 0.0020
1 ha_tf en231 en72b 1.0000 0.0043
2 ab_rh en925 en666 0.0500 0.0021
2 rh_ab en666 en925 0.1000 0.0067
3 et_x2 en980 en771 0.7000 0.3000
3 x2_et en771 en980 0.0001 0.0123
4 yr_po en009 en909 0.0010 0.0050
4 po_yr en909 en009 1.0000 0.0110