Home > Enterprise >  How to transpose and merge two dataframes with pandas - obtaining a key error
How to transpose and merge two dataframes with pandas - obtaining a key error

Time:03-28

I have a data frame (file1.txt) like this:

identifer   1     2    3
Fact1       494   43   3
Fact2       383   32   5
Fact3       384   23   5
Fact4       382   21   7

And another data frame (file2.txt) like this:

Sample     Char1   Char2  Char3
1          4        5       5
2          5        2       4
3          5        6       2
4          2        4       4 

the output should look like this:

Sample       Fact1       Fact2      Fact3     Char1    Char2    Char3
1            494         383        384       4         5       5   
2            43          32         5         5         2       4 
3            384         23         5         5         6       2

I wrote:

#to transpose Table1
df = pd.read_csv('file1.txt', sep='\t',header=0)
df2 = df.T


#To read in table 2
df3 = pd.read_csv('file2.txt', sep='\t',header=0)


df4 = df2.merge(df3,left_on = 'identifier',right_on='Sample',how='inner')
print(df4)

And I'm getting the error: 'KeyError: identifier'

When I print the columns of df2, i.e. the transposed data set, I can see that the columns are just the first row of data, and not the header, and the identifier row is the last row listed in the transposed matrix. Could someone explain to me how to transpose and merge these data frames? I was trying to follow a SO answer that said to .set_index() and then transpose, but when I do df2 = df.set_index('identifier').T I'm getting the same error. Following another SO suggestion I was trying here, I changed from merge to join so I did df2.join(df3.set_index['Sample'],on='identifier) but then I'm getting other errors (in this error 'method object is not subscriptable') so I'm just stuck and would appreciate insight.

CodePudding user response:

You can to set the index

df1 = df1.set_index('identifer')
df1.columns = df1.columns.astype(float)
out = df1.T.join(df2.set_index('Sample'))#.reset_index()
Out[82]: 
     Fact1  Fact2  Fact3  Fact4  Char1  Char2  Char3
1.0    494    383    384    382      4      5      5
2.0     43     32     23     21      5      2      4
3.0      3      5      5      7      5      6      2

CodePudding user response:

A slightly modified version of previous answer:

out = (
  df1.set_index('identifer').T.join(df2.astype({'Sample': str}).set_index('Sample'))
     .rename_axis('Sample').reset_index().astype({'Sample': int})
)
print(out)

# Output:
  Sample  Fact1  Fact2  Fact3  Fact4  Char1  Char2  Char3
0      1    494    383    384    382      4      5      5
1      2     43     32     23     21      5      2      4
2      3      3      5      5      7      5      6      2

Note: you need to cat Sample column as string because in the other dataframe the column index has dtype str. To perform the join, the indexes must be of the same type.

  • Related