I have a data frame (file1.txt) like this:
identifer 1 2 3
Fact1 494 43 3
Fact2 383 32 5
Fact3 384 23 5
Fact4 382 21 7
And another data frame (file2.txt) like this:
Sample Char1 Char2 Char3
1 4 5 5
2 5 2 4
3 5 6 2
4 2 4 4
the output should look like this:
Sample Fact1 Fact2 Fact3 Char1 Char2 Char3
1 494 383 384 4 5 5
2 43 32 5 5 2 4
3 384 23 5 5 6 2
I wrote:
#to transpose Table1
df = pd.read_csv('file1.txt', sep='\t',header=0)
df2 = df.T
#To read in table 2
df3 = pd.read_csv('file2.txt', sep='\t',header=0)
df4 = df2.merge(df3,left_on = 'identifier',right_on='Sample',how='inner')
print(df4)
And I'm getting the error: 'KeyError: identifier'
When I print the columns of df2, i.e. the transposed data set, I can see that the columns are just the first row of data, and not the header, and the identifier row is the last row listed in the transposed matrix. Could someone explain to me how to transpose and merge these data frames? I was trying to follow a SO answer that said to .set_index()
and then transpose, but when I do df2 = df.set_index('identifier').T
I'm getting the same error. Following another SO suggestion I was trying here, I changed from merge to join so I did df2.join(df3.set_index['Sample'],on='identifier)
but then I'm getting other errors (in this error 'method object is not subscriptable') so I'm just stuck and would appreciate insight.
CodePudding user response:
You can to set the index
df1 = df1.set_index('identifer')
df1.columns = df1.columns.astype(float)
out = df1.T.join(df2.set_index('Sample'))#.reset_index()
Out[82]:
Fact1 Fact2 Fact3 Fact4 Char1 Char2 Char3
1.0 494 383 384 382 4 5 5
2.0 43 32 23 21 5 2 4
3.0 3 5 5 7 5 6 2
CodePudding user response:
A slightly modified version of previous answer:
out = (
df1.set_index('identifer').T.join(df2.astype({'Sample': str}).set_index('Sample'))
.rename_axis('Sample').reset_index().astype({'Sample': int})
)
print(out)
# Output:
Sample Fact1 Fact2 Fact3 Fact4 Char1 Char2 Char3
0 1 494 383 384 382 4 5 5
1 2 43 32 23 21 5 2 4
2 3 3 5 5 7 5 6 2
Note: you need to cat Sample
column as string because in the other dataframe the column index has dtype str
. To perform the join, the indexes must be of the same type.