I am having difficulty loading the output of a bash command directly into a pandas DataFrame:
# Imports needed by the snippet below
from io import StringIO
from subprocess import Popen, PIPE, STDOUT
import pandas as pd

# Save the output of the bash command in a Python variable
cmd = 'bedtools intersect -wao -b /path/file_a -a /path/file_b'
p = Popen(cmd, shell=True, stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True)
output = p.stdout.read()
# Load this output, which is a TSV, into pandas
df = StringIO(output)
cnvs = pd.read_csv(df,
                   sep='\t',
                   index_col=False,
                   names=['#CHROM',
                          'START',
                          'END',
                          'CNV_TYPE',
                          'CNV ID',
                          'Chromosome_g',
                          'Transcript_start_g',
                          'Transcript_end_g',
                          'Transcript_stable_ID_g',
                          'canonical_g',
                          'Gene stable_ID_g',
                          'Gene_name_g',
                          'amount_overlap_g'])
I have tried several approaches from different tutorials. With the code as it is now, I get this error:
TypeError: initial_value must be str or None, not bytes
More than fixing the error, I want to know whether this is the right way to do it.
Originally I saved the output of the bash command to a file and then loaded that file into pandas. Not only do I think this is not the most Pythonic approach, but I am also working on an HPC cluster, where creating files is very slow.
Answer:
I don't think that's especially unpythonic, personally. It's more or less what I would do if I had a command that printed a TSV to stdout. To fix the error, note that `p.stdout.read()` returns a bytes object, not a str, so use `BytesIO` instead of `StringIO`. (Also, I would use `with Popen(...) as p:`; that is a bit more Pythonic.)