I have this code:
import pandas as pd
import os

ext = ('.tsv')
for files in os.listdir(os.getcwd()):
    if files.endswith(ext):
        x = pd.read_table(files, sep='\t', usecols=['#Chrom', 'Pos', 'RawScore', 'PHRED'])
        x.drop_duplicates(subset="Pos", keep=False, inplace=True)
        data_frame = x.head()
        print(data_frame)
#Chrom Pos RawScore PHRED
77171 6 167709702 7.852318 39.0
19180 6 31124849 7.623789 38.0
15823 6 29407955 6.982213 37.0
19182 6 31125257 6.817868 36.0
19974 6 31544591 6.201438 35.0
#Chrom Pos RawScore PHRED
52445 9 139634495 6.950686 36.0
46470 9 125391241 5.477094 34.0
49866 9 134385435 4.841222 33.0
48642 9 131475583 4.357986 31.0
40099 9 113233652 4.284035 31.0
#Chrom Pos RawScore PHRED
7050 13 32972626 6.472542 36.0
32416 13 100518634 5.405765 33.0
10834 13 42465713 4.406294 32.0
9963 13 39422624 4.374808 31.0
22993 13 76395620 4.193058 29.4
As you can see, I get multiple DataFrames with the same column names but from different chromosomes. How can I write each of these DataFrames to its own CSV file?
CodePudding user response:
You can save your DataFrames to .csv files using pandas' DataFrame.to_csv (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html). In your case, you can do this:
for files in os.listdir(os.getcwd()):
    if files.endswith(ext):
        x = pd.read_table(files, sep='\t', usecols=['#Chrom', 'Pos', 'RawScore', 'PHRED'])
        x.drop_duplicates(subset="Pos", keep=False, inplace=True)
        x.to_csv(f'Chrom{x.iloc[0,0]}.csv')
Here, x.iloc[0,0] takes the first element of the first column, which is the #Chrom value, and uses it in the file name. You can also set the name manually. Note that this approach does not work if two of your DataFrames share the same #Chrom value: both would be written to the same file name, and the second would overwrite the first. In that case, you have to choose the name of each csv file yourself.
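One way to avoid that name clash without typing names by hand is to reuse each input file's own name for the output. The following is only a sketch under some assumptions: the function name tsv_to_csv and its in_dir/out_dir parameters are made up for illustration, and index=False is my choice so that the row index is not written as an extra column.

```python
from pathlib import Path

import pandas as pd

# Columns taken from the question; adjust if your files differ.
COLUMNS = ['#Chrom', 'Pos', 'RawScore', 'PHRED']


def tsv_to_csv(in_dir, out_dir):
    """Write one CSV per .tsv file found in in_dir; return the paths written.

    Hypothetical helper, not part of pandas.
    """
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for tsv_path in sorted(in_dir.glob('*.tsv')):
        df = pd.read_csv(tsv_path, sep='\t', usecols=COLUMNS)
        # keep=False drops every row whose Pos value occurs more than once.
        df = df.drop_duplicates(subset='Pos', keep=False)
        # Reusing the input file's stem means two files for the same
        # chromosome cannot overwrite each other.
        csv_path = out_dir / f'{tsv_path.stem}.csv'
        df.to_csv(csv_path, index=False)
        written.append(csv_path)
    return written
```

Calling tsv_to_csv('.', 'csv_out') would process the .tsv files in the current directory and, for an input named chr6.tsv, write csv_out/chr6.csv.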