Export multiple DataFrames to different CSV files in Python

Time:11-25

I have this code:

import pandas as pd
import os
ext = ('.tsv')
for files in os.listdir(os.getcwd()):
  if files.endswith(ext):
    x = pd.read_table(files, sep='\t', usecols=['#Chrom','Pos','RawScore','PHRED'])
    x.drop_duplicates(subset ="Pos",keep = False, inplace = True)
    data_frame=x.head()
    print(data_frame)

       #Chrom        Pos  RawScore  PHRED
77171       6  167709702  7.852318   39.0
19180       6   31124849  7.623789   38.0
15823       6   29407955  6.982213   37.0
19182       6   31125257  6.817868   36.0
19974       6   31544591  6.201438   35.0
       #Chrom        Pos  RawScore  PHRED
52445       9  139634495  6.950686   36.0
46470       9  125391241  5.477094   34.0
49866       9  134385435  4.841222   33.0
48642       9  131475583  4.357986   31.0
40099       9  113233652  4.284035   31.0
       #Chrom        Pos  RawScore  PHRED
7050       13   32972626  6.472542   36.0
32416      13  100518634  5.405765   33.0
10834      13   42465713  4.406294   32.0
9963       13   39422624  4.374808   31.0
22993      13   76395620  4.193058   29.4

As you can see, I get multiple DataFrames with the same column names but from different chromosomes. How can I write each of these DataFrames to a separate CSV file?

CodePudding user response:

You can save your DataFrames to .csv using pandas' pandas.DataFrame.to_csv (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html). More specifically, in your case you can do this:

    import os
    import pandas as pd

    ext = '.tsv'
    for files in os.listdir(os.getcwd()):
        if files.endswith(ext):
            x = pd.read_table(files, sep='\t',
                              usecols=['#Chrom', 'Pos', 'RawScore', 'PHRED'])
            x.drop_duplicates(subset="Pos", keep=False, inplace=True)
            # Name the output file after the chromosome in the first row
            x.to_csv(f'Chrom{x.iloc[0, 0]}.csv')

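Here, x.iloc[0, 0] takes the first element of the first column, which is the #Chrom value, so each file is named after its chromosome (for example, Chrom6.csv). You can also set the names by hand. Note that this naming only works when each chromosome comes from a single input file: if two DataFrames share the same #Chrom value, the second to_csv call overwrites the file written by the first, so in that case you have to supply distinct file names yourself. x.iloc[0, 0] also raises an IndexError if a DataFrame has no rows left after de-duplication. By default to_csv writes the row index as the first column; pass index=False to leave it out.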
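If several input files can contain the same chromosome, one way around the name clash is to name each CSV after its input file instead of its #Chrom value. The following is only a sketch under a few assumptions: the function name export_tsv_to_csv and the output folder csv_out are made up for illustration, and every .tsv file is assumed to contain the four columns from the question.

```python
from pathlib import Path

import pandas as pd

# Columns taken from the question; adjust if your files differ.
COLUMNS = ["#Chrom", "Pos", "RawScore", "PHRED"]


def export_tsv_to_csv(in_dir=".", out_dir="csv_out"):
    """Write one de-duplicated CSV per .tsv file found in in_dir.

    Both the function name and the default out_dir are hypothetical.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for tsv in sorted(Path(in_dir).glob("*.tsv")):
        df = pd.read_csv(tsv, sep="\t", usecols=COLUMNS)
        # keep=False drops every row whose Pos occurs more than once
        df = df.drop_duplicates(subset="Pos", keep=False)
        # Reuse the input file's stem so two files never share an output name
        target = out_path / f"{tsv.stem}.csv"
        df.to_csv(target, index=False)
        written.append(target)
    return written
```

Calling export_tsv_to_csv() in the folder that holds the .tsv files would then produce, for an input named chr6.tsv, a file csv_out/chr6.csv. Because the name comes from the file rather than the data, this also works when a DataFrame ends up empty.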
