Column merge in Linux with multiple files

Time:09-28

What is the simplest way to merge multiple tables that share the same columns (stacking their rows) in Linux? Googling mostly turns up join and lots of SQL results. I was hoping a simple cat command would do the trick, but it copies the column names (the header line) from every file as well. What I am looking for is to combine table1 and table2 but keep the column names only once. See the desired table below.

table1 
Estimated Number of Cells,Mean Reads per Cell,Median Genes per Cell,Number of Reads,Valid Barcodes,Sequencing Saturation,Q30 Bases in Barcode,Q30 Bases in RNA Read,Q30 Bases in UMI,Reads Mapped to Genome,Reads Mapped Confidently to Genome,Reads Mapped Confidently to Intergenic Regions,Reads Mapped Confidently to Intronic Regions,Reads Mapped Confidently to Exonic Regions,Reads Mapped Confidently to Transcriptome,Reads Mapped Antisense to Gene,Fraction Reads in Cells,Total Genes Detected,Median UMI Counts per Cell
"18,137","26,624",984,"482,881,204",97.6%,48.7%,95.8%,92.7%,95.3%,96.6%,93.4%,4.2%,36.1%,53.1%,49.6%,2.2%,79.0%,"24,586","2,981"

table2
Estimated Number of Cells,Mean Reads per Cell,Median Genes per Cell,Number of Reads,Valid Barcodes,Sequencing Saturation,Q30 Bases in Barcode,Q30 Bases in RNA Read,Q30 Bases in UMI,Reads Mapped to Genome,Reads Mapped Confidently to Genome,Reads Mapped Confidently to Intergenic Regions,Reads Mapped Confidently to Intronic Regions,Reads Mapped Confidently to Exonic Regions,Reads Mapped Confidently to Transcriptome,Reads Mapped Antisense to Gene,Fraction Reads in Cells,Total Genes Detected,Median UMI Counts per Cell
"19,927","23,613",946,"470,526,357",97.4%,48.7%,95.7%,92.8%,95.3%,96.7%,91.0%,5.0%,32.9%,53.1%,49.8%,2.2%,91.9%,"24,601","3,250"

master (what cat currently produces, with the header repeated)
Estimated Number of Cells,Mean Reads per Cell,Median Genes per Cell,Number of Reads,Valid Barcodes,Sequencing Saturation,Q30 Bases in Barcode,Q30 Bases in RNA Read,Q30 Bases in UMI,Reads Mapped to Genome,Reads Mapped Confidently to Genome,Reads Mapped Confidently to Intergenic Regions,Reads Mapped Confidently to Intronic Regions,Reads Mapped Confidently to Exonic Regions,Reads Mapped Confidently to Transcriptome,Reads Mapped Antisense to Gene,Fraction Reads in Cells,Total Genes Detected,Median UMI Counts per Cell
"18,137","26,624",984,"482,881,204",97.6%,48.7%,95.8%,92.7%,95.3%,96.6%,93.4%,4.2%,36.1%,53.1%,49.6%,2.2%,79.0%,"24,586","2,981"
Estimated Number of Cells,Mean Reads per Cell,Median Genes per Cell,Number of Reads,Valid Barcodes,Sequencing Saturation,Q30 Bases in Barcode,Q30 Bases in RNA Read,Q30 Bases in UMI,Reads Mapped to Genome,Reads Mapped Confidently to Genome,Reads Mapped Confidently to Intergenic Regions,Reads Mapped Confidently to Intronic Regions,Reads Mapped Confidently to Exonic Regions,Reads Mapped Confidently to Transcriptome,Reads Mapped Antisense to Gene,Fraction Reads in Cells,Total Genes Detected,Median UMI Counts per Cell
"19,927","23,613",946,"470,526,357",97.4%,48.7%,95.7%,92.8%,95.3%,96.7%,91.0%,5.0%,32.9%,53.1%,49.8%,2.2%,91.9%,"24,601","3,250"

desired
Estimated Number of Cells,Mean Reads per Cell,Median Genes per Cell,Number of Reads,Valid Barcodes,Sequencing Saturation,Q30 Bases in Barcode,Q30 Bases in RNA Read,Q30 Bases in UMI,Reads Mapped to Genome,Reads Mapped Confidently to Genome,Reads Mapped Confidently to Intergenic Regions,Reads Mapped Confidently to Intronic Regions,Reads Mapped Confidently to Exonic Regions,Reads Mapped Confidently to Transcriptome,Reads Mapped Antisense to Gene,Fraction Reads in Cells,Total Genes Detected,Median UMI Counts per Cell
"18,137","26,624",984,"482,881,204",97.6%,48.7%,95.8%,92.7%,95.3%,96.6%,93.4%,4.2%,36.1%,53.1%,49.6%,2.2%,79.0%,"24,586","2,981"
"19,927","23,613",946,"470,526,357",97.4%,48.7%,95.7%,92.8%,95.3%,96.7%,91.0%,5.0%,32.9%,53.1%,49.8%,2.2%,91.9%,"24,601","3,250"

Answer:

Does this work for you?

awk 'NR==FNR{print} NR!=FNR && FNR>1{print}' table1 table2 > master
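  • While reading the first file, print every line, including the header. NR==FNR holds only for the first file, because NR counts lines across all inputs while FNR restarts at 1 for each file.
  • For every subsequent file, print only the lines after its header (FNR>1).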
  • Redirect all output to master.
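A minimal self-contained sketch of the same idea, plus a head/tail variant closer to the "simple cat" the question hoped for. It assumes every input file begins with exactly one identical header line; the shortened three-column sample files and the output names master_awk and master_tail are made up for the demo. The condition NR == 1 || FNR > 1 is a more compact form that behaves like the command above for ordinary inputs: it keeps the very first line of the combined input (the first header) and every non-first line of each file.

```shell
#!/bin/sh
# Sketch: stack CSV files that share one header row, keeping the header once.
# Assumption: every input file starts with exactly one identical header line.
set -eu

# Work in a throwaway directory so no real files are overwritten.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# Hypothetical shortened versions of table1/table2 (first three columns only).
printf '%s\n' 'Estimated Number of Cells,Mean Reads per Cell,Median Genes per Cell' \
  '"18,137","26,624",984' > table1
printf '%s\n' 'Estimated Number of Cells,Mean Reads per Cell,Median Genes per Cell' \
  '"19,927","23,613",946' > table2

# 1) awk: print line 1 of the combined input (the first header),
#    then every line except the first one of each file.
awk 'NR == 1 || FNR > 1' table1 table2 > master_awk

# 2) coreutils: header from the first file, then each file minus its header.
{
  head -n 1 table1
  for f in table1 table2; do
    tail -n +2 "$f"
  done
} > master_tail

# Prints the header once, followed by both data rows.
cat master_awk
cmp -s master_awk master_tail && echo "both methods agree"
```

With real data the same commands accept any number of files, e.g. awk 'NR == 1 || FNR > 1' table1 table2 table3 > master. If you pass a glob such as table*, make sure the output file name cannot match it. Both approaches are line-based, which is fine here because the quoted fields contain commas but no embedded newlines.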