How to make a script that merges all .txt files into one .csv file with multiple columns in PowerShell


I don't know how to merge multiple .txt files containing data into one .csv file, with each .txt file in its own column.

This is my code so far:

$location = (Get-Location).Path
$files = Get-ChildItem $location -Filter "*.asd.txt"
$data = @()

foreach ($file in $files) {
    $fileData = Get-Content $file.FullName

    foreach ($line in $fileData) {
        $lineData = $line -split "\t"
        $data = $lineData[1]
        Add-Content -Path "$location\output.csv" -Value  $data
    } 

}

Each of the files looks like this (see the sample lines below):

I want to keep the first column, "Wavelength", and put the second columns from all the files in the folder next to each other. Each header should carry the exact file name, e.g. "stovikmladyDoupno2 2020080500001.asd", "stovikmladyDoupno2 2020080500002.asd", and so on.

So it should look like this:
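
(Real values are shown only for the first sample file; the other columns are placeholders.)

Wavelength  stovikmladyDoupno2 2020080500000.asd  stovikmladyDoupno2 2020080500001.asd  ...
350  6.38961399706465E-02  ...
351  6.14107911262903E-02  ...
352  6.04866108251357E-02  ...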

I have been looking for information for two days and still don't know how. I tried putting "," at the end of each line, thinking Excel would handle that, but nothing helped.

Here are a few files as test data: https://mega.nz/folder/zNhTzR4Z#rpc-BQdRfm3wxl87r9XUkw

A few lines of the data:

Wavelength  stovikmladyDoupno2 2020080500000.asd
350  6.38961399706465E-02 
351  6.14107911262903E-02 
352  6.04866108251357E-02 
353  5.83485359067184E-02 
354  0.054978792413247 
355  5.27014859356317E-02 
356  5.34849237528764E-02 
357  5.32841277775603E-02 
358  5.23466655229364E-02 
359  5.47595002186027E-02 
360  5.22061034631109E-02 
361  4.90149806042666E-02 
362  4.81633530421385E-02 
363  4.83974076557941E-02 
364  4.65219929658367E-02 
365  0.044800930294557 
366  4.47830287392802E-02 
367  4.46947539436297E-02 
368  0.043756926558447 
369  4.31725380363072E-02 
370  4.36867609723618E-02 
371  4.33227601805265E-02 
372  4.29978664449687E-02 
373  4.23860463187361E-02 
374  4.12183604375401E-02 
375  4.14306521081773E-02 
376  4.11760903772502E-02 
377  4.06421127128478E-02 
378  4.09771489689262E-02 
379  4.10083126746385E-02 
380  4.05161601354181E-02 
381  3.97904564387456E-02 

CodePudding user response:

I assumed a location, since I'm not fond of declaring file paths without a literal path. Please adjust the path as needed.

$Files = Get-ChildItem J:\Test\*.txt -Recurse 

$Filecount = 0

$ObjectCollectionArray = @()

# First, parse and collect each row into an array, while keeping the datetime information from the file name.

foreach($File in $Files){

    $Filecount++
    Write-Host $Filecount

    # the datetime portion of the file name, e.g. "2020080500001"
    $DateTime = $File.FullName.Split(" ").Split(".")[1]

    $Content = Get-Content $File.FullName

    foreach($Row in $Content){

        $Split = $Row.Split("`t")

        if($Split[0] -ne 'Wavelength'){

            $Object = [PSCustomObject]@{
                'Datetime'   = $DateTime
                'Number'     = $Split[0]
                'Wavelength' = $Split[1]
            }

            $ObjectCollectionArray += $Object
        }
    }
}


# Match by number and create a new object relating each number to the different datetimes.

$GroupedCollection = @()
$Grouped = $ObjectCollectionArray | Group-Object Number

foreach($GroupedNumber in $Grouped){
    $NumberObject = [PSCustomObject]@{
        'Number' = $GroupedNumber.Name
    }

    foreach($Occurance in $GroupedNumber.Group){
        $NumberObject | Add-Member -NotePropertyName $Occurance.Datetime -NotePropertyValue $Occurance.Wavelength
    }

    $GroupedCollection += $NumberObject
}

$GroupedCollection | Export-Csv -Path J:\Test\result.csv -NoClobber -NoTypeInformation
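
A note on the design: growing arrays with += copies the entire array on every addition, which slows down noticeably with many rows. A minimal sketch, assuming the same variable names as above, that collects into a generic List instead:

# collect row objects without re-copying an array on each addition
$ObjectCollectionArray = [System.Collections.Generic.List[object]]::new()
# build $Object exactly as above, then replace the += line with:
$ObjectCollectionArray.Add($Object)

# the same pattern applies to $GroupedCollection in the second loop
$GroupedCollection = [System.Collections.Generic.List[object]]::new()
$GroupedCollection.Add($NumberObject)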

CodePudding user response:

What you're looking to do is quite a hard task, and there are a few ways to do it. This method requires that all files fit in memory to process them. You can definitely treat these files as TSVs, so Import-Csv -Delimiter "`t" is an option, letting you work with objects instead of plain text.
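
As a quick illustration before the full solution below, importing one of the sample files as a TSV gives objects whose property names come from the header row (the file name here is assumed from the question's test data and the *.asd.txt filter):

# assumed file name, based on the sample header and the *.asd.txt filter
$rows = Import-Csv 'stovikmladyDoupno2 2020080500000.asd.txt' -Delimiter "`t"
$rows[0].Wavelength                                # 350
$rows[0].'stovikmladyDoupno2 2020080500000.asd'    # 6.38961399706465E-02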

# using this temp dictionary to create objects for each line of each tsv
$tmp  = [ordered]@{}
# get all files and enumerate
$csvs = Get-ChildItem $location -Filter *.asd.txt | ForEach-Object {
    # get their content as objects
    $content  = $_ | Import-Csv -Delimiter "`t"
    # get their property Name that is not `Wavelength`
    $property = $content[0].PSObject.Properties.Where{ $_.Name -ne 'Wavelength' }.Name

    # output an object holding the total lines of this csv,
    # its content and the property name of interest
    [pscustomobject]@{
        Lines    = $content.Count
        Content  = $content
        Property = $property
    }
}

# use a scriptblock to allow streaming so `Export-Csv` starts exporting as
# output is going through the pipeline
& {
    # for loop used for each line of the Tsv having the highest number of lines
    for($i = 0; $i -lt [System.Linq.Enumerable]::Max([int[]] $csvs.Lines); $i++) {
        # this boolean is used to preserve the "Wavelength" value of the first Tsv
        $isFirstCsv = $true

        foreach($csv in $csvs) {
            # if this is the first object
            if($isFirstCsv) {
                # add the value of "Wavelength"
                $tmp['Wavelength'] = $csv.Content[$i].Wavelength
                # and set the bool to false, since we are only using this once
                $isFirstCsv = $false
            }
            # then add the value of each property of each Tsv to the temp dictionary
            $tmp[$csv.Property] = $csv.Content[$i].($csv.Property)
        }

        # then output this object
        [pscustomobject] $tmp
        # clear the temp dictionary
        $tmp.Clear()
    }
} | Export-Csv path\to\result.csv -NoTypeInformation
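
If you'd rather avoid the System.Linq call, the maximum line count can also be computed with Measure-Object; a minimal equivalent, assuming $csvs is built as above:

$maxLines = ($csvs.Lines | Measure-Object -Maximum).Maximum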

CodePudding user response:

Here is a much more efficient approach that treats the files as plain text; it is much faster and more memory-efficient, though not as reliable. It uses a StreamReader to read each file's contents line by line and a StringBuilder to construct each output line.

& {
    # get all files and enumerate
    $readers = Get-ChildItem $location -Filter *.asd.txt | ForEach-Object {
        # create a stream reader for each file
        [System.IO.StreamReader] $_.FullName
    }

    # this StringBuilder is used to construct each line
    $sb = [System.Text.StringBuilder]::new()
    # while any of the readers has more content
    while($readers.EndOfStream -contains $false) {
        # signals this is our first Tsv
        $isFirstReader = $true
        # enumerate each reader
        foreach($reader in $readers) {
            # if this is the first Tsv
            if($isFirstReader) {
                # append the line as-is, only trimming excess white space
                $sb = $sb.Append($reader.ReadLine().Trim())
                $isFirstReader = $false
                # go to next reader
                continue
            }
            # if this is not the first Tsv,
            # split on Tab and exclude the first token (Wavelength)
            $null, $line = $reader.ReadLine().Trim() -split '\t'
            # append a Tab plus this line
            $sb = $sb.Append("`t$line")
        }
        # append a new line and output the constructed string
        $sb.AppendLine().ToString()
        # and clear it for next lines
        $sb = $sb.Clear()
    }

    # dispose all readers when done
    $readers | ForEach-Object Dispose
} | Set-Content path\to\result.tsv -NoNewline
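
The result here is tab-separated; if a true .csv is needed, as the question asks, it can be converted afterwards, for example:

Import-Csv path\to\result.tsv -Delimiter "`t" | Export-Csv path\to\result.csv -NoTypeInformation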