I've got this one that has been baffling me all day, and I can't seem to find any search results that match exactly what I am trying to do.
I have 2 CSV files, both of which have the same columns and headers. They look like this (shortened for the purpose of this post):
"plate","labid","well"
"1013740016604537004556","none46","F006"
"1013740016604537004556","none47","G006"
"1013740016604537004556","none48","H006"
"1013740016604537004556","3835265","A007"
"1013740016604537004556","3835269","B007"
"1013740016604537004556","3835271","C007"
Each of the 2 CSVs only have some actual Lab IDs, and the 'nonexx' are just fillers for the importing software. There is no duplication ie each 'well' is only referenced once across the 2 files.
What I need to do is merge the 2 CSVs, for example the second CSV might have a Lab ID for well H006 but the first will not. I need the lab ID from the second CSV imported into the first, overwriting the 'nonexx' currently in that column.
Here is my current code:
$CSVB = Import-CSV "$RootDir\SymphonyOutputPending\$plateID`A_Header.csv"
Import-CSV "$RootDir\SymphonyOutputPending\$plateID`_Header.csv" | ForEach-Object {
$CSVData = [PSCustomObject]@{
labid = $_.labid
well = $_.well
}
If ($CSVB.well -match $CSVData.wellID) {
write-host "I MATCH"
($CSVB | Where-Object {$_.well -eq $CSVData.well}).labid = $CSVData.labid
}
$CSVB | Export-CSV "$RootDir\SymphonyOutputPending\$plateID`_final.csv" -NoTypeInformation
}
The code runs but doesn't 'merge' the data, the final CSV output is just a replication of the first input file. I am definitely getting a match as the string "I MATCH" appears several times when debugging as expected.
CodePudding user response:
Based on the responses in the comments of your question, I believe this is what you are looking for. This assumes that the both CSVs contain the exact same data with labid being the only difference.
There is no need to modify csv2 if we are just grabbing the labid to overwrite the row in csv1.
$csv1 = Import-Csv C:\temp\LabCSV1.csv
$csv2 = Import-Csv C:\temp\LabCSV2.csv
# Loop through csv1 rows
Foreach($line in $csv1) {
# If Labid contains "none"
If($line.labid -like "none*") {
# Set rows labid to the labid from csv2 row that matches plate/well
# May be able to remove the plate section if well is a unique value
$line.labid = ($csv2 | Where {$_.well -eq $line.well -and $_.plate -eq $line.plate}).labid
}
}
# Export to CSV - not overwrite - to confirm results
$csv1 | export-csv C:\Temp\LabCSV1Adjusted.csv -NoTypeInformation
CodePudding user response:
Since you need to do a bi-directional comparison of the 2 Csvs you could create a new array of both and then group the objects by their well
property, for this you can use Group-Object
, then filter each group if their Count
is equal to 2 where their labid
property does not start with none
else return the object as-is.
Using the following Csvs for demonstration purposes:
- Csv1
"plate","labid","well"
"1013740016604537004556","none46","F006"
"1013740016604537004556","none47","G006"
"1013740016604537004556","3835265","A007"
"newrowuniquecsv1","none123","X001"
- Csv2
"plate","labid","well"
"1013740016604537004556","none48","A007"
"1013740016604537004556","3835269","F006"
"1013740016604537004556","3835271","G006"
"newrowuniquecsv2","none123","X002"
Code
Note that this code assumes there will be a maximum of 2 objects with the same well
property and, if there are 2 objects with the same well
, one of them must have a value not starting with none
.
$mergedCsv = @(
Import-Csv pathtocsv1.csv
Import-Csv pathtocsv2.csv
)
$mergedCsv | Group-Object well | ForEach-Object {
if($_.Count -eq 2) {
return $_.Group.Where{ -not $_.labid.StartsWith('none') }
}
$_.Group
} | Export-Csv pathtomerged.csv -NoTypeInformation
Output
plate labid well
----- ----- ----
1013740016604537004556 3835265 A007
1013740016604537004556 3835269 F006
1013740016604537004556 3835271 G006
newrowuniquecsv1 none123 X001
newrowuniquecsv2 none123 X002
CodePudding user response:
If the lists are large, performance might be an issue as Where-Object
(or any other where
method) and Group-Object
do not perform very well for embedded loops.
By indexing the second csv file (aka creating a hashtable), you have quicker access to the required objects. Indexing upon two (or more) items (plate
and well
) is issued here: Does there exist a designated (sub)index delimiter? and resolved by @mklement0 and zett42 with a nice CaseInsensitiveArrayEqualityComparer
class.
To apply this class on Drew's helpful answer:
$csv1 = Import-Csv C:\temp\LabCSV1.csv
$csv2 = Import-Csv C:\temp\LabCSV2.csv
$dict = [hashtable]::new([CaseInsensitiveArrayEqualityComparer]::new())
$csv2.ForEach{ $dict.($_.plate, $_.well) = $_ }
Foreach($line in $csv1) {
If($line.labid -like "none*") {
$line.labid = $dict.($line.plate, $line.well).labid
}
}
$csv1 | export-csv C:\Temp\LabCSV1Adjusted.csv -NoTypeInformation