Finding items that exist on two lists

Time:10-28

I have a list which looks like this:

10.0139_ssrn.3771318
10.1001_archdermatol.2012.418
10.1001_archinte.165.15.1737
10.1001_archinte.165.15.1743
10.1001_archinte.165.18.2142
10.1001_archinternmed.2012.127

I have a second list which looks like this:

123 10.0139_ssrn.3771318    
356 10.1001_archdermatol.2012.418
357 10.1001_archinte.165.15.1737    
6   10.1001_archinternmed.2012.127
379 10.1001_archopht.123.1.25   
12  10.1001_archoto.2010.121    
97  10.1001_archotol.127.1.25   

The second list does not contain all items in the first list and vice versa.

I would like to create a file that contains only the matches and would look like this:

123 10.0139_ssrn.3771318    
356 10.1001_archdermatol.2012.418
357 10.1001_archinte.165.15.1737    
6   10.1001_archinternmed.2012.127

I can extract individual lines the way I want with the following command in PowerShell:

Get-Content 'Y:\folder\second_list.csv' | foreach {
  $_ -match "10.0139_ssrn.3771318"}| Out-File 'Y:\folder\10.0139_ssrn.3771318'

I have not managed to write a loop that takes its entries from the first file. I tried something like this:

Get-Content 'Y:\folder\second_list.csv' | foreach {
  $line -contains (Get-Content "Y:\folder\first_list.csv")| Out-file "Y:\folder\output.csv" -append}

There are two problems: first, no match is identified (although there should be some matches) and, second, the entry in the output file is always “FALSE” (rather than the matching line of the second_list or no entry at all if no match is found).

CodePudding user response:

I made 2 sample files: test1.csv:

header
"10.0139_ssrn.3771318356"
"10.1001_archdermatol.2012.418"
"10.1001_archinte.165.15.17376"
"10.1001_archinternmed.2012.127"
"10.1001_archopht.123.1.2512"
"10.1001_archoto.2010.12197"
"10.1001_archotol.127.1.25"

test2.csv:

header
"10.0139_ssrn.3771318356"
"10.1001_archdermatol.2012.418"
"10.1001_archinte.165.15.1737"
"10.1001_archinte.165.15.1743"
"10.1001_archinte.165.18.2142"
"10.1001_archinternmed.2012.127"

Then loop over all items in file 1 and check whether they occur in file 2:

$csv1 = Import-Csv "E:\users\temp\test1.csv"
$csv2 = Import-Csv "E:\users\temp\test2.csv"
$elementsToKeep = @()
foreach ($element1 in $csv1) {
    foreach ($element2 in $csv2) {
        if ($element1.header -eq $element2.header) {
            $elementsToKeep += $element1
        }
    }
}

$elementsToKeep | Export-Csv "E:\users\temp\output.csv" -NoTypeInformation

Content of output.csv:

"header"
"10.0139_ssrn.3771318356"
"10.1001_archdermatol.2012.418"
"10.1001_archinternmed.2012.127"
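
The nested loop above can also be written as a single membership test. This is a minimal sketch, assuming both files have one column named header as in the samples. The here-strings stand in for test1.csv and test2.csv so the snippet runs on its own, and the `-in` operator needs PowerShell 3.0 or later:

```powershell
# Inline stand-ins for test1.csv and test2.csv (assumed: one column named 'header').
$csv1 = @'
header
"10.0139_ssrn.3771318356"
"10.1001_archdermatol.2012.418"
"10.1001_archinte.165.15.17376"
"10.1001_archinternmed.2012.127"
"10.1001_archopht.123.1.2512"
"10.1001_archoto.2010.12197"
"10.1001_archotol.127.1.25"
'@ | ConvertFrom-Csv

$csv2 = @'
header
"10.0139_ssrn.3771318356"
"10.1001_archdermatol.2012.418"
"10.1001_archinte.165.15.1737"
"10.1001_archinte.165.15.1743"
"10.1001_archinte.165.18.2142"
"10.1001_archinternmed.2012.127"
'@ | ConvertFrom-Csv

# Keep the rows of file 1 whose 'header' value also appears in file 2.
# $csv2.header collects the 'header' value of every row into an array.
$elementsToKeep = @($csv1 | Where-Object { $_.header -in $csv2.header })

# Print as CSV text; with real files, pipe to Export-Csv instead.
$elementsToKeep | ConvertTo-Csv -NoTypeInformation
```

With real files, replace the here-strings with the two Import-Csv calls shown above.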

CodePudding user response:

Try the following:

$filename = "c:\temp\test.csv"
$list1 = @("10.0139_ssrn.3771318",
          "10.1001_archdermatol.2012.418",
          "10.1001_archinte.165.15.1737",
          "10.1001_archinte.165.15.1743",
          "10.1001_archinte.165.18.2142",
          "10.1001_archinternmed.2012.127")
$csv = Import-Csv -Path $filename -Header 'number', 'name'
$csv | Format-Table
$filteredData = $csv.Where({$list1.Contains($_.name)})
$filteredData | Format-Table

Here is the output:

number name
------ ----
123    10.0139_ssrn.3771318
356    10.1001_archdermatol.2012.418
357    10.1001_archinte.165.15.1737
6      10.1001_archinternmed.2012.127
379    10.1001_archopht.123.1.25
12     10.1001_archoto.2010.121
97     10.1001_archotol.127.1.25



number name
------ ----
123    10.0139_ssrn.3771318
356    10.1001_archdermatol.2012.418
357    10.1001_archinte.165.15.1737
6      10.1001_archinternmed.2012.127
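
If the second list is whitespace-separated exactly as shown in the question, rather than comma-separated, the lines can be filtered directly. A rough sketch under that assumption: the two arrays stand in for `Get-Content` on the two files, and the variable names are only illustrative. Note that `-contains` returns a Boolean rather than the matching line, which is why the attempt in the question wrote "False"; `Where-Object` passes the line itself through when the test succeeds.

```powershell
# Inline sample data from the question; in practice use Get-Content on each file.
$firstList = @(
    '10.0139_ssrn.3771318'
    '10.1001_archdermatol.2012.418'
    '10.1001_archinte.165.15.1737'
    '10.1001_archinte.165.15.1743'
    '10.1001_archinte.165.18.2142'
    '10.1001_archinternmed.2012.127'
)
$secondList = @(
    '123 10.0139_ssrn.3771318    '
    '356 10.1001_archdermatol.2012.418'
    '357 10.1001_archinte.165.15.1737    '
    '6   10.1001_archinternmed.2012.127'
    '379 10.1001_archopht.123.1.25   '
    '12  10.1001_archoto.2010.121    '
    '97  10.1001_archotol.127.1.25   '
)

# Hash set of identifiers for exact-match lookups (case-sensitive by default).
# The ::new() syntax needs PowerShell 5.0 or later.
$ids = [System.Collections.Generic.HashSet[string]]::new([string[]]$firstList)

# Keep each line whose second whitespace-separated field is in the set.
# Unary -split splits on runs of whitespace and ignores leading/trailing blanks.
$matched = @($secondList | Where-Object {
    $fields = @(-split $_)
    $fields.Count -ge 2 -and $ids.Contains($fields[1])
})

$matched
# To write the result to a file instead:
# $matched | Set-Content 'Y:\folder\output.csv'
```

Comparing the whole second field avoids the partial matches that `-match` would give, since `-match` treats the identifier as a regular expression in which `.` matches any character.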