Read list of urls from txt file using powershell and save each url to pdf with name from the last pa

Time:04-21

The following PowerShell code reads URLs from a text file and saves each URL as a PDF. At the moment each URL is saved as number.pdf. I want each PDF to be named after the last part of its URL.

For example, if a URL is 'https://www.prodevelopertutorial.com/lte-chapter-1-lte-introduction/', I want the saved PDF file to be named 'lte-chapter-1-lte-introduction.pdf'.

I obtained the code from a website. Can anybody please modify it to meet my requirements?

$sourceFile = "D:\BATCH-PRINT-WEBPAGES-PDF\D\1\links2.txt" # the source file containing the URLs you want to convert
$destFolder = "D:\BATCH-PRINT-WEBPAGES-PDF\sharednotes\" # converted PDFs will be saved here. Folder has to exist.

$num = 0
foreach($link in [System.IO.File]::ReadLines($sourceFile))
{
$num++
$outfile = $num.ToString() + '.pdf'
& 'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe' --headless --print-to-pdf="$destFolder$outfile" "$link"
Start-Sleep -s 3
}

From what I was able to gather around the internet, I tried the following:

$sourceFile = "D:\BATCH-PRINT-WEBPAGES-PDF\Version 1\linktst.txt" # the source file containing the URLs you want to convert
$destFolder = "D:\BATCH-PRINT-WEBPAGES-PDF\Version 1\OP\" # converted PDFs will be saved here. Folder has to exist.

$links= Get-Content -Path D:\BATCH-PRINT-WEBPAGES-PDF\Version1\linktst.txt

$num = 0
foreach($l in $links)
{
z=[uri]'l'
$nam = z.segment[-2]
$num++
$outfile = $nam.ToString() + '.pdf'
& 'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe' --headless --print-to-pdf="$destFolder$outfile" "$link"
Start-Sleep -s 3
}

It's not working.

Each entry in the text file is on its own line:

https://www.prodevelopertutorial.com/lte-chapter-1-lte-introduction/
https://www.prodevelopertutorial.com/lte-network-architecture/
https://www.prodevelopertutorial.com/4g-lte-tutorial-brief-working-of-network-elements-in-lte-architecture/
https://www.prodevelopertutorial.com/introduction-to-e-utran-network-architecture-elements/
https://www.prodevelopertutorial.com/introduction-to-epc-network-architecture-elements/

Each URL is on a new line in the text file.

CodePudding user response:

In your original code you are using $outfile = $num.ToString() + '.pdf'

You initialize $num to 0 and increment it on every iteration of the loop, which is why the files are created with the number as their name.

You can try the code below. I don't have chrome.exe on my machine, so I have not tested the creation of the output file.

$srcfile = "E:\Workspace\Test\Test.txt"
$destloc = "E:\Workspace\Test\Dest\"

$data = Get-Content $srcfile
foreach($url in $data){
    #Write-Output $url

    $url_trim = $url.Trim()
    if($url_trim.EndsWith("/"))
    {
        $url_trim = $url_trim.Substring(0,$url_trim.Length -1 )
    }
    #Write-Host $url_trim -ForegroundColor Cyan
    $filename = $url_trim.Substring($url_trim.LastIndexOf("/") + 1)
    #Write-Output $filename

    $outfile = "$filename.pdf"
    & 'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe' --headless --print-to-pdf="$destloc$outfile" "$url"
    Start-Sleep -s 3
    #Write-Output $outfile

}
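
The [uri] idea from the question can also be made to work. What follows is only a sketch, not a tested solution: the function names Get-PdfNameFromUrl and Save-UrlListAsPdf are made up for illustration, the Chrome path is simply the one used in the question, and it assumes every line holds an absolute URL. The point is that the Segments property of a [uri] keeps the trailing slash on each segment, so the slashes have to be trimmed before picking the last non-empty one.

```powershell
# Hypothetical helper (not from the original post): build a PDF file name
# from the last non-empty path segment of an absolute URL.
function Get-PdfNameFromUrl {
    param([Parameter(Mandatory)][string]$Url)

    $uri = [uri]($Url.Trim())

    # Segments looks like '/', 'lte-chapter-1-lte-introduction/'.
    # Strip the slashes and keep the last segment that is not empty.
    $name = $uri.Segments |
        ForEach-Object { $_.Trim('/') } |
        Where-Object { $_ } |
        Select-Object -Last 1

    # Fall back to the host name when the URL has no path at all.
    if (-not $name) { $name = $uri.Host }

    return "$name.pdf"
}

# Hypothetical wrapper: print every URL listed in $SourceFile to a PDF in
# $DestFolder. The default Chrome path is the one from the question and is
# an assumption about the machine this runs on.
function Save-UrlListAsPdf {
    param(
        [Parameter(Mandatory)][string]$SourceFile,
        [Parameter(Mandatory)][string]$DestFolder,
        [string]$ChromePath = 'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe'
    )

    foreach ($line in Get-Content -Path $SourceFile) {
        $url = $line.Trim()
        if (-not $url) { continue }   # skip blank lines

        # Join-Path avoids depending on a trailing backslash in $DestFolder.
        $outPath = Join-Path $DestFolder (Get-PdfNameFromUrl $url)

        & $ChromePath --headless --print-to-pdf="$outPath" $url
        Start-Sleep -Seconds 3
    }
}
```

With the paths from the question, the call would look like: Save-UrlListAsPdf -SourceFile "D:\BATCH-PRINT-WEBPAGES-PDF\Version 1\linktst.txt" -DestFolder "D:\BATCH-PRINT-WEBPAGES-PDF\Version 1\OP". As with the answer above, the Chrome step has not been run here, so treat that part as unverified.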
