Extract user name and email from text file using powershell

Time:05-20

I'm trying to retrieve new-user information from a text file so I can create a domain user: first name, last name (NOT the middle name) and email address. The file is actually an .eml file, so there is more to it than shown here, but the part I need is at the bottom and its format is always the same. I need to extract these items into the variables $Fname, $Lname and $Eaddr.

What I need to understand is how to first search for a specific line, in this case "BILLING ADDRESS", then grab the line two lines down and put the first and last name into the variables above. The email address is the same situation, but keyed on "Congratulations on the sale." and moving up. I can't just count down from "BILLING ADDRESS" because there could be an additional address line for, say, an apartment or suite. There could also be a middle name in the name line, so the script needs to work around that possibility too. Below is a sample of the text.

----------------------------------------


BILLING ADDRESS

Joe Some Blow
123 Nowhere
Someplace, TX 75075
[email protected]


----------------------------------------

Congratulations on the sale.

----------------------------------------

$path = "C:\Program Files (x86)\hMailServer\Data\theserver.com\autobot\B0"


$GETemail = (Select-String -Path "$path\*.eml" -Pattern '(^\W*.*@.*\.{1,}\w*$)' | Select-Object -ExpandProperty Line)

select-string -pattern "@" -InputObject $PREaddr -raw


$PREaddr = (Select-String -Path "$path\*.eml" -Pattern 'BILLING ADDRESS' -CaseSensitive -Context 0, 7) | Select-Object -Skip 3

$GETemail = Select-String -Pattern '(^\W*.*@.*\.{1,}\w*$)' | Select-String -Path "$path\*.eml" -Pattern 'BILLING ADDRESS' -CaseSensitive -Context 0, 7

#Using Regex to pull email addresses
$file = Get-Content "location of file"
(Select-String -InputObject $file -Pattern '\w+@\w+\.\w+' -AllMatches).Matches | select value



$GETemail = (Select-String -Path "$path\*.eml" -Pattern 'BILLING ADDRESS' -CaseSensitive -Context 0, 7) | Select-String -Pattern '\w+@\w+\.\w+'



$file = Get-Content "C:\Program Files (x86)\hMailServer\Data\theserver.com\autobot\B0\*.eml"
(Select-String -InputObject $file -Pattern '\w+@\w+\.\w+' -AllMatches).Matches | select value

$GETemail = Select-String -Path "$path\*.eml" -Pattern 'BILLING ADDRESS' -CaseSensitive -Context 0, 7 | Select-String -Pattern '(^\W*.*@.*\.{1,}\w*$)'



Get-Item -Path "$path\*.eml" | Get-Content -Tail -2


Select-String -Path "$path\*.eml" -Pattern 'BILLING ADDRESS' -CaseSensitive -Context 0, 2 | select-object Line | ft -HideTableHeaders
Select-String -Path "$path\*.eml" -Pattern 'Congratulations' -CaseSensitive -Context 5, 0 | select-object Line | ft -HideTableHeaders


Select-String -Path "$path\*.eml" -Pattern 'BILLING ADDRESS' -CaseSensitive -Context 0, 2 | select-object -Skip 1


CodePudding user response:

Ok, I'll break down my comment. Code in my comment:

gci $path\*.eml|%{gc $_ -raw|?{$_ -match '(?ms)BILLING ADDRESS\s+(\S.+?)[\r\n].+?[\r\n](\S+@\S+)'}|%{[pscustomobject]@{FirstName=$Matches[1].split(' ')[0];LastName=$Matches[1].Split(' ')[-1];Email=$Matches[2]}}

I'll start with defining the various aliases that I used to keep it short in the comment:

gci -> Get-ChildItem
%   -> ForEach-Object
gc  -> Get-Content
?   -> Where-Object

Formatted a little more nicely and not using the aliases it would look like this:

Get-ChildItem $path\*.eml|
    ForEach-Object{
        Get-Content $_ -raw |
            Where-Object{$_ -match '(?ms)BILLING ADDRESS\s+(\S.+?)[\r\n].+?[\r\n](\S+@\S+)'}|
            ForEach-Object{
                [pscustomobject]@{
                    FirstName=$Matches[1].split(' ')[0];
                    LastName=$Matches[1].Split(' ')[-1];
                    Email=$Matches[2]
                }
            }
    }

This starts out with Get-ChildItem, and that's just searching for *.eml at the path defined in $path. Nothing super complex there, moving on.

Next we hit a ForEach-Object loop. I actually run two loops here, one nested inside the other, so for the outer loop we are concerned with processing files one at a time as they're found. So, for each file the first thing it does is:

Get-Content $_ -raw

That command gets the content of the file that was passed to it as a multi-line string. This allows us to do a single search matching multiple groups against the entire email at once, which is what we do in the next part:
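
To see why -Raw matters here, a small sketch using a throwaway file may help. The file name and its three lines are made up for illustration. Without -Raw, Get-Content returns one string per line, so a pattern can never reach from one line into the next; with -Raw the whole file arrives as a single string.

```powershell
# Hypothetical sample file, created only for this demonstration.
$tmp = Join-Path ([System.IO.Path]::GetTempPath()) 'raw-demo.eml'
Set-Content -LiteralPath $tmp -Value 'BILLING ADDRESS', '', 'Joe Some Blow'

# Default: an array with one string per line.
$lines = Get-Content -LiteralPath $tmp
$lines.Count                               # 3

# -match against an array filters its elements; no single line matches.
@($lines -match 'ADDRESS\s+Joe').Count     # 0

# -Raw: the whole file as one string, line breaks included.
$text = Get-Content -LiteralPath $tmp -Raw
$text -is [string]                         # True
$text -match 'ADDRESS\s+Joe'               # True: \s+ crosses the blank line

Remove-Item -LiteralPath $tmp
```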

Where-Object{$_ -match '(?ms)BILLING ADDRESS\s+(\S.+?)[\r\n].+?[\r\n](\S+@\S+)'}

This says we only want emails that match the specified RegEx pattern. I'll let you see how RegEx 101 breaks it down if you need the RegEx (Regular Expression) explained. The match has two capturing groups in it, and a successful -match populates the automatic $Matches variable, which the ForEach-Object loop that the results are passed to can then read on each iteration. $Matches is a hashtable keyed by group number: key 0 holds the entire matched string, and each capturing group gets the next number. In our case with your given example that would be:

$Matches[0]
BILLING ADDRESS

Joe Some Blow
123 Nowhere
Someplace, TX 75075
[email protected]

$Matches[1]
Joe Some Blow

$Matches[2]
[email protected]
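
The same result can be reproduced without any files. This is only a sketch: the here-string stands in for the bottom of an .eml file, and joe@example.com is a placeholder address that is not taken from the original sample.

```powershell
# Stand-in for the tail of an .eml file; the email address is a placeholder.
$text = @'
BILLING ADDRESS

Joe Some Blow
123 Nowhere
Someplace, TX 75075
joe@example.com

Congratulations on the sale.
'@

$pattern = '(?ms)BILLING ADDRESS\s+(\S.+?)[\r\n].+?[\r\n](\S+@\S+)'

if ($text -match $pattern) {
    $Matches[1]   # the name line: Joe Some Blow
    $Matches[2]   # the email line: joe@example.com
}
```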

Then I just loop through the results of that to utilize the $Matches results, and build an object for each result.

ForEach-Object{
    [pscustomobject]@{
        FirstName=$Matches[1].split(' ')[0];
        LastName=$Matches[1].Split(' ')[-1];
        Email=$Matches[2]
    }
}

In that I use the first capture group (Joe Some Blow), split it on the spaces with .split(' '), and use the first result of the split for the first name, and the last result for the last name. I grab the second capturing group for the email address. Then it's just a last } (which is missing in my comment) to close the outer ForEach-Object loop.
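
Since the question asks for $Fname, $Lname and $Eaddr specifically, the same idea could be sketched for a single message roughly like this. The sample string (including the extra "Apt 4" line and the placeholder address) is an assumption for illustration. Splitting with -split '\s+' rather than .Split(' ') is a swapped-in variation that also tolerates repeated spaces between name parts.

```powershell
# Stand-in for one message's content as returned by Get-Content -Raw.
# The extra address line and the email address are placeholders.
$text = "BILLING ADDRESS`n`nJoe Some Blow`n123 Nowhere`nApt 4`nSomeplace, TX 75075`njoe@example.com`n"

if ($text -match '(?ms)BILLING ADDRESS\s+(\S.+?)[\r\n].+?[\r\n](\S+@\S+)') {
    # Copy the captures out of $Matches before doing anything else.
    $nameLine = $Matches[1]
    $Eaddr    = $Matches[2]

    # Keep only the first and last parts of the name line, so a middle
    # name (or its absence) makes no difference.
    $nameParts = $nameLine.Trim() -split '\s+'
    $Fname = $nameParts[0]
    $Lname = $nameParts[-1]
}

"$Fname $Lname <$Eaddr>"   # Joe Blow <joe@example.com>
```

Inside the file loop, the object-per-file output shown earlier is usually easier to work with than loose variables, because each file would otherwise overwrite the previous values.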
