Home > Software design >  Powershell and filenames with non-ASCII characters (e.g. Æ)
Powershell and filenames with non-ASCII characters (e.g. Æ)

Time:04-16

I am attempting to index my movie collection and in doing so have run across an issue where at least one title is skipped in the import phase due to special characters. The code skips over "Æon Flux" due to it starting with Æ. Would anyone know how to correct this, please?

Clear-Host

# Variables:
$movie_dir = "K:\Movies"

# Because reasons...
$PSDefaultParameterValues['*:Encoding'] = 'utf8'

# Connect to the library MySQL.Data.dll
Add-Type -Path 'C:\Program Files (x86)\MySQL\Connector NET 8.0\Assemblies\v4.8\MySql.Data.dll'
 
# Create a MySQL Database connection variable that qualifies:
$Connection = [MySql.Data.MySqlClient.MySqlConnection]@{ConnectionString='server=127.0.0.1;uid=username;pwd=password;database=media'}
$Connection.Open()

# Drop the table to clear all entries.
$sql_drop_table = New-Object MySql.Data.MySqlClient.MySqlCommand
$sql_drop_table.Connection = $Connection
$sql_drop_table.CommandText = 'DROP TABLE Movies'
$sql_drop_table.ExecuteNonQuery() | Out-Null

# (Re)create the table.
$sql_create_table = New-Object MySql.Data.MySqlClient.MySqlCommand
$sql_create_table.Connection = $Connection
$sql_create_table.CommandText = 'create table Movies(movie_id INT NOT NULL AUTO_INCREMENT, movie_title VARCHAR(255) NOT NULL, movie_file_date INT, movie_IMDB_id INT, PRIMARY KEY (movie_id))'
$sql_create_table.ExecuteNonQuery() | Out-Null

$movies = Get-ChildItem $movie_dir -File -include *.mp4 -Recurse -Depth 1 |
    Select-Object -ExpandProperty FullName |
    Sort-Object |
    Get-Unique |
    where{$_ -ne ""}

foreach ($movie in $movies)
{
    # .net function to get just the filename (movie title).
    $title = [System.IO.Path]::GetFileNameWithoutExtension($movie)
    # Get the creation date of the movie and reformat it to yearmonthday.
    $add_date = (Get-ChildItem $movie).CreationTime.toString("yyyyMMdd")

    $query = "INSERT INTO Movies(movie_id, movie_title, movie_file_date) VALUES(NULL, @title, $add_date)"
    $command = $connection.CreateCommand()
    $command.CommandText = $query
    # Sanatize single quotes in filenames for input.
    $command.Parameters.AddWithValue("@title", $title) | Out-Null
    $command.ExecuteNonQuery() | Out-Null
}

# Close the MySQL connection.
$Connection.Close()

Write-Host
Write-Host("Added") $movies.Count ("movies.")

CodePudding user response:

I don't think it is the Get-ChildItem that skips the file with that special character. More likely, you need to tell your MySql to use UTF-8.
For that, have a look at How to make MySQL handle UTF-8 properly

As for your code, I would change this:

$movies = Get-ChildItem $movie_dir -File -include *.mp4 -Recurse -Depth 1 |
    Select-Object -ExpandProperty FullName |
    Sort-Object |
    Get-Unique |
    where{$_ -ne ""}

into

$movies = Get-ChildItem -Path $movie_dir -File -Filter '*.mp4' -Recurse -Depth 1 | Sort-Object -Property FullName

and work with the FileInfo objects from there on:

foreach ($movie in $movies) {
    $title = $movie.BaseName
    # Get the creation date of the movie and reformat it to yearmonthday.
    $add_date = '{0}:yyyyMMdd}' -f $movie.CreationTime
    . . .
}

CodePudding user response:

Though Æ is not an ASCII character it is not otherwise "special", so I edited the question title and tags to reflect that.

ExecuteNonQuery() returns the number of rows affected by the command; in the case of $command, it's the number of rows inserted. You are discarding this value, however...

$command.ExecuteNonQuery() | Out-Null

...which masks the problem in the event the INSERT fails. Instead, test the result and respond appropriately...

if ($command.ExecuteNonQuery() -eq 1)
{
    Write-Host -Message "Successfully inserted movie ""$title""."
}
else
{
    Write-Warning -Message "Failed to insert movie ""$title""."
}

This will make it clear if the issue lies in interacting with the filesystem or the database.

Some other notes:

  • MySqlCommand implements the IDisposable interface and so each instance should be disposed when you're done using it...

    $query = "INSERT INTO Movies(movie_id, movie_title, movie_file_date) VALUES(NULL, @title, $add_date)"
    $command = $connection.CreateCommand()
    try
    {
        $command.CommandText = $query
        # Sanatize single quotes in filenames for input.
        $command.Parameters.AddWithValue("@title", $title) | Out-Null
        if ($command.ExecuteNonQuery() -eq 1)
        {
            Write-Host -Message "Successfully inserted movie ""$title""."
        }
        else
        {
            Write-Warning -Message "Failed to insert movie ""$title""."
        }
    }
    finally
    {
        $command.Dispose()
    }
    

    ...and the same for $sql_drop_table and $sql_create_table. The code in the finally block will run even if an error is thrown from within the try block.

  • See Difference with Parameters.Add and Parameters.AddWithValue and its links for why AddWithValue() can be problematic.

  • Instead of...

    Write-Host("Added") $movies.Count ("movies.")
    

    ...a more typical way to build this message would be with string interpolation...

    Write-Host "Added $($movies.Count) movies."
    

    ...or the format operator...

    Write-Host ('Added {0} movies.' -f $movies.Count)
    

    You can also incorporate numeric format strings, so if $movies.Count is 1234 and $PSCulture is 'en-US' then...

    Write-Host "Added $($movies.Count.ToString('N0')) movies."
    

    ...and...

    Write-Host ('Added {0:N0} movies.' -f $movies.Count)
    

    ...will both write...

    Added 1,234 movies.
    
  • Related