Is there a way to replace unspecified text using the command line or simple tools?

Time:08-07

I am writing a simple script to help with my job. Specifically, it should remove the comment author from a .docx file. At the moment I do this manually by editing the comments.xml file in the word\ folder inside the .docx archive. I thought about a possible algorithm and came up with the following:

  1. Extract comments.xml file from .docx.
  2. Find and replace the text inside comments.xml.
  3. Update comments.xml file with an adjusted version.
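
The three steps above can also be sketched without external tools. The following is a minimal sketch that swaps WinRAR for Python's standard `zipfile` module; it is not the approach used in the question. `rewrite_member` is a hypothetical helper name, and the entry name `word/comments.xml` with UTF-8 encoding is an assumption about the archive layout. Because `zipfile` cannot overwrite an entry in place, the sketch writes a fresh archive next to the original and then swaps it in.

```python
import os
import tempfile
import zipfile


def rewrite_member(docx_path, member, transform):
    """Rewrite one entry of a .docx (zip) archive through `transform`.

    `rewrite_member` is a hypothetical helper; `transform` takes and returns str.
    Assumes the entry is UTF-8 encoded text.
    """
    folder = os.path.dirname(os.path.abspath(docx_path))
    # The temp file lives in the same folder so os.replace stays on one volume.
    fd, tmp_path = tempfile.mkstemp(suffix=".docx", dir=folder)
    os.close(fd)
    try:
        with zipfile.ZipFile(docx_path) as src, \
                zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
            for item in src.infolist():
                data = src.read(item.filename)
                if item.filename == member:
                    # Steps 1 and 2: read the entry and edit it in memory.
                    data = transform(data.decode("utf-8")).encode("utf-8")
                # Step 3: copy every entry, edited or not, into the new archive.
                dst.writestr(item, data)
        os.replace(tmp_path, docx_path)
    finally:
        # Clean up the temp file if something failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

A call such as `rewrite_member("report.docx", "word/comments.xml", str.upper)` (with a placeholder file name) would pass only the comments entry through the given function and leave the other entries untouched.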

Steps 1 and 3 are not too hard; they are simple commands. For example, the extraction step can be done like this:

    @ECHO OFF
    SET Winrar=C:\Program Files\WinRAR\WinRAR.exe
    FOR %%I IN (*.docx) DO (
        "%WinRAR%" e "%%I" word\comments.xml
    )

I planned to use the Find and Replace (fnr) tool to edit the extracted file:

"C:\Comments\fnr.exe" --cl --dir "" --fileMask "*.xml" --find "" --replace ""

My problem, however, is that the author information differs from file to file, so I would still have to type in the exact text to be replaced. The relevant line in comments.xml generally looks like this:

<w:comment w:id="0" w:author="Author_Name" w:date="YYYY-MM-DDTHH:MM:SS" w:initials="AN">

What I do manually is change author="Author_Name" to author="". Is there a way to apply a pattern so that this edit is done automatically within the script? Thanks in advance!
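
The edit described here, blanking an attribute whose value is not known in advance, is a natural fit for a regular expression. Below is a minimal Python sketch of that idea; `blank_author` and `AUTHOR_ATTR` are hypothetical names, and the pattern assumes the attribute value is wrapped in double quotes and contains no double quote itself.

```python
import re

# Matches w:author="..." for any value that contains no double quote.
# Assumption: the attribute is always written with double quotes.
AUTHOR_ATTR = re.compile(r'(w:author=")[^"]*(")')


def blank_author(xml_text):
    """Return xml_text with every w:author value replaced by an empty string."""
    return AUTHOR_ATTR.sub(r"\1\2", xml_text)


line = '<w:comment w:id="0" w:author="Author_Name" w:date="YYYY-MM-DDTHH:MM:SS" w:initials="AN">'
print(blank_author(line))
# prints: <w:comment w:id="0" w:author="" w:date="YYYY-MM-DDTHH:MM:SS" w:initials="AN">
```

The character class `[^"]*` stops at the first closing quote, so the neighbouring `w:date` and `w:initials` attributes are left alone.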

Edit: For now I use a temporary workaround that renames w:author to w:noauthor (and likewise for w:date and w:initials). LibreOffice and Word then cannot read the attribute and indicate "No author" under the comment. The information, however, is still present in comments.xml, so I would be really grateful for advice on how to remove it within the same script. The current script looks like this:

@ECHO OFF
SET Winrar=C:\Program Files\WinRAR\WinRAR.exe
SET fnr=C:\Comments\fnr.exe
FOR %%I IN (*.docx) DO (
 "%WinRAR%" x "%%I" word\comments.xml
 "%fnr%" --cl --dir "C:\Comments\word" --fileMask "*.xml" --find "w:author=" --replace "w:noauthor="
 "%fnr%" --cl --dir "C:\Comments\word" --fileMask "*.xml" --find "w:date=" --replace "w:nodate="
 "%fnr%" --cl --dir "C:\Comments\word" --fileMask "*.xml" --find "w:initials=" --replace "w:noinitials="
 "%WinRAR%" u "%%I" word\
 del C:\Comments\word\ /q
)

CodePudding user response:

The final solution looks like this:

@ECHO OFF
SET Winrar=C:\Program Files\WinRAR\WinRAR.exe
SET fnr=C:\Comments\fnr.exe
FOR %%I IN (*.docx) DO (
 REM Extract comments.xml with its folder path, giving word\comments.xml
 "%WinRAR%" x "%%I" word\comments.xml
 REM Blank each attribute value; .*? is lazy and stops at the first closing quote
 "%fnr%" --cl --dir "C:\Comments\word" --fileMask "*.xml" --useRegex --find "w:author=\".*?\"" --replace "w:author=\"\""
 "%fnr%" --cl --dir "C:\Comments\word" --fileMask "*.xml" --useRegex --find "w:date=\".*?\"" --replace "w:date=\"\""
 "%fnr%" --cl --dir "C:\Comments\word" --fileMask "*.xml" --useRegex --find "w:initials=\".*?\"" --replace "w:initials=\"\""
 REM Put the edited file back into the archive, then clean up
 "%WinRAR%" u "%%I" word\
 del C:\Comments\word\ /q
)
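
A lazy wildcard such as `.*?` matters for this kind of attribute-blanking pattern because several quoted attributes sit on the same line. The short Python sketch below illustrates the difference between the lazy and greedy forms on the sample line from the question. It uses Python's `re` module rather than fnr, on the assumption that the two regex engines treat these basic constructs the same way.

```python
import re

line = '<w:comment w:id="0" w:author="Author_Name" w:date="YYYY-MM-DDTHH:MM:SS" w:initials="AN">'

# Lazy: .*? stops at the first closing quote after w:author=".
lazy = re.sub(r'w:author=".*?"', 'w:author=""', line)

# Greedy: .* runs on to the last quote on the line and swallows other attributes.
greedy = re.sub(r'w:author=".*"', 'w:author=""', line)

print(lazy)
# prints: <w:comment w:id="0" w:author="" w:date="YYYY-MM-DDTHH:MM:SS" w:initials="AN">
print(greedy)
# prints: <w:comment w:id="0" w:author="">
```

With the greedy form the date and initials attributes vanish entirely, which would change the element rather than just empty one value.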