Find and replace highlighted text in Word document
I would like to find and replace highlighted text in a Word document using AA360. I want it to work for different highlight colours. For example, all text highlighted yellow gets replaced with [REDACTED 1] and all text in red gets replaced with [REDACTED 2]. Could it also work to save two versions? One version where the yellow highlighted text gets replaced with [redacted] and another where it does so for the red highlights only? And also ways I can select more than one colour highlight to replace the text of. I’d also like to be able to easily un-highlight the replaced text if possible.
I understand there are OCR functions built into AA360 but I am not aware of whether they work well. Please could someone suggest a simple workflow for this?
Thank you so much.
Let’s break this into component pieces.
I would like to find and replace highlighted text in a Word document using AA360.
Great! We can do that using Simulate Keystrokes. Sending CTRL+F to Word raises the Find window, and sending CTRL+H raises the Replace window. Once that window is open, you can use Recorder: Capture actions to fill in the form.
I want it to work for different highlight colours.
That makes this a lot more difficult. In the Replace window, there is a [More >>] button. In the Replace section at the bottom, there is a Format drop-down. Under that drop-down there is an option for “Highlight”, but it does not let you choose the color.
If Word doesn’t have the functionality to do a find/replace based on color, we can’t either.
Our OCR functionality can read characters off the screen, but cannot identify colors.
The only possibility I can think of that you could even try is a 10/10 difficulty: DOCX files are really ZIP files. Copy a DOCX file and change its extension to .zip. Inside you will find all the component pieces of the Word document in XML format. There is a folder inside called “word” and inside that is “document.xml”.
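As a rough illustration of the ZIP idea, here is a minimal Python sketch that reads the main document part straight out of a DOCX without renaming it. This runs outside AA360 (for example from a Python script step), and the function name read_document_xml is made up for this example.

```python
import zipfile


def read_document_xml(docx_path):
    """Return the XML of the main document part of a .docx file as text.

    Hypothetical helper for illustration only; not an AA360 action.
    """
    # A .docx is an ordinary ZIP archive, so zipfile can open it directly.
    with zipfile.ZipFile(docx_path) as archive:
        # The document body is stored at word/document.xml in the archive.
        return archive.read("word/document.xml").decode("utf-8")
```

This only reads the file; it never writes back into the archive, so the original document is left untouched.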
By carefully parsing the XML, you could identify highlighted areas and replace the text. The XML format adheres to this standard:
Note that Word also works with .XML and OpenDOC XML formats if you want to save from there.
Good luck!
Thanks Aaron. That’s a great solution. However, when a document is edited as .XML, saved and opened again, Word detects this and displays a message saying the file was tampered with. This is worrying if you’re sharing the document with another party.
Could you perhaps suggest a way to save all the highlighted text as variables to then use your first suggestion of find and replace?
Perhaps you would need to collate all highlighted text snippets into a list variable? But then I would need to figure out a way of combining text where it is cut off mid-word. Perhaps something to do with <w:r> and similar tags, but it is a 10/10 difficulty as you say.
I noticed these structures when dealing with highlighted text:
I do see that the val="yellow" changes as the color of the highlight changes. The <w:t> tags seem to surround the bits of text while <w:r> seems to surround the formatting changes.
In theory, looping through the <w:r> tags and looking for a <w:highlight> tag within each one should point you toward the highlighted text. As long as your find/replace has the “highlight” option enabled in Format, you may be able to perform those find/replace actions with less (but not zero) risk of replacing the incorrect text.
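The loop described above could look something like the following Python sketch. It is only an outline under some assumptions: the XML has already been read from word/document.xml, the function name and the sample XML are invented for illustration (the sample is not taken from a real document), and only runs that sit directly inside a paragraph are examined. Adjacent runs with the same highlight colour are joined, which is one way to deal with text that is cut off mid-word across several <w:r> tags.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace used by WordprocessingML elements such as w:p, w:r and w:t.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def highlighted_text_by_colour(document_xml):
    """Map each highlight colour to the list of text snippets using it."""
    root = ET.fromstring(document_xml)
    found = defaultdict(list)
    for para in root.iter(f"{{{W_NS}}}p"):
        prev_colour = None  # colour of the previous text-bearing run
        # Only direct child runs are checked; runs nested in hyperlinks
        # or other wrappers are skipped in this sketch.
        for run in para.findall("w:r", NS):
            mark = run.find("w:rPr/w:highlight", NS)
            colour = mark.get(f"{{{W_NS}}}val") if mark is not None else None
            if colour == "none":
                colour = None  # "none" means highlighting is switched off
            text = "".join(t.text or "" for t in run.findall("w:t", NS))
            if not text:
                continue
            if colour is not None:
                if colour == prev_colour:
                    # Same colour as the run before: join split text.
                    found[colour][-1] += text
                else:
                    found[colour].append(text)
            prev_colour = colour
    return dict(found)


# Made-up sample: "Yellow" is split across two runs by a bold change.
SAMPLE_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
    '<w:r><w:t xml:space="preserve">Plain </w:t></w:r>'
    '<w:r><w:rPr><w:highlight w:val="yellow"/></w:rPr><w:t>Yel</w:t></w:r>'
    '<w:r><w:rPr><w:b/><w:highlight w:val="yellow"/></w:rPr>'
    '<w:t>low</w:t></w:r>'
    '<w:r><w:rPr><w:highlight w:val="red"/></w:rPr><w:t>Red</w:t></w:r>'
    '</w:p></w:body></w:document>'
)

print(highlighted_text_by_colour(SAMPLE_XML))
# prints {'yellow': ['Yellow'], 'red': ['Red']}
```

Because this only reads the XML, the DOCX itself is never modified, so the result can be fed into a normal find/replace in Word.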
You should be able to add the text (e.g., “Yellow highlighted” above) to a List variable and then open the original Word document, set up the Find/Replace with Format > Highlight, make your replacements, and be able to save the file without weird messages.
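To make the List variable idea a little more concrete, here is a small Python sketch that turns collected snippets into (find, replace) pairs. The function name, the input shape (a dictionary of colour to snippets), and the labels are all assumptions made for this example. Leaving a colour out of the labels mapping skips it, which is one way to produce a yellow-only or red-only version, or to pick several colours at once.

```python
def build_replacements(snippets_by_colour, labels):
    """Return (find_text, replace_text) pairs, longest find text first.

    snippets_by_colour: e.g. {"yellow": ["John Smith"], "red": ["Acme"]}
    labels: e.g. {"yellow": "[REDACTED 1]"}; colours not listed are skipped.
    """
    pairs = []
    seen = set()
    for colour, label in labels.items():
        for snippet in snippets_by_colour.get(colour, []):
            # Skip blanks and duplicates. If the same text appears under
            # two colours, the first label wins, because a highlight-only
            # search cannot tell the colours apart.
            if snippet.strip() and snippet not in seen:
                seen.add(snippet)
                pairs.append((snippet, label))
    # Longest first, so "John Smith" is replaced before plain "John".
    pairs.sort(key=lambda pair: len(pair[0]), reverse=True)
    return pairs


snippets = {"yellow": ["John", "John Smith", "John"], "red": ["Acme"]}
print(build_replacements(snippets, {"yellow": "[REDACTED 1]"}))
# prints [('John Smith', '[REDACTED 1]'), ('John', '[REDACTED 1]')]
```

Each pair can then be fed to the Replace window one at a time, with Format > Highlight enabled on the Find side.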