Solved

Find and replace highlighted text in Word document

  • 29 August 2024
  • 3 replies
  • 16 views

I would like to find and replace highlighted text in a Word document using AA360. I want it to work for different highlight colours. For example, all text highlighted yellow gets replaced with [REDACTED 1] and all text highlighted red gets replaced with [REDACTED 2]. Could it also save two versions: one where only the yellow-highlighted text is replaced with [redacted], and another where only the red-highlighted text is? I would also like a way to select more than one highlight colour at a time for replacement and, if possible, to easily un-highlight the replaced text.

I understand there are OCR functions built into AA360 but I am not aware of whether they work well. Please could someone suggest a simple workflow for this?

Thank you so much.

Let’s break this into component pieces.

I would like to find and replace highlighted text in a Word document using AA360.

Great! We can do that using Simulate Keystrokes. Sending CTRL+F to Word opens the Find window, and sending CTRL+H opens the Replace window. Once that window is open, you can use Recorder: Capture actions to fill in the form.

I want it to work for different highlight colours.

That makes this a lot more difficult. In the Replace window, there is a [More >>] button. In the Replace section at the bottom, there is a Format drop-down. Under that drop-down is an option for “Highlight”, but it does not allow you to choose the color.

If Word doesn’t have the functionality to do a find/replace based on color, we can’t either.

Our OCR functionality can read characters off the screen, but cannot identify colors.

The only other possibility I can think of is a 10/10 difficulty: DOCX files are really ZIP files. Copy a DOCX file and rename its extension to ZIP. Inside you will find all the component pieces of the Word document in XML format. There is a folder inside called “word” and inside that is “document.xml”.
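As a rough sketch of that first step, the snippet below reads document.xml straight out of the archive with Python's standard zipfile module, so no renaming is needed. The function name read_document_xml is just an illustrative name, and running Python alongside the bot is an assumption on my part, not something AA360 does for you out of the box.

```python
import zipfile


def read_document_xml(docx_path):
    """Return the raw bytes of word/document.xml from a DOCX file."""
    # A DOCX is a ZIP archive, so zipfile can open it without renaming.
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read("word/document.xml")
```

Calling read_document_xml on a copy of the file (rather than the original) keeps the source document untouched while you experiment.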

By carefully parsing the XML, you could identify highlighted areas and replace the text. The XML format adheres to this standard:

https://en.wikipedia.org/wiki/Microsoft_Office_XML_formats

Note that Word also works with .XML and OpenDocument formats if you want to save from there.
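To give a feel for what that parsing might look like, here is a minimal Python sketch that swaps the text of highlighted runs based on the highlight color. It assumes the highlight is stored as a w:highlight element inside the run's w:rPr; the function name redact_by_color and its parameters are made up for illustration. It only transforms the XML text. Writing the result back into a DOCX is deliberately left out, because ElementTree rewrites namespace prefixes it has not been told about, so check any real output in Word before trusting it.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"
# Keep the familiar "w:" prefix when the tree is serialized again.
ET.register_namespace("w", W_NS)


def redact_by_color(document_xml, replacements, remove_highlight=False):
    """Replace the text of highlighted runs.

    `replacements` maps a highlight value (e.g. "yellow") to new text.
    Runs whose color is not in the mapping are left alone.
    """
    root = ET.fromstring(document_xml)
    for run in root.iter(W + "r"):
        props = run.find(W + "rPr")
        mark = props.find(W + "highlight") if props is not None else None
        if mark is None:
            continue  # not a highlighted run
        color = mark.get(W + "val")
        if color not in replacements:
            continue
        # Put the replacement in the first text node and blank any others.
        for index, text_node in enumerate(run.findall(W + "t")):
            text_node.text = replacements[color] if index == 0 else ""
        if remove_highlight:
            props.remove(mark)
    return ET.tostring(root, encoding="unicode")
```

Calling it once with {"yellow": "[REDACTED 1]"} and once with {"red": "[REDACTED 2]"} would give the two separate versions, and passing both keys handles more than one color in a single pass. Note that each run is replaced on its own, so a phrase that Word has split across several runs would get the replacement text several times.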

Good luck!


Thanks Aaron. That’s a great solution. However, when a document is edited as .XML, saved and opened again, Word detects this and displays a message saying the file was tampered with. This is worrying if you’re sharing the document with another party.

Could you perhaps suggest a way to save all the highlighted text as variables to then use your first suggestion of find and replace?

Perhaps you would need to collate all highlighted text snippets into a list variable? But then I would need to figure out a way of combining text where it is cut off mid-word. Perhaps something to do with <w:r> and similar tags, but it is a 10/10 difficulty as you say.
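One way the mid-word joining could work, as a hedged Python sketch: within each paragraph, neighbouring runs that share the same highlight colour are glued back together before being added to the list. It assumes the highlight sits in w:rPr/w:highlight and only looks at runs that are direct children of a paragraph; the names run_colour and highlighted_phrases are invented for this example.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def run_colour(run):
    """Return the highlight value of a run, or None if it is not highlighted."""
    mark = run.find(W + "rPr/" + W + "highlight")
    return mark.get(W + "val") if mark is not None else None


def highlighted_phrases(document_xml):
    """Return (colour, text) pairs, joining adjacent runs of the same colour."""
    phrases = []
    root = ET.fromstring(document_xml)
    for para in root.iter(W + "p"):
        # Only direct child runs; runs nested in hyperlinks etc. are skipped.
        runs = para.findall(W + "r")
        for colour, group in groupby(runs, key=run_colour):
            if colour is None:
                continue
            text = "".join(
                t.text or "" for run in group for t in run.findall(W + "t")
            )
            if text:
                phrases.append((colour, text))
    return phrases
```

Because the grouping never crosses a paragraph boundary, a highlight that continues into the next paragraph would come out as two separate entries.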


I noticed these structures when dealing with highlighted text:

<w:r w:rsidRPr="00E00CDB">
<w:rPr>
<w:highlight w:val="yellow"/>
</w:rPr>
<w:t>Yellow highlighted</w:t>
</w:r>

I do see that the w:val="yellow" value changes as the color of the highlight changes. The <w:t> tags seem to surround the bits of text while <w:r> seems to surround the formatting changes.

In theory, looping through the <w:r> tags and looking for a <w:highlight> tag within each should point you toward the highlighted text. As long as your find/replace has the “highlight” option enabled in Format, you may be able to perform those find/replace actions with less (but not zero) risk of replacing the incorrect text.

You should be able to add the text (e.g., “Yellow highlighted” above) to a List variable and then open the original Word document, set up the Find/Replace with Format > Highlight, make your replacements, and be able to save the file without weird messages.
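For what it's worth, the loop over <w:r> tags could be sketched in Python roughly like this. It is read-only, so the original document is never modified, and it returns the snippets bucketed by highlight color. The function name snippets_by_color is hypothetical, and getting the resulting lists into an AA360 List variable is left to whatever mechanism you use to call scripts from your bot.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def snippets_by_color(document_xml):
    """Map each highlight color to the list of text snippets found in its runs."""
    found = defaultdict(list)
    root = ET.fromstring(document_xml)
    for run in root.iter(W + "r"):
        mark = run.find(W + "rPr/" + W + "highlight")
        if mark is None:
            continue  # run has no highlight
        # A run can hold more than one <w:t>; join them into one snippet.
        text = "".join(t.text or "" for t in run.findall(W + "t"))
        if text:
            found[mark.get(W + "val")].append(text)
    return dict(found)
```

Each list can then drive one round of Find/Replace with Format > Highlight switched on, one color at a time.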

