Solved

Find and replace highlighted text in Word document

  • August 29, 2024
  • 3 replies
  • 101 views


I would like to find and replace highlighted text in a Word document using AA360. I want it to work for different highlight colours. For example, all text highlighted yellow gets replaced with [REDACTED 1] and all text highlighted red gets replaced with [REDACTED 2]. Could it also save two versions: one where only the yellow highlighted text gets replaced with [REDACTED], and another where only the red highlighted text does? I would also like a way to select more than one highlight colour at once, and to easily un-highlight the replaced text if possible.

I understand there are OCR functions built into AA360 but I am not aware of whether they work well. Please could someone suggest a simple workflow for this?

Thank you so much.

Best answer by Aaron.Gleason

Let’s break this into component pieces.

I would like to find and replace highlighted text in a Word document using AA360.

Great! We can do that using Simulate Keystrokes. Sending CTRL+F to Word raises the Find window, and sending CTRL+H raises the Replace window. Once that window is raised, you can use Recorder: Capture actions to fill in the form.

I want it to work for different highlight colours.

That makes this a lot more difficult. In the Replace window, there is a [More >>] button. In the Replace section at the bottom, there is a Format drop-down. That drop-down offers a “Highlight” option, but it does not let you choose the color.

If Word doesn’t have the functionality to do a find/replace based on color, we can’t either.

Our OCR functionality can read characters off the screen, but cannot identify colors.

The only possibility I can think of that you could even try is at a 10/10 difficulty: DOCX files are really ZIP files. Copy a DOCX file and rename its extension to ZIP. Inside you will find all the component pieces of the Word document in XML format. There is a folder inside called “word” and inside that is “document.xml”.
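As a rough illustration of the "DOCX is really a ZIP" point, here is a minimal Python sketch that pulls word/document.xml out of a package with the standard zipfile module, so no renaming is needed. The helper name read_document_xml is made up for this example, and the in-memory package is only a stand-in so the snippet runs without a real file; it is not a valid Word document.

```python
import io
import zipfile


def read_document_xml(docx_source):
    """Return the raw bytes of word/document.xml from a .docx path or file-like object."""
    # zipfile opens a .docx directly; the extension does not matter.
    with zipfile.ZipFile(docx_source) as package:
        return package.read("word/document.xml")


# Stand-in package built in memory (assumption: a real .docx has many more parts).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("word/document.xml", "<w:document/>")
buffer.seek(0)

xml_bytes = read_document_xml(buffer)
print(xml_bytes.decode("utf-8"))
```

With a real file you would pass its path instead of the buffer, for example read_document_xml("contract.docx"), where the file name is hypothetical.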

By carefully parsing the XML, you could identify highlighted areas and replace the text. DOCX files use the Office Open XML format (WordprocessingML); for background on Microsoft's XML formats, see:

https://en.wikipedia.org/wiki/Microsoft_Office_XML_formats
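To make the parsing idea a little more concrete, here is a hedged Python sketch that replaces the text of highlighted runs by colour and optionally removes the highlight. The function name redact_highlights, the labels mapping, and the sample XML are all assumptions for illustration. It treats neighbouring runs of the same colour in a paragraph as one highlighted phrase, so the label is written only once.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"
ET.register_namespace("w", W_NS)  # keep the familiar w: prefix on output


def redact_highlights(xml_text, labels, clear_highlight=True):
    """Replace highlighted text; labels maps highlight colour -> replacement."""
    root = ET.fromstring(xml_text)
    for para in root.iter(W + "p"):
        previous = None  # colour of the run just before this one
        for run in para.iter(W + "r"):
            props = run.find(W + "rPr")
            mark = props.find(W + "highlight") if props is not None else None
            colour = mark.get(W + "val") if mark is not None else None
            label = labels.get(colour)
            if label is not None:
                for i, t in enumerate(run.findall(W + "t")):
                    # Write the label once per stretch of same-coloured runs.
                    t.text = label if (i == 0 and colour != previous) else ""
                if clear_highlight:
                    props.remove(mark)
            previous = colour
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily trimmed document body for demonstration only.
SAMPLE = (
    '<w:document xmlns:w="%s"><w:body><w:p>'
    '<w:r><w:t xml:space="preserve">Name: </w:t></w:r>'
    '<w:r><w:rPr><w:highlight w:val="yellow"/></w:rPr><w:t>Jane</w:t></w:r>'
    '<w:r><w:rPr><w:highlight w:val="red"/></w:rPr><w:t>Doe</w:t></w:r>'
    '</w:p></w:body></w:document>' % W_NS
)

# Two versions from the same source: yellow only, then both colours.
yellow_only = redact_highlights(SAMPLE, {"yellow": "[REDACTED]"})
both = redact_highlights(SAMPLE, {"yellow": "[REDACTED 1]", "red": "[REDACTED 2]"})
```

One caution, stated as a likely cause rather than a certainty: ElementTree only writes the namespace declarations it sees in use, and a real document.xml declares many more. Re-serialising a real file this way may be enough for Word to complain on opening, so a parser that preserves the original declarations could be a safer choice for writing back.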

Note that Word can also save in .XML and OpenDocument (.odt) formats if you want to save from there.

Good luck!



Thanks Aaron. That’s a great solution. However, when a document is edited as .XML, saved and opened again, Word detects this and displays a message saying the file was tampered with. This is worrying if you’re sharing the document with another party.

Could you perhaps suggest a way to save all the highlighted text as variables to then use your first suggestion of find and replace?

Perhaps you would need to collate all highlighted text snippets into a list variable? But then I would need to figure out a way of combining text where it is cut off mid-word. Perhaps something to do with <w:r> and similar tags, but it is a 10/10 difficulty as you say.


Aaron.Gleason
Automation Anywhere Team
  • August 29, 2024

I noticed these structures when dealing with highlighted text:

<w:r w:rsidRPr="00E00CDB">
    <w:rPr>
        <w:highlight w:val="yellow"/>
    </w:rPr>
    <w:t>Yellow highlighted</w:t>
</w:r>

I do see that w:val="yellow" changes as the color of the highlight changes. Each <w:t> tag surrounds a piece of text, while each <w:r> tag (a "run") surrounds a stretch of text that shares the same formatting, so a new run starts wherever the formatting changes.

In theory, looping through the <w:r> tags, looking for a <w:highlight> tag within should point you toward the highlighted text. As long as your find/replace has the “highlight” option enabled in Format, you may be able to perform those find/replace actions with less (but not zero) risk of replacing the incorrect text.
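The loop described above could look roughly like this in Python. It is only a sketch: the function name highlighted_runs and the sample XML (modelled on the structure quoted earlier) are assumptions, and a real document.xml contains far more markup.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def highlighted_runs(xml_text):
    """Yield (colour, text) for every run that carries a <w:highlight> element."""
    root = ET.fromstring(xml_text)
    for run in root.iter(W + "r"):
        mark = run.find(W + "rPr/" + W + "highlight")
        if mark is not None:
            # A run can hold several <w:t> pieces; join them in order.
            text = "".join(t.text or "" for t in run.iter(W + "t"))
            yield mark.get(W + "val"), text


# Hypothetical sample: one plain run followed by one yellow run.
SAMPLE = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p>"
    "<w:r><w:t>Plain text </w:t></w:r>"
    '<w:r w:rsidRPr="00E00CDB"><w:rPr><w:highlight w:val="yellow"/></w:rPr>'
    "<w:t>Yellow highlighted</w:t></w:r>"
    "</w:p></w:body></w:document>"
)

print(list(highlighted_runs(SAMPLE)))  # [('yellow', 'Yellow highlighted')]
```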

You should be able to add the text (e.g., “Yellow highlighted” above) to a List variable and then open the original Word document, set up the Find/Replace with Format > Highlight, make your replacements, and be able to save the file without weird messages.
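For building that list, here is one possible Python sketch that also deals with text being cut off mid-word: it joins neighbouring runs of the same highlight colour within a paragraph before storing the snippet. The names highlighted_snippets, run_colour and run_text are invented for this example, and grouping strictly by adjacent runs is an assumption about how Word split the text.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def run_colour(run):
    """Return the highlight colour of a run, or None if it is not highlighted."""
    mark = run.find(W + "rPr/" + W + "highlight")
    return mark.get(W + "val") if mark is not None else None


def run_text(run):
    return "".join(t.text or "" for t in run.iter(W + "t"))


def highlighted_snippets(xml_text):
    """Return {colour: [snippet, ...]}, joining neighbouring runs of one colour."""
    snippets = {}
    root = ET.fromstring(xml_text)
    for para in root.iter(W + "p"):
        # groupby only merges consecutive runs, so a plain run splits the groups.
        for colour, runs in groupby(para.iter(W + "r"), key=run_colour):
            if colour is None:
                continue
            text = "".join(run_text(r) for r in runs)
            if text and text not in snippets.setdefault(colour, []):
                snippets[colour].append(text)
    return snippets


# Hypothetical sample where Word has split "Confidential" across two runs.
SAMPLE = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p>"
    '<w:r><w:rPr><w:highlight w:val="yellow"/></w:rPr><w:t>Con</w:t></w:r>'
    '<w:r><w:rPr><w:highlight w:val="yellow"/></w:rPr><w:t>fidential</w:t></w:r>'
    '<w:r><w:t xml:space="preserve"> and </w:t></w:r>'
    '<w:r><w:rPr><w:highlight w:val="red"/></w:rPr><w:t>secret</w:t></w:r>'
    "</w:p></w:body></w:document>"
)

snippets = highlighted_snippets(SAMPLE)
```

Each list in the result could then be passed to the bot as a List variable, one per colour, and fed into the Find/Replace with Format > Highlight step on the untouched original document.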

