June 2025 Dev Meetup Recap: Vision Models in Document Automation

July 9, 2025

Shreya.Kumar

Catch the recording of our June Dev Meetup here.

Previously, on Dev Meetups

Announcements (00:00:40)

While virtual dev meetups took a brief pause over the past couple of months, the Pathfinder Community has been buzzing with energy on the ground! From our office in Bangalore to the vibrant tech scene in Hyderabad, in-person meetups have been lighting up our calendar!

The latest Pathfinder Chapter Meetup in Hyderabad was a standout event, led by our brilliant Community Captain, @vinayreddy_cognitbotz. Hats off to him for keeping the momentum alive, and to our community members for their enthusiastic participation! 🎉

June marked a return to our online collaboration, kicked off by our Community Captain, @Gireesh B P 3262, who co-hosted a live Solution Session with us to build an automation that extracts structured data from unstructured email conversations. ✨

And on June 19th, we had our virtual Dev Meetup, where we brought together experts to explore the latest in document automation technology.

In other words,…

We’re so Back 🗣️🗣️🗣️🔥🔥🔥

🌟 Technical Demo (00:05:20)

Arjun kicked off the technical deep dive with a comprehensive overview of vision-powered generative AI in document automation. Key highlights included:

Evolution of Document Automation: Now enhanced with vision models that "see" documents, enabling extraction from complex layouts, handwritten text, and unlabeled fields.
Key Capabilities: Extraction of keyless values and signatures; advanced table extraction using prompt tags like GenAI Vision, Linking Field, and Table Identifier; and support for multiple AI engines, including GPT-4V, Anthropic Haiku, and a few BYOM (Bring Your Own Model) options.

Arjun showcased real-world examples including:

A product recall form with handwritten and checkbox data.
Complex insurance tables with overlapping fields and multiple identifiers.
Use of test mode for real-time validation and version comparison of learning instances.

💡 Use Case Demo by Community Captain @Inacio Fernandes (00:42:47)

@Inacio, Sr. Developer at Tangentia, presented a compelling use case on automating employee expense claims using Document Automation and vision models.

Highlights:

Extraction from printed, handwritten, and even vintage invoices!
Matching extracted data with user-submitted forms.
Automated approval/rejection logic.
Email notifications for audit trails.
Future scope includes fraud detection, ERP integration, and mobile submissions.
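
To make the matching and approval/rejection steps above a little more concrete, here is a minimal Python sketch. It is not the bot shown in the demo: the field names (invoice_number, vendor, total), the normalize helper, and the one-cent tolerance are all assumptions made for illustration, and in a real automation the extracted values would come from Document Automation output rather than being typed in by hand.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Claim:
    """Values the employee typed into the expense form (hypothetical fields)."""
    invoice_number: str
    vendor: str
    total: Decimal


@dataclass(frozen=True)
class Extracted:
    """Values extracted from the invoice document (hypothetical fields)."""
    invoice_number: str
    vendor: str
    total: Decimal


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so cosmetic differences do not fail a match."""
    return " ".join(text.lower().split())


def review_claim(claim: Claim, extracted: Extracted,
                 tolerance: Decimal = Decimal("0.01")) -> tuple[str, list[str]]:
    """Compare the form against the extracted values.

    Returns a decision ("approved" or "rejected") and the list of mismatch reasons.
    The tolerance is an assumed rounding allowance, not a product setting.
    """
    reasons = []
    if normalize(claim.invoice_number) != normalize(extracted.invoice_number):
        reasons.append("invoice number mismatch")
    if normalize(claim.vendor) != normalize(extracted.vendor):
        reasons.append("vendor mismatch")
    if abs(claim.total - extracted.total) > tolerance:
        reasons.append("total mismatch")
    return ("rejected" if reasons else "approved"), reasons


if __name__ == "__main__":
    claim = Claim("INV-001", "Acme  Corp", Decimal("120.50"))
    invoice = Extracted("inv-001", "acme corp", Decimal("120.50"))
    decision, reasons = review_claim(claim, invoice)
    print(decision, reasons)
```

Returning the reasons alongside the decision means the same result can feed the notification email, so the audit trail records why a claim was rejected and not just that it was.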

🧠 Q&A Highlights (00:56:10)

Here's a curated list of the most relevant questions asked by the community:

Will the tags be integrated into the DA UI in a future update?
Yes, the @GenAIVision tag is now selectable directly in the UI. This change is available from version .37 onwards.

Is it normal that the request is pending even though there are no fields to validate?
Yes, this can happen in test mode, where the system shows extracted values for review even if no manual validation is required.

Will using test mode consume my DA page count?
Yes, test mode consumes pages just like regular processing.

Will version history be available for previously created learning instances?
Version history is available from .36 onwards. Once test mode is enabled, you can view and compare versions.

Can we see document extraction in languages other than English?
Yes, DA supports multi-language extraction. The supported languages are listed in the documentation: https://docs.automationanywhere.com/bundle/enterprise-v2019/page/languages-support-for-providers-and-third-party-parsers.html

Can DA extract handwritten values from documents?
Yes, DA can extract handwritten data. The best results come from using Google Vision OCR or Microsoft’s standard forms.

Which LLM models are used in DA?
DA uses GPT from OpenAI and Haiku from Anthropic.

Can we use other LLMs from Hugging Face for specific document types?
This is not supported yet, but tooling for this is on the roadmap.

Will user validation improve extraction accuracy?
Yes, if feedback-based improvement is enabled. GenAI and feedback can work together, and the engine will choose the optimal result at runtime.

Will the new version improve accuracy for handwritten documents?
It is not yet fully optimized for handwriting. Vision models help with detection (e.g., signatures), but Google Vision OCR is recommended for extraction.

Can DA extract field values that span multiple pages?
Yes, using prompt tags like Linking Field and Table Identifier, DA can extract and link multi-page data.

Are models deployed with the CR installation in on-prem setups?
DA’s internal models are shipped with CR and optimized to run on bot runner devices. LLMs are cloud-hosted.

Will DA connect to LLMs if only the CR URL is whitelisted and a proxy is used for internet access?
Yes, if bot runners are configured to access the internet via a proxy, DA will connect to LLMs without issue.

🔮 Up Next:

Our next Dev Meetup will be on July 31, 2025. Click here to register.

Want to Present?

Have a use case or topic idea? Email us at community@automationanywhere.com to be featured in future meetups!
