Solved

PDF Extraction

  • September 20, 2024
  • 2 replies
  • 68 views


In AA360, I need to extract a particular value from an invoice.

For example: Invoice No 123345

But the position (x and y) of this value changes from one PDF to another, so I can't use the Extract Field activity. Please suggest any ideas.

Best answer by JLogan3o13

Assuming, of course, no access to Doc Automation, you can try the PDF: Extract Text action to extract all data to a text file, and then parse the data.
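As a rough sketch of the "parse the data" step: once the text is in a file, a regular expression keyed on the label rather than on page position can pull the value out. The sample text, the pattern, and the `find_invoice_number` helper below are my own assumptions for illustration; the real output of Extract Text will depend on your PDFs, so the pattern will likely need adjusting.

```python
import re

# Hypothetical sample of what the extracted text file might contain.
SAMPLE_TEXT = """ACME Supplies Ltd
Invoice No  123345
Invoice Date: 20/09/2024
Total: 1,250.00
"""

# Label "Invoice No" (optionally "No."), an optional ":" or "#",
# spaces or tabs only (so the match stays on one line), then the value.
INVOICE_NO = re.compile(
    r"Invoice[ \t]*No\b\.?[ \t]*[:#]?[ \t]*([A-Za-z0-9-]+)",
    re.IGNORECASE,
)


def find_invoice_number(text):
    """Return the value following the 'Invoice No' label, or None if absent."""
    match = INVOICE_NO.search(text)
    return match.group(1) if match else None


invoice_no = find_invoice_number(SAMPLE_TEXT)
print(invoice_no)
```

Because the pattern anchors on the label text, it does not matter where on the page the invoice number sits, as long as the label and value end up on the same line of the extracted text.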

Alternatively, if you have an account with one of the Gen AI models, you can create an Assistant and then call it from the Generative AI package. I have access to OpenAI, and have done this to upload files through the API and use this to find text: https://platform.openai.com/docs/assistants/tools/file-search

Lastly, I’ve worked with a couple of customers that use Python to do the extraction. There are some good scripts out there that use the PDFQuery library in Python to convert a PDF to XML and then search the nodes for the data you’re after.
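To give an idea of the "search the nodes" part, here is a minimal sketch that uses only the standard library on a small XML string shaped like PDFQuery output. The sample XML, the `value_right_of` helper, and the tolerance value are assumptions for illustration; a real PDFQuery tree is usually more deeply nested, and loading the PDF itself is not shown here. The idea is to find the label node first and then take the nearest text node to its right on the same line, so the absolute x and y can differ from PDF to PDF.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML, loosely shaped like a PDFQuery tree written to disk.
SAMPLE_XML = """<pdfxml>
  <LTPage pageid="1">
    <LTTextLineHorizontal x0="50.0" y0="700.0" x1="110.0" y1="712.0">Invoice No </LTTextLineHorizontal>
    <LTTextLineHorizontal x0="150.0" y0="700.5" x1="190.0" y1="712.5">123345 </LTTextLineHorizontal>
    <LTTextLineHorizontal x0="50.0" y0="680.0" x1="120.0" y1="692.0">Invoice Date </LTTextLineHorizontal>
    <LTTextLineHorizontal x0="150.0" y0="680.0" x1="210.0" y1="692.0">20/09/2024 </LTTextLineHorizontal>
  </LTPage>
</pdfxml>"""


def value_right_of(root, label, y_tolerance=3.0):
    """Return the text of the nearest line to the right of `label`, or None."""
    # Keep only text lines that actually contain text.
    lines = [el for el in root.iter("LTTextLineHorizontal") if (el.text or "").strip()]
    label_el = next((el for el in lines if label.lower() in el.text.lower()), None)
    if label_el is None:
        return None
    label_y = float(label_el.get("y0"))
    label_x1 = float(label_el.get("x1"))
    # Candidates sit on roughly the same baseline and start after the label ends.
    candidates = [
        el for el in lines
        if el is not label_el
        and abs(float(el.get("y0")) - label_y) <= y_tolerance
        and float(el.get("x0")) >= label_x1
    ]
    if not candidates:
        return None
    nearest = min(candidates, key=lambda el: float(el.get("x0")))
    return nearest.text.strip()


root = ET.fromstring(SAMPLE_XML)
print(value_right_of(root, "Invoice No"))
```

If the value sits below the label instead of beside it on your invoices, the same approach works by comparing `x0` for alignment and looking for the closest smaller `y0`.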



Dineshkumar Muthu
Flight Specialist | Tier 4
  • 87 replies
  • September 22, 2024

Hi @SindhujaaS101835 

 

You can try the options below.

  • Digital PDF
    • Option 1: AA Task Bot: PDF package and a regular expression to extract the data.
    • Option 2: IQ Bot/Document Automation.
  • Scanned PDF
    • Option 1: IQ Bot/Document Automation.
    • Option 2: AA Task Bot: OCR: Capture image by path.
