
Hello,

I want to extract relevant data from a Word document, such as invoice number, date, customer name, company name, and total amount, and save it in an Excel sheet.

I searched for packages to extract this data, but the available actions, like replacing text, finding text, and adding paragraphs, don't meet my requirements.

Can anyone guide me on how to achieve this?

Thank you!

Here is how I want my output to look in the Excel sheet (sample):

 

 

A great way to do that is with Document Automation. Unfortunately, the Community Edition can't process unstructured documents, so for those you would have to use the full Enterprise Edition. If you're working with invoices, Community Edition may do what you need. Take a look at our Document Automation videos on Pathfinder Academy.

Another possible way to do that is to use some of the legacy IQ Bot actions, such as Extract text. You might have to export the Word document as a PDF first. Once you do, you can use string manipulation to find and extract what you want from the text, then write the data to an Excel file. This is a lot more work than using Document Automation. 
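To give a rough idea of the string manipulation step outside the bot itself, here is a minimal Python sketch. It assumes the text has already been extracted from the document and that each field sits next to a label such as "Invoice Number:" or "Total Amount:". The label patterns, field names, and function names are hypothetical, so adjust them to your documents. It writes a CSV file, which Excel opens directly.

```python
import csv
import re

# Hypothetical label patterns; change them to match the wording in your invoices.
FIELD_PATTERNS = {
    "invoice_number": r"\bInvoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*(\S+)",
    "date": r"\bDate\s*[:\-]?\s*([0-9]{1,2}[/-][0-9]{1,2}[/-][0-9]{2,4})",
    "customer_name": r"\bCustomer\s*Name\s*[:\-]?\s*(.+)",
    "company_name": r"\bCompany\s*(?:Name)?\s*[:\-]?\s*(.+)",
    "total_amount": r"\bTotal\s*(?:Amount)?\s*[:\-]?\s*\$?\s*([0-9][0-9,]*(?:\.[0-9]+)?)",
}


def extract_fields(text):
    """Return one row (dict) per document; fields not found are left empty."""
    row = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        row[field] = match.group(1).strip() if match else ""
    return row


def write_rows(rows, out_file):
    """Write a header line plus one line per extracted document."""
    writer = csv.DictWriter(out_file, fieldnames=list(FIELD_PATTERNS))
    writer.writeheader()
    writer.writerows(rows)
```

To use it, call `extract_fields` on the text of each document, collect the rows in a list, and pass them to `write_rows` together with a file opened with `open("invoices.csv", "w", newline="")`.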


I tried using Document Automation, but it doesn't accept Word documents (.docx extension).

What should I do if I need to extract data from thousands of Word documents? Do I have to convert all of them to PDFs first to extract the data, or is there another way to do this?

Thank you!


First off, please do not spam our message board with 4 postings on the same topic.

Second, yes, convert them to PDF. There are multiple ways to convert DOCX files to PDF.

https://www.winhelponline.com/blog/how-to-batch-convert-word-documents-into-pdf-files/

Note: We do not endorse any of the applications shown on that site.
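If you would rather script the conversion, one option is a small Python wrapper around LibreOffice's headless mode. This is only a sketch under assumptions this thread does not confirm: LibreOffice is installed, its `soffice` executable is on the PATH, and all DOCX files sit in one folder. The function names are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir, soffice="soffice"):
    """Build the LibreOffice command line that converts one file to PDF."""
    return [
        soffice,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_folder(in_dir, out_dir, soffice="soffice"):
    """Convert every .docx file in in_dir and return the expected PDF paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for docx_path in sorted(Path(in_dir).glob("*.docx")):
        # check=True raises CalledProcessError if LibreOffice exits with an error.
        subprocess.run(build_convert_command(docx_path, out_dir, soffice), check=True)
        converted.append(out_dir / (docx_path.stem + ".pdf"))
    return converted
```

Calling `convert_folder("invoices_docx", "invoices_pdf")` would convert the files one at a time. With thousands of documents, try it on a small batch first and spot-check the PDFs before running the whole set.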

Also note that Community Edition is not licensed for commercial use. If you are using it for commercial purposes (e.g., processing thousands of DOCX files), you are in violation of your Community Edition license.


OK, but if I purchase the Enterprise Edition, will I be able to extract data from both Word documents and Excel files?
Does Document Automation accept the DOCX and CSV file formats in the Enterprise Edition?
 


No, the Enterprise Edition does not accept DOCX or CSV files either. They would need to be converted to PDF ahead of time.


OK, thanks.

 

