Question

How to extract correct, clean text from an editable PDF?


I am using the PDF: Extract Text action to extract text from an editable PDF that has been filled in by a user.

When I run it on the filled PDF, it extracts only the field labels, not the values entered for those labels.

I also tried:

  • PDF: Extract Image
  • OCR: Capture image by path
  • Log to file

But that does not extract the text from the image correctly.

The problem occurs mainly for labels whose values are check boxes.

 

What is the best way to extract clean, as-is text from an editable PDF or image?

 

Thanks in advance


12 replies

Userlevel 5
Badge +9

Hi @Rajkumar Jagdale​ ,

 

Use IQ Bot for this use case.

Enterprise Edition

OK Sir

@ChanduMohammad S​ and @Tamil Arasu​ 

I will try that, but first I need to get access to IQ Bot.

 


Ok

Userlevel 3
Badge +7

Hi @Rajkumar Jagdale​ 

 

As @ChanduMohammad S mentioned, IQ Bot is the best option here.

Userlevel 5
Badge +9

Yes, you need to get IQ Bot access.

Userlevel 5
Badge +9

Hi @Rajkumar Jagdale​ ,

 

You can request access from your Control Room (CR) admin and start exploring.

Userlevel 3
Badge +7

Yes, you need access to IQ Bot.

Are you using the Enterprise Edition or the Community Edition?

Userlevel 3
Badge +7

Please ask your admin for access to IQ Bot. Using IQ Bot can save you a lot of time on this use case and make the extraction more efficient.

 

Userlevel 3
Badge +7

Hi @Rajkumar Jagdale​ 

 

If you use PDF: Extract Text and the required text appears in the output text file, you can read that file and use a regex to pull out the values you need.

Please share some sample data so that we can test that approach, or you can try it on your end.
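
As a rough illustration of the regex idea, here is a minimal Python sketch. It assumes the extracted text file contains one "Label: value" pair per line, which may not match your PDF. The sample text, the field names, and the `parse_fields` helper are all hypothetical and only there to show the pattern. It will not help if the values are missing from the text file in the first place.

```python
import re

# Hypothetical contents of the text file written by PDF: Extract Text.
# These labels and values are invented for illustration only.
SAMPLE_TEXT = """\
Name: Jane Doe
Date of Birth: 1990-04-12
Email : jane.doe@example.com
Phone:
Signature
"""

# One "Label: value" pair per line. Spaces and tabs around the colon are
# tolerated, but the match never crosses a line break, so an empty field
# does not swallow the next line.
FIELD_PATTERN = re.compile(
    r"^[ \t]*(?P<label>[^:\n]+?)[ \t]*:[ \t]*(?P<value>[^\n]*?)[ \t]*$",
    re.MULTILINE,
)


def parse_fields(text):
    """Return a dict mapping each label to its value ('' if left blank)."""
    return {
        match.group("label"): match.group("value")
        for match in FIELD_PATTERN.finditer(text)
    }


fields = parse_fields(SAMPLE_TEXT)
for label, value in fields.items():
    print(f"{label!r} -> {value!r}")
```

Lines without a colon (such as the "Signature" line in the sample) are skipped, and a label with nothing after the colon maps to an empty string, which makes blank fields easy to detect.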

 

Kind Regards,

Ashwin A.K

PDF: Extract Text gives only the labels, not their values.

[Attached screenshots: Screenshot 2022-01-11 111417, Screenshot 2022-01-11 111334]
