Question

How to extract correct, clean text from an editable PDF?

  • January 10, 2022

I am using the PDF: Extract Text action to extract text from an editable PDF that has been filled in by a user.

When I run it on the filled PDF, it extracts only the labels, not the values entered for those labels.

I also tried this sequence of actions:

  • PDF: extract Image
  • OCR: Capture image by path
  • Log to file

But OCR does not extract the text from the image correctly.

The problem occurs mainly for labels whose values are check boxes.

 

What is the best way to extract clean text, exactly as it appears, from an editable PDF or an image?

 

Thanks in advance

12 replies

  • Most Valuable Pathfinder
  • January 10, 2022

Hi @Rajkumar Jagdale,

Use IQ Bot for this use case.


  • Author
  • Flight Specialist | Tier 4
  • January 10, 2022

Enterprise Edition


  • Author
  • Flight Specialist | Tier 4
  • January 10, 2022

OK, sir.

@ChanduMohammad S and @Tamil Arasu,

I will try that, but first I need to get access to IQ Bot.


  • Author
  • Flight Specialist | Tier 4
  • January 10, 2022

45


  • Author
  • Flight Specialist | Tier 4
  • January 10, 2022

Ok


Tamil Arasu10
  • Most Valuable Pathfinder
  • January 10, 2022

Hi @Rajkumar Jagdale,

As @ChanduMohammad S mentioned, IQ Bot is the best option here.


  • Most Valuable Pathfinder
  • January 10, 2022

Yes, you need to get IQ Bot access.


  • Most Valuable Pathfinder
  • January 10, 2022

Hi @Rajkumar Jagdale,

You can request access from your CR Admin and start exploring.


Tamil Arasu10
  • Most Valuable Pathfinder
  • January 10, 2022

Yes, you need access to IQ Bot.

Are you using the Enterprise Edition or the Community Edition?


Tamil Arasu10
  • Most Valuable Pathfinder
  • January 10, 2022

Please ask your admin for access to IQ Bot. Using IQ Bot can save you a lot of time on this use case and make the process more efficient.

 


Ashwin A.K
  • Navigator | Tier 3
  • January 10, 2022

Hi @Rajkumar Jagdale,

If you use PDF: Extract Text and the required text appears in the output text file, you can then read that file and use a regex to pull out the values you need.

Please provide some sample data so that we can test that approach, or you can try it on your end.
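
For illustration only, here is a minimal Python sketch of the regex idea, written outside Automation Anywhere. It assumes the extracted text file contains one "Label: value" pair per line. The sample text, `LINE_PATTERN`, and `parse_label_values` are hypothetical names invented for this example, so a real form will likely need a different pattern.

```python
import re

# Hypothetical sample of what the extracted text file might contain.
# Real labels and layout will differ from form to form.
SAMPLE_TEXT = """\
Name: Jane Doe
Date of Birth: 1990-05-17
Email : jane@example.com
Time: 10:30
Notes:
"""

# One "Label: value" pair per line. The label is everything before the
# first colon; the value is the rest of the line and may be empty.
LINE_PATTERN = re.compile(
    r"^[ \t]*(?P<label>[^:\n]+?)[ \t]*:[ \t]*(?P<value>.*?)[ \t]*$",
    re.MULTILINE,
)


def parse_label_values(text: str) -> dict[str, str]:
    """Return a mapping of label to value for every matching line."""
    return {m.group("label"): m.group("value") for m in LINE_PATTERN.finditer(text)}


if __name__ == "__main__":
    for label, value in parse_label_values(SAMPLE_TEXT).items():
        print(f"{label!r} -> {value!r}")
```

An empty value, such as `Notes` in the sample, is a sign that the value never reached the text file, in which case a regex alone will not help.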

 

Kind Regards,

Ashwin A.K


  • Author
  • Flight Specialist | Tier 4
  • January 11, 2022

PDF: Extract Text returns only the labels, not their values.

[Screenshots: Screenshot 2022-01-11 111417, Screenshot 2022-01-11 111334]
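
One possible explanation for the labels-only output, which nothing in this thread confirms, is that the filled-in values are stored as PDF form fields (AcroForm) rather than as page text. Under that assumption, a script outside Automation Anywhere could read the fields directly. The sketch below assumes the third-party `pypdf` library is installed. `normalize_value` and `read_form_values` are hypothetical helper names, and the check box handling only distinguishes checked from unchecked.

```python
from typing import Any, Optional


def normalize_value(field_type: Optional[str], value: Any) -> Any:
    """Turn a raw form-field value into a plain Python value."""
    if field_type == "/Btn":
        # Button fields (check boxes) store a PDF name such as "/Yes" when
        # checked and "/Off" when unchecked. A radio group would need the
        # selected name kept instead of a boolean.
        return value is not None and str(value) != "/Off"
    # Text and choice fields: return the value as a string, empty if unset.
    return "" if value is None else str(value)


def read_form_values(pdf_path: str) -> dict[str, Any]:
    """Read every form field from the PDF at pdf_path (assumes pypdf)."""
    from pypdf import PdfReader  # third-party; imported lazily on purpose

    reader = PdfReader(pdf_path)
    fields = reader.get_fields() or {}  # get_fields() returns None without a form
    return {
        name: normalize_value(field.field_type, field.value)
        for name, field in fields.items()
    }
```

If `read_form_values` returns an empty dictionary, the PDF probably has no form fields, and OCR or IQ Bot remains the option to pursue.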