How can I extract specific text from a PDF?

  • 18 July 2023
  • 5 replies

Badge +1


I’m using the Community Edition of AA and trying to build a bot to extract one small set of digits. When I use the PDF extract text action, the bot extracts all the text on the page. How can I extract only the specific digits I want?



Userlevel 6
Badge +15

Hi @john whittaker ,

After extracting all the text, you can use the String package to retrieve only the digits you need.

Or do you want to extract the digits from the PDF directly? Can you please share a screenshot of the PDF or an example PDF file?
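To illustrate the "extract everything, then filter" idea outside of AA, here is a minimal Python sketch. It is not AA's String package; the function name `find_digit_runs` and the sample text are hypothetical, and it assumes the number you want is the only standalone run of a known length in the extracted text.

```python
import re


def find_digit_runs(text: str, length: int = 4) -> list:
    """Return every standalone run of exactly `length` digits in `text`."""
    # The lookarounds reject digits that are part of a longer number.
    pattern = rf"(?<!\d)\d{{{length}}}(?!\d)"
    return re.findall(pattern, text)


# Hypothetical text, standing in for what the PDF extraction step produces.
sample_text = "Title of work: SURVIVOR\nEpisode: 4206\nPage 1 of 2"

print(find_digit_runs(sample_text))
```

If the page contains other numbers of the same length (a year, for example), a pattern like this will return those too, so anchoring on the surrounding label is usually safer.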


Badge +1

Hi, thanks for your response! I attached a screenshot of the upper portion of the PDF, which contains an episode number for the TV series Survivor, which I copyright. In this case, I’m trying to extract the episode number (4206) and then use that number to rename the PDF file. I have a lot of certificates, each with a different number, so I want to repeat this process for every file.
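For reference, the rename step on its own can be sketched in Python as below. This is only an illustration of the logic, not an AA action; `rename_with_number` is a hypothetical helper, and it assumes the episode number has already been extracted as a string.

```python
from pathlib import Path


def rename_with_number(pdf_path: Path, number: str) -> Path:
    """Rename `pdf_path` to '<number>.pdf' in the same folder and return the new path."""
    if not number.isdigit():
        raise ValueError(f"expected digits only, got {number!r}")
    target = pdf_path.with_name(f"{number}{pdf_path.suffix}")
    # Refuse to overwrite a certificate that already carries this number.
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    return pdf_path.rename(target)
```

Calling `rename_with_number(Path("certificate.pdf"), "4206")` would rename the file to `4206.pdf`; running it in a loop over a folder covers the many-certificates case.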

Userlevel 6
Badge +15

Did you convert the PDF file to a text file? Are you able to see the required value in the results?

Badge +3

1) Extract the text from the PDF with structure. This gives you the text.

2) Use a before/after string operation to pull out the value.




Aravindh s 
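The before/after idea in step 2 can be sketched in Python as follows. This is not the AA string operation itself; `text_between`, the "Episode:" label, and the sample text are assumptions made for illustration, so use whatever text actually surrounds the number in your PDF.

```python
from typing import Optional


def text_between(text: str, before: str, after: str) -> Optional[str]:
    """Return the text found between the first `before` marker and the next `after` marker."""
    start = text.find(before)
    if start == -1:
        return None  # the "before" marker is missing
    start += len(before)
    end = text.find(after, start)
    if end == -1:
        return None  # the "after" marker is missing
    return text[start:end].strip()


# Hypothetical text, standing in for the structured text extracted in step 1.
sample_text = "Title of work: SURVIVOR\nEpisode: 4206\nPage 1 of 2"

print(text_between(sample_text, "Episode:", "\n"))
```

Anchoring on a label this way avoids picking up unrelated numbers elsewhere on the page.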

Badge +1

I was out of town for several days and just got back. I’ll try converting the PDF to a text file. Thanks.