
Hi, could someone help me? Can I extract only the alphabetic characters from a string and exclude the numbers?

Example string: My name is Liyana 6547

I only want to display “My name is Liyana”. The string value is random, not fixed.

Hey @liyananadia ,

You can use the String → Replace action with the regex \d+ and leave the replacement value blank. That removes all the digits and gives you the output.
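For reference, here is the same idea sketched in Python rather than as a bot action: re.sub with \d+ removes every run of digits. Removing "6547" leaves a trailing space, so this sketch also collapses repeated spaces and trims the result. That cleanup step is an addition for illustration, not part of the answer above, and strip_digits is a made-up name.

```python
import re

def strip_digits(text: str) -> str:
    # Remove every run of one or more digits (the same \d+ pattern used with String > Replace).
    without_digits = re.sub(r"\d+", "", text)
    # Collapse the double spaces that removal can leave behind, then trim both ends.
    return re.sub(r"\s{2,}", " ", without_digits).strip()

print(strip_digits("My name is Liyana 6547"))  # My name is Liyana
```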


Hi @Zaid Chougle, thank you so much for your answer. It works now!

 


Hi @liyananadia @Zaid Chougle,

I have an OCR output string containing various data, and I need to extract two specific pieces of information:

  1. The PAN ID, which follows the pattern of five letters, four digits, and one letter (e.g., ABCDE1234F).
  2. The last four digits of an Aadhaar card number. In the OCR string it appears masked as xxxx1234 among all the other data, but I only need the numeric portion (1234).

     

Hi Team,

 

I have an OCR output string containing various data, and I need to extract two specific pieces of information:

  1. The PAN ID, which follows the pattern of five letters, four digits, and one letter (e.g., ABCDE1234F).
  2. The last four digits of an Aadhaar card number. In the OCR string it appears masked as xxxx1234 among all the other data, but I only need the numeric portion (1234).

Below are the regex patterns I tried with the String actions:

  1. For the PAN card: ^[A-Z]{2}[A-Z]{3}[0-9]{4}[A-Z]{1}$
  2. For the last 4 Aadhaar digits: [^xxxx][0-9]{4}

However, instead of returning the matched value, the action returns the whole string.
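One likely reason for getting the whole string back: ^ and $ anchor the pattern to the start and end of the entire input, so the anchored PAN pattern only matches when the whole string is a PAN. A quick check with Python's re module (used here as a stand-in for the bot's regex engine, which is an assumption) on a fragment of OCR-style text shows this; replacing the anchors with word boundaries (\b) finds the embedded PAN. How the String action reports a failed match is not modelled here.

```python
import re

PAN_ANCHORED = r"^[A-Z]{2}[A-Z]{3}[0-9]{4}[A-Z]{1}$"  # the entire input must be a PAN
PAN_UNANCHORED = r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"          # a PAN anywhere, as a separate word

ocr_text = "NA ABCDE1234F VIVEKANANTHAN M MURUGESAN 26-04-1992"

# The anchored pattern fails because the text does not start and end with the PAN.
print(re.search(PAN_ANCHORED, ocr_text))  # None

match = re.search(PAN_UNANCHORED, ocr_text)
print(match.group(0) if match else None)  # ABCDE1234F
```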

@Vaandu Could you share an example/dummy of what your complete output looks like? We’ll be able to help you better based on that.

Depending on any recurring patterns in the output format, you could use a combination of String Extract and Substring functions to get the data you need. For example, to extract the Aadhar info, you could look for a keyword and use String Extract to isolate the complete Aadhar number in a string variable. Then you can use Substring action and specify:

  • the starting position of the substring (length of string - 3)
  • the length of the substring (4).
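The "length of string - 3" start position implies that positions are counted from 1; with 0-based positions it would be length - 4. A small Python sketch, assuming 1-based counting; substring_1based is a hypothetical helper that mimics such a Substring action, and "xxxxx2606" stands in for the value isolated by String Extract.

```python
def substring_1based(text: str, start: int, length: int) -> str:
    # Hypothetical helper: `start` counts from 1, as "length of string - 3" implies.
    return text[start - 1 : start - 1 + length]

aadhaar_token = "xxxxx2606"         # e.g. the value isolated by String Extract
start = len(aadhaar_token) - 3      # 9 - 3 = 6, the position of "2"
print(substring_1based(aadhaar_token, start, 4))  # 2606
print(aadhaar_token[-4:])                         # same result with Python slicing
```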

 


Hi @Shreya.Kumar,

Thanks for your reply,

The string value is: “Pudukkottal.Tamil Nadu,ca,622202 215 KEELA STREET, xxxxx2606 THURAIYUR PANCHAYAT ARIMALAM PANCHAYAT KEEZHAA NILAI POST.Thuralyur. Pudukkottal Tama Nadu,India.622202  NA ABCDE1234F VIVEKANANTHAN M MURUGESAN 26-04-1992 ABCDE1234F VIVEKANANTHAN MURUGESAN NA NA VALID ABCDE1234F 80.00% 0.00% 100% 100% 89.00% 8900% 100%”

I need to retrieve “ABCDE1234F” (the PAN ID) and the last four digits of “xxxxx2606” (the Aadhaar ID).

What would be a feasible way to retrieve this data?
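A minimal Python sketch (not a bot action) for this sample, assuming a PAN is a separate word of 5 uppercase letters, 4 digits and 1 uppercase letter, and that the Aadhaar appears masked as 4 or 5 x's followed by 4 digits. PAN_RE, AADHAAR_RE and extract_ids are made-up names, and accepting uppercase X is an extra assumption. The capture group returns only the 4 digits, so no separate substring step is needed here.

```python
import re

# The sample OCR output posted above.
ocr = (
    "Pudukkottal.Tamil Nadu,ca,622202 215 KEELA STREET, xxxxx2606 THURAIYUR PANCHAYAT "
    "ARIMALAM PANCHAYAT KEEZHAA NILAI POST.Thuralyur. Pudukkottal Tama Nadu,India.622202  "
    "NA ABCDE1234F VIVEKANANTHAN M MURUGESAN 26-04-1992 ABCDE1234F VIVEKANANTHAN MURUGESAN "
    "NA NA VALID ABCDE1234F 80.00% 0.00% 100% 100% 89.00% 8900% 100%"
)

PAN_RE = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")  # 5 letters, 4 digits, 1 letter
AADHAAR_RE = re.compile(r"\b[xX]{4,5}(\d{4})\b")  # 4-5 x's, capture the 4 digits

def extract_ids(text):
    # Return (PAN, last four Aadhaar digits); None for anything not found.
    pan = PAN_RE.search(text)
    aadhaar = AADHAAR_RE.search(text)
    return (pan.group(0) if pan else None, aadhaar.group(1) if aadhaar else None)

print(extract_ids(ocr))  # ('ABCDE1234F', '2606')
```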


@Vaandu Since the output data is unstructured, I tried an approach using the Regex Tools package, in addition to my earlier suggestion.

I also read in the topic you started (now moved to this thread) that you had tried regex patterns. I tried those too but didn’t get a result, so I simplified the patterns instead:
 

  • For PAN: [A-Z]{5}[0-9]{4}[A-Z]
  • For Aadhaar: (\D{5})(\d{4})

That worked for me using the Extract a Regex Match action. I stored the extracted strings in different variables. To get the last 4 digits of the Aadhaar number, I used the Substring action that I mentioned in my earlier answer.
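A caveat worth checking: \D matches any non-digit (letters, spaces, punctuation), so (\D{5})(\d{4}) accepts any 4-digit run preceded by 5 non-digits. Checked with Python's re module (assumed to behave like the bot's engine for this pattern) on the first part of the sample string posted above, the leftmost match actually comes from the PIN code. The second capture group already holds the four digits, which could replace the Substring step if the action exposes groups (also an assumption).

```python
import re

# The simplified pattern from this post: five non-digits, then four digits.
LOOSE_AADHAAR = re.compile(r"(\D{5})(\d{4})")

# First part of the OCR sample posted earlier in this thread.
fragment = "Pudukkottal.Tamil Nadu,ca,622202 215 KEELA STREET, xxxxx2606 THURAIYUR"

first = LOOSE_AADHAAR.search(fragment)
print(first.group(0), first.group(2))  # leftmost match: "u,ca," followed by part of the PIN code

# Listing every match shows the masked Aadhaar is only one candidate.
all_matches = [m.group(0) for m in LOOSE_AADHAAR.finditer(fragment)]
print(all_matches)  # ['u,ca,6222', 'xxxxx2606']
```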

 

Hope this helps!


Hi @Shreya.Kumar,

I'm glad the solution worked for the PAN ID! 🎉 Thanks for the innovative solution.

For the Aadhaar ID, the regex isn't working because the masked number may contain 4 x's ("xxxx2606") or 5 x's ("xxxxx2606"). With the current solution, it retrieves “VKYC    2448” for the Aadhaar.

So can I change the Aadhaar regex to “[x]{4,5}(\d{4})”?
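The {4,5} quantifier does cover both masking widths. A small Python check, assuming standard regex semantics; [xX] (also accepting uppercase X) and the helper name last_four are illustrative additions, not part of the proposed pattern.

```python
import re

# 4 or 5 x's, then capture exactly four digits.
MASKED_AADHAAR = re.compile(r"[xX]{4,5}(\d{4})")

def last_four(token):
    # Return the captured digits, or None when the token is not a masked Aadhaar.
    m = MASKED_AADHAAR.search(token)
    return m.group(1) if m else None

for token in ["xxxx2606", "xxxxx2606", "VKYC    2448"]:
    print(token, "->", last_four(token))
```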

 


@Vaandu Could you use message boxes to check exactly which step is not working? Message boxes work like print statements, so you can check the value of the extracted string after each action.


 

Hi @Shreya.Kumar,

I tried another set of input and checked the results with the Message Box action. The PAN ID regex worked correctly, but the Aadhaar ID regex didn't work as expected. Your regex (\D{5})(\d{4}) matches any 5 non-numeric characters followed by 4 digits, so the message box showed "VKYCM3345" when it should have shown "xxxx2606". Strictly speaking, the regex returned a correct match for its pattern.

So I tried (x{4,5})\d{4}, so that the x's are matched literally and followed by 4 digits, but that didn't work either.

 

 


@Vaandu I found a workaround: I tried your regex with “Extract All Matches” instead of “Extract a Regex Match”, and it worked.

This approach gives you all the matching strings in a list, and you can loop through them to pick the correct one for your application. Hope this helps!
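The "all matches, then loop" idea, sketched in Python with the (x{4,5})\d{4} pattern from this thread. The selection rule (accept only a candidate that is a whole token in the text) and the input containing "refxxxx1111" are invented for illustration; the right check depends on your data. Note that re.findall would return only the captured x's here, so the sketch uses finditer with group(0) to get full matches.

```python
import re

# Pattern from this thread: 4-5 literal x's followed by 4 digits.
MASKED_RE = re.compile(r"(x{4,5})\d{4}")

def extract_all_matches(text):
    # Like "Extract All Matches": every full match, left to right.
    return [m.group(0) for m in MASKED_RE.finditer(text)]

def pick_aadhaar_digits(text, candidates):
    # Hypothetical rule: keep a candidate only if it is a whole token, not part of a longer word,
    # and return its last 4 characters (the digits).
    tokens = set(re.split(r"[\s,]+", text))
    for candidate in candidates:
        if candidate in tokens:
            return candidate[-4:]
    return None

# Made-up input: "refxxxx1111" is invented to show a false candidate.
text = "refxxxx1111 KEELA STREET, xxxxx2606 THURAIYUR"
candidates = extract_all_matches(text)
print(candidates)                             # ['xxxx1111', 'xxxxx2606']
print(pick_aadhaar_digits(text, candidates))  # 2606
```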

 


@Shreya.Kumar ,

Thank you for the insightful idea on extracting patterns!
 

I had been achieving a 100% success rate using another method, which involved applying the same regex through a VBS script. This approach allowed me to obtain highly accurate results without the need for any additional action packages. The extracted string data was then stored in an Excel file.

Your idea provided a feasible solution that significantly enhanced my process. Below is a breakdown of my VBS script and the string output:

Additional thanks to @Tamil Arasu10.

 

