Question

Counting tokens for use in Gen AI Models

  • 3 September 2024
  • 2 replies
  • 47 views

Hello

Can someone please advise the simplest and quickest way to count the number of tokens in a text file (words, punctuation, formatting, etc.) so I can make sure I don’t exceed the token limit for the Gen AI model?

 

I tried using Python, but I can’t get the Python script to output the token count to a variable. I am using the following script.

 

strInput is my input variable with the words.

strOutput is where I need to store the number of words/tokens found in strInput.

 

# Python code to count tokens using OpenAI's tokenizer
import tiktoken

# Load the appropriate GPT-4 tokenizer
encoding = tiktoken.encoding_for_model("gpt-4")

# Define the text content
text_content = """{{strInput}}"""  # This variable will be populated with content from Automation Anywhere

# Encode the content to count tokens
tokens = encoding.encode(text_content)

# Output the number of tokens
print(len(tokens))

First, check whether the tiktoken package is already installed. You can do this by listing your installed packages with the following command in PowerShell or cmd:

python -m pip freeze

If you don’t see it in the list of packages displayed after executing that command, you can install it by running the following command:

python -m pip install tiktoken 

If you do not have administrator privileges, you can run the command with the --user option, like so:

python -m pip install tiktoken --user
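
If you would rather check from Python itself (for example, to confirm that the interpreter your bot uses can actually see the package), something like the sketch below should work. It only uses the standard library; the helper name is_installed is my own and is not part of tiktoken or Automation Anywhere.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the running interpreter can locate package_name."""
    # find_spec returns None when a top-level package cannot be found
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Prints True or False depending on this interpreter's environment
    print("tiktoken installed:", is_installed("tiktoken"))
```

Run it with the same python executable that the bot is configured to use, since a package installed for one interpreter is not visible to another.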

 

Secondly, once you’ve made sure that the tiktoken package has been installed correctly, place your script inside a function. It should look something like this:

import tiktoken

def count_tokens(text_content):
    # Load the appropriate GPT-4 tokenizer
    encoding = tiktoken.encoding_for_model("gpt-4")

    # Encode the content to count tokens
    tokens = encoding.encode(text_content)

    # Return the number of tokens
    return len(tokens)
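
Since the goal is to stay under the model's token limit, here is a rough sketch of how the count could be compared against a limit. The names check_token_budget, max_tokens and counter are hypothetical, and the limit value is whatever your model's documentation states. The fallback to the cl100k_base encoding for model names tiktoken does not recognise is my own assumption about what you would want. tiktoken is imported inside the function only so that the budget check can be tried with a stand-in counter on a machine where tiktoken is not installed.

```python
def count_tokens(text_content, model="gpt-4"):
    """Count tokens using the tokenizer tiktoken associates with model."""
    import tiktoken  # imported here so the budget check below works without it

    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Assumption: fall back to the GPT-4 encoding for unknown model names
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text_content))


def check_token_budget(text_content, max_tokens, counter=count_tokens):
    """Report how many tokens the text uses and whether it fits the limit."""
    used = counter(text_content)
    return {"tokens": used, "limit": max_tokens, "fits": used <= max_tokens}


# Example with a stand-in counter that splits on whitespace (not a real tokenizer):
# check_token_budget("one two three", 5, counter=lambda t: len(t.split()))
```

Keep in mind that the limit usually covers the prompt and the reply together, so you may want to pass a max_tokens value lower than the model's advertised maximum.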

Finally, change the "Python script: Execute script" action in your bot to "Python script: Execute function count_tokens" and add your variable $strInput$ in the input field.
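
If the variable that receives the result is a string variable (which I am assuming is the case for strOutput), it may be safer to have the function return the count as text rather than as an integer. Below is a sketch of such a variant. The name count_tokens_text and the optional encoding parameter are my own additions; the parameter exists only so the function can be tried with a stand-in tokenizer, and the bot would still pass just the one input.

```python
def count_tokens_text(text_content, encoding=None):
    """Return the token count of text_content as a string."""
    if encoding is None:
        import tiktoken  # loaded only when no encoding object is supplied

        encoding = tiktoken.encoding_for_model("gpt-4")

    # Treat a missing input as empty text instead of raising an error
    text = "" if text_content is None else str(text_content)

    # Convert the integer count to text for a string-typed bot variable
    return str(len(encoding.encode(text)))
```

In the bot you would then call count_tokens_text instead of count_tokens and convert the result to a number only where you need to compare it against the limit.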

 

In the end, your bot should look something like this:


You can find the source code of the bot here: 

A360-Python_tiktoken_GPT-4_tokenizer.json - GitHub Gist

You can use this extension to import the source code to a task bot: 
Bot Assistant - Chrome Web Store (google.com)

And here’s the link to get the action package that I used in my bot to install the tiktoken package: 

Run Synchronous Scripts Package - Bot Store (automationanywhere.com)


Thank you so much for this solution, works a treat!

