Solved

Local, Open-Source LLMs


LoganPrice
Most Valuable Pathfinder

Has anyone set up an open-source LLM to run locally/on-prem? My company is concerned about data security with cloud-based AI services and foundation models, so I am looking into fine-tuning an open-source model and running it internally.

My vision is to start very small (fine-tune the model on our employee handbook to power a Q&A chat interface). My hope is that this builds buy-in and eventually leads to adopting third-party services for broader applications.

 

Specifically, I am curious about:

  1. Hardware requirements (NVIDIA vs. other GPUs, amount of memory needed, etc.)
  2. Time it takes to fine-tune
  3. Inference time
  4. Integration with A360

 

I have heard differing opinions on whether this is a feasible/practical route and would love to hear what you all think.

 

Best answer by Micah.Smith

4 replies

Micah.Smith
Automation Anywhere Team
  • Automation Anywhere Team
  • 427 replies
  • October 14, 2023

I think it’s pretty doable. Llama 2 specifically would be the first thing I’d explore for this - especially since they’ve basically opened it up for anyone to use, even for commercial purposes. They offer several model sizes, which obviously impacts hardware requirements as well as expected performance.
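As a rough sketch of how model size maps to hardware requirements: RAM for the weights scales with parameter count times bits per weight. The ~4-bit figure below is an assumption based on common quantized builds, and the numbers ignore KV-cache and runtime overhead:

```python
def approx_model_ram_gb(n_params, bits_per_weight):
    """Rough RAM (in GB) needed just to hold the weights in memory.
    Ignores KV cache and runtime overhead, so treat it as a floor."""
    return n_params * bits_per_weight / 8 / 1e9

# Llama 2 ships in 7B, 13B, and 70B parameter sizes; quantized builds
# commonly use ~4 bits per weight.
for n in (7e9, 13e9, 70e9):
    print(f"{n / 1e9:.0f}B params @ 4-bit ~= {approx_model_ram_gb(n, 4):.1f} GB")
```

By this estimate a 4-bit 7B model needs roughly 3.5 GB of RAM, a 13B about 6.5 GB, and a 70B about 35 GB - which is why the smallest size is the usual starting point on laptop-class hardware.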

I haven’t set up Llama to run with bots yet, but I got OpenAI’s Whisper running on a bot runner this week with really good results transcribing videos to text.


Micah.Smith
Automation Anywhere Team
  • Automation Anywhere Team
  • 427 replies
  • Answer
  • October 18, 2023

@LoganPrice - check out LM Studio if you want to try running some open-source models with a relatively low lift. I think it’s going to get more complicated once you consider fine-tuning and model customization, but this would be a great first try at running some local models (pretty much anything available on Hugging Face is fair game).

Requests to LM Studio follow the OpenAI completions API, so you could swap from the cloud OpenAI API to local without changing much more than the host URL. You’d want to make sure you have enough RAM to hold the entire model in memory (~3 GB in the case of the Llama 2 model I’m running) - but in testing, the chat and API both respond quickly even running on my M1 Mac.
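To illustrate the drop-in swap, here’s a minimal stdlib-only sketch of an OpenAI-style chat request where the only thing that changes between cloud and local is the base URL. The localhost port 1234 is LM Studio’s default local-server address, and `"local-model"` is a placeholder name (LM Studio serves whichever model you’ve loaded):

```python
import json
from urllib import request

# Assumption: LM Studio's local server is running on its default port.
LOCAL_BASE_URL = "http://localhost:1234/v1"
CLOUD_BASE_URL = "https://api.openai.com/v1"

def build_chat_request(base_url, messages, model="local-model"):
    """Build an OpenAI-style chat-completion URL and payload.
    Swapping cloud for local is just a different base_url."""
    url = f"{base_url}/chat/completions"
    payload = {"model": model, "messages": messages}
    return url, payload

def chat(base_url, messages, model="local-model"):
    """POST the request and return the assistant's reply text."""
    url, payload = build_chat_request(base_url, messages, model)
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]

# e.g. chat(LOCAL_BASE_URL, [{"role": "user", "content": "What is our PTO policy?"}])
```

The same `chat()` call works against the hosted API by passing `CLOUD_BASE_URL` (plus an Authorization header with your API key), which is the appeal of LM Studio’s OpenAI-compatible endpoint.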

 

 

LoganPrice
Most Valuable Pathfinder
  • Author
  • Most Valuable Pathfinder
  • 77 replies
  • October 18, 2023

@Micah.Smith This looks fantastic. I’ll give them a go. Did you look into their commercial licensing structure?

Fine-tuning is the biggest mystery to me. I’ve found some promising demos that I’m still reading up on.

 

 


Micah.Smith
Automation Anywhere Team
  • Automation Anywhere Team
  • 427 replies
  • October 18, 2023

The commercial license is free with Llama 2... which is why Wall Street is so confused by Meta, because they’re essentially giving it away.

 

 

