Local, Open-Source LLMs

  • 14 October 2023
  • 4 replies

Userlevel 3
Badge +9

Has anyone set up an open-source LLM to run locally/on-prem? My company is concerned about data security with cloud-based AI services and foundation models, so I am looking into fine-tuning an open-source model and running it internally.

My vision is start very small (fine tune the model on our employee handbook for a Q&A chat interface). My hope is that this eventually leads to adoption of third-party services for broader application after getting buy-in.


Specifically, I am curious about:

  1. Hardware requirements (NVIDIA vs other/amount of memory needed/etc.)
  2. Time it takes to fine tune
  3. Inference time
  4. Integration with A360


I have heard differing opinions on whether this is a feasible/practical route and would love to hear what you all think.



Best answer by Micah.Smith 18 October 2023, 15:05

View original

4 replies

Userlevel 5
Badge +9

I think it’s pretty doable. Llama 2 specifically would be the first thing I’d explore for this - especially since they’ve basically opened it up for anyone to use even for commercial purposes. They have several models of various sizes which obviously would impact the hardware requirements as well as expected performance.

I haven’t set up Llama to run with bots yet, but I got Open AI’s Whsiper running on a bot runner this week with really good results on transcribing videos to text

Userlevel 5
Badge +9

@LoganPrice - check out LM Studio if you want to try to run some open source models with a relatively low lift. I think its going to get a bit more complicated when you consider fine tuning and model customization, but this would be a great first-try at running some local models (pretty much anything that’s available on Hugging face is fair game)

The requests using LM studio follow the Open AI completion API, so you could easily swap from using the cloud OpenAI API to local without changing your code so much as changing the host url. You would want to make sure you have enough RAM to store the entire model in memory (~3GB in the case of this Llama 2 model I’m running) - but in testing, the chat and API work pretty quickly even running on my M1 Mac Pro.


Userlevel 3
Badge +9

@Micah.Smith This looks fantastic. I’ll give them a go. Did you look into their commercial licensing structure?

Fine tuning is the biggest mystery to me. I’ve found some promising demos that I am still reading up on.



Userlevel 5
Badge +9

Commercial License is free with Llama 2...which is why wallstreet is so confused by Meta, because they’re essentially giving it away