Today, many people prefer to run Large Language Models (LLMs) locally on their own computers, whether out of concern for data privacy, for quick access, or to experiment with new open models. In this post, we run and compare the Llama 3.2, Phi-3.5, and Gemma 2 models using various tools.

Ollama, Open WebUI, and Docker

To run the models locally, we used Ollama, with Open WebUI as the interface and Docker to run Open WebUI. First, install Ollama on your computer so that you can run a Large Language Model locally. You can then download the model of your choice and start chatting right in the terminal. If you prefer a more user-friendly environment for the conversation, you can opt for Open WebUI, an open-source interface similar to ChatGPT's. Since Open WebUI runs in a container, install Docker first, then follow the instructions in the Open WebUI documentation or on its GitHub page to set up the version suitable for your system.
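As a rough sketch of these first steps (the install script URL and model tag reflect Ollama's site at the time of writing; check ollama.com for the installer that matches your platform):

    # Install Ollama (Linux; macOS and Windows installers are on ollama.com)
    curl -fsSL https://ollama.com/install.sh | sh

    # Download a model and start chatting in the terminal
    ollama run llama3.2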

Following those instructions gave us the command to run in the terminal, shown below. Once the container is up, you can check in the Docker application which port Open WebUI is mapped to and open it by clicking the link. After completing these steps, we began using the open models through Open WebUI.
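At the time of writing, the command in the Open WebUI README looks roughly like this (the port mapping and volume name are the documented defaults and can be changed):

    docker run -d -p 3000:8080 \
      --add-host=host.docker.internal:host-gateway \
      -v open-webui:/app/backend/data \
      --name open-webui --restart always \
      ghcr.io/open-webui/open-webui:main

With this mapping, Open WebUI is served at http://localhost:3000.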

It's worth mentioning that some users choose Pinokio instead of Docker for this step. When I tried Pinokio, however, I ran into constant errors while it was downloading packages, although some of our team members reported using it successfully for other products.

Details of Open WebUI and the Models We Used

Returning to Open WebUI: its simple, familiar interface makes it easy to use. You can customize models with prompts, or add multiple models to the interface to test them side by side. It is also possible to integrate voice support into the platform.
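Prompt customization can also be done one layer down, at the Ollama level rather than in Open WebUI itself. As a small sketch (the model name "my-assistant" and the system prompt are our own illustrative choices), a Modelfile bakes a system prompt into a reusable model:

    # Modelfile: wraps llama3.2 with a fixed system prompt
    FROM llama3.2
    SYSTEM "You are a concise assistant. Answer in plain language."

Building and running it:

    ollama create my-assistant -f Modelfile
    ollama run my-assistant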

In Open WebUI, we compared the lightweight versions of Meta's Llama 3.2 model, announced last week, against the Phi-3.5 and Gemma 2 models. First, a brief look at Llama 3.2: the family includes two vision models with 11 billion and 90 billion parameters, and two lighter, text-only models with 1 billion and 3 billion parameters designed for mobile devices.
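Each of these models is pulled through Ollama before it shows up in Open WebUI; with the tags available in the Ollama library at the time of writing, that looks like:

    ollama pull llama3.2:1b
    ollama pull llama3.2:3b
    ollama pull phi3.5
    ollama pull gemma2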

Llama 3.2 offers a context length of 128,000 tokens, enough to process hundreds of pages of text at once, which lets the model take on longer and more complex tasks.

Meta claims that Llama 3.2 competes with Claude 3 Haiku and GPT-4o mini at understanding both images and text, and reports superior performance in areas such as instruction following, summarization, and tool use. According to the documents Meta shared, Llama 3.2 performed strongly on various benchmarks compared to the Phi-3.5 and Gemma 2 models.

We compared Llama 3.2 with Phi-3.5 and Gemma using different prompts. Phi-3.5 stood out with longer, more comprehensive responses, while Gemma distinguished itself with its Turkish translation capabilities. When we asked each model to write a poem about love between a human and a robot, Llama 3.2's 1-billion-parameter version said it could not write a poem on this topic, citing Meta's safety policies, while the 3-billion-parameter version produced one. Gemma, for its part, made sure to append a warning for users at the end of the poem it generated.
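We ran these prompts through the Open WebUI chat interface, but the same side-by-side check can also be scripted against the Ollama CLI. A minimal sketch (the prompt wording here is our own):

    # Send the same prompt to each model and print the responses
    for m in llama3.2:1b llama3.2:3b phi3.5 gemma2; do
      echo "=== $m ==="
      ollama run "$m" "Write a poem about love between a human and a robot."
    done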

