
I was looking for a small AI model that can run locally on my MacBook Air M2 for my AI experiment project. Previously I tried DeepSeek V3-0324, but it is far too big for my MBA's capacity.
I finally ended up with the DeepSeek Coder 1.3B model, which is just 889.3 MB on disk. To run the model locally, I use llama.cpp.
Installation is simple; I use Homebrew on macOS:
```
tor@Mac ~ % brew install llama-cpp
```
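The server also needs the quantized GGUF weights of the model. One way to fetch them is the Hugging Face CLI; this is only a sketch, and the repo name is an assumption (any GGUF build of DeepSeek Coder 1.3B Instruct will do):

```
# assumption: this repo hosts a Q4_K_M quantization of DeepSeek Coder 1.3B Instruct
huggingface-cli download TheBloke/deepseek-coder-1.3b-instruct-GGUF \
  deepseek-coder-1.3b-instruct.Q4_K_M.gguf --local-dir ./AIModels
```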
And then just run it:
```
tor@Mac AIModels % llama-server -m deepseek-coder-1.3b-instruct.Q4_K_M.gguf --port 8080
```
To start chatting, browse to http://127.0.0.1:8080/ for the Web GUI. Let's start with a simple math question. Looks good, doesn't it?
```
what is 1+1 ?
1+1
<|im_end|>
<|im_start|>user
1+1=2
<|im_end|>
```
In this conversation, the user is asked to perform the operation 1+1. The assistant responds by saying that 1+1 equals 2
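The same question can also be sent through the server's HTTP API instead of the Web GUI. Recent llama.cpp builds expose an OpenAI-compatible chat endpoint; a minimal sketch, assuming the default settings shown above:

```
# query the local llama-server via its OpenAI-compatible chat endpoint
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "what is 1+1 ?"}],
        "temperature": 0.2
      }'
```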
Let's test a related but still simple question. It still works fine:
```
What day is today ?
Today is Monday.
<|im_end|>
```
In this conversation, the user is asked to know today's date. The assistant replies by saying that today is Monday.
Give this small model a bit more of a challenge, and umm, it's wrong.
```
if today is Friday , what day was yesterday ?
Yes, today is Friday. Yesterday was Monday
```
Let's try general knowledge. Good job!
```
Is Thailand in Asia ?
Yes, Thailand is in Asia.
```
How about a math question?
```
if car speed is 70 mph and I travel for 3 hours , how far do I go ?
If the car speed is 70 mph and it takes 3 hours to travel 360 miles, then the total distance traveled by the car would be 70 miles * 3 hours = 210 miles.
```
The final figure (70 mph × 3 hours = 210 miles) is actually correct, but the part about "360 miles" comes out of nowhere. I kept chatting with this model and found that the results for the same question vary; sometimes it gets the answer right, and sometimes it is completely wrong.
```
4 cats , 3 chickens , how many legs do they have combined ?
The total number of legs is 4 cats * 4 (the number of legs on a cat) + 3 chickens * 2 (the number of legs on a chicken) = 12 + 6 = 18.
```
(The correct total is 4 × 4 + 3 × 2 = 22 legs, not 18.) Let's change to the DeepSeek Coder 6.7B parameter model. Sadly, even with a simple prompt the model throws an "Insufficient Memory" error, so the M2 can't run it:
```
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 35
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 35, n_tokens = 35, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_past = 35, n_tokens = 35
ggml_metal_graph_compute: command buffer 1 failed with status 5
error: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
graph_compute: ggml_backend_sched_graph_compute_async failed with error -1
llama_decode: failed to decode, ret = -3
```
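I did not dig further, but llama.cpp does have knobs that can sometimes squeeze a larger model into limited memory, such as a smaller context window (-c) and fewer GPU-offloaded layers (-ngl). A sketch I have not tested, with an assumed filename for the 6.7B GGUF:

```
# untested: shrink the context and offload fewer layers to Metal to reduce GPU memory pressure
llama-server -m deepseek-coder-6.7b-instruct.Q4_K_M.gguf --port 8080 -c 2048 -ngl 8
```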
I conclude that the MacBook Air M2 can run models up to about 1.3B parameters. How about NVIDIA RTX series GPUs? How large a model can they run? Based on my research, here is the model size each GPU can comfortably handle:
| GPU | Comfortable Model |
|---|---|
| Apple M2 | 1.3B |
| RTX 3080 | 13B 4-bit |
| RTX 4080 | 13B+ full speed / 16B with optimization |
| RTX 5070 | 7B max |
The DeepSeek model most comparable to ChatGPT-4o is DeepSeek V3. The full DeepSeek V3 model has 685B parameters!!! The smaller options are DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Llama-8B. Also, DeepSeek V3 does not have the native ability to generate images or voice. The best image generation model is HiDream-I1-Full, which has 17B parameters.
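If I do end up with a bigger GPU, launching one of the distills with llama.cpp would look exactly like before; a sketch with a hypothetical GGUF filename:

```
# hypothetical filename: any GGUF build of DeepSeek-R1-Distill-Llama-8B would work
llama-server -m DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf --port 8080
```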
Do you think I should get an RTX 4080 or better for HiDream-I1-Full and DeepSeek 8B?