I recently discovered WebLLM, and it’s impressive: I ran Llama-3.2-1B-Instruct-q4f16_1-MLC on a school Chromebook with 8 GB of RAM using my custom HTML chat UI. Here was my result:

Essentially, instead of running the model on a backend server like ChatGPT does (which costs money), it runs everything on your device, or more accurately, inside your browser. The main downside is that some devices/browsers (like Safari) don’t support WebGPU, which I assume is the API this library depends on. Another is that initializing a model and generating responses can be slow or eat up a lot of memory.
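As a quick sanity check, you can test whether a browser exposes WebGPU at all before trying to load anything. This is just the standard browser API (`navigator.gpu`), not anything specific to WebLLM:

```html
<script>
  // Supporting browsers expose WebGPU as navigator.gpu.
  if ("gpu" in navigator) {
    console.log("WebGPU is available, WebLLM should be able to run.");
  } else {
    console.warn("No WebGPU in this browser, so WebLLM won't work here.");
  }
</script>
```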
Maybe CreatiCode could add this? Here is the link to the NPM library: https://www.npmjs.com/package/@mlc-ai/web-llm
The list of models is in the module’s `prebuiltAppConfig` object, so you can see it by logging `webllm.prebuiltAppConfig` to the console. I’ve only been able to get it to work inside `<script>` tags with `type="module"`, so if you guys don’t use ES6 module code, this might not work. If you need help, I’ll give it.
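Here is a rough sketch of the kind of page I mean, based on the examples in the WebLLM README. The esm.run CDN URL, the exact option names, and the prompt text are my assumptions and may differ between versions of the library:

```html
<script type="module">
  import * as webllm from "https://esm.run/@mlc-ai/web-llm";

  // See which model IDs this build of the library ships with.
  console.log(webllm.prebuiltAppConfig.model_list.map((m) => m.model_id));

  // Download and initialize the model (this is the slow, memory-hungry part).
  const engine = await webllm.CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC", {
    initProgressCallback: (progress) => console.log(progress.text),
  });

  // Chat with it through the OpenAI-style completions API.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Say hello in one sentence." }],
  });
  console.log(reply.choices[0].message.content);
</script>
```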
This would be game changing, as we wouldn’t have to worry about chatting for too long and wasting credits, or about rate limiting and errors like “MAX LIMIT REACHED”, and it would all be done for free!