Overview
Wexbide is a custom AI implementation powered by the dolphin-mixtral model running on Ollama. This project represents the evolution from a simple device name to a fully functional AI assistant integrated into my personal website.
Background
The name "Wexbide" originated as the identifier for my Flipper Zero device. As my interest in AI and machine learning grew, I decided to bring this identity to life in the form of an interactive AI assistant. This implementation fulfills a long-standing goal of having a custom AI bot integrated directly into my personal website.
Technical Implementation
Core Components
- Model: dolphin-mixtral (Ollama implementation)
- Backend: Linux-based server
- Hardware: Dual GPU setup
  - GPU 1: 4GB VRAM
  - GPU 2: 8GB VRAM
  - Total available VRAM: 12GB
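To make the setup concrete, here is a minimal sketch of how a backend can query this stack through Ollama's REST API. It assumes a Python backend using the `requests` library and Ollama's default local address; the function name is illustrative, not the site's actual code.

```python
import requests

def ask_wexbide(prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the complete answer."""
    resp = requests.post(
        "http://localhost:11434/api/generate",  # Ollama's default local endpoint
        json={
            "model": "dolphin-mixtral",
            "prompt": prompt,
            "stream": False,  # wait for the full answer; see Limitations below
        },
        timeout=120,  # generation on limited VRAM can take a while
    )
    resp.raise_for_status()
    return resp.json()["response"]
```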
Development Journey
My introduction to Ollama came through collaborating with my older brother on the Pink Santa App project (which will be available on my website soon). Ollama's versatility and ease of use quickly made it my go-to solution for various personal projects, including:
- Discord bots
- Study tools
- Website integration
Current Limitations and Challenges
Hardware Constraints
The primary limitation is the available hardware:
- Limited VRAM (12GB total) restricts model size
- Only a minimized version of dolphin-mixtral can run within that budget
- Performance suffers due to these hardware limits
- Response times are slower than cloud-based solutions
Implementation Compromises
Response Generation
The current implementation requires the full answer to be generated before it is sent, leading to:
- Longer perceived response times
- No real-time text streaming
- A different user experience compared to mainstream AI chatbots
Hosting Limitations
The current setup on Vercel's free tier presents several challenges:
- 10-second communication window per request
- Unable to implement streaming responses (`stream=True`); see the sketch below
- Potential for incomplete answers on longer responses
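For comparison, here is a sketch of what streaming could look like once the platform allows long-lived responses. Ollama's `/api/generate` endpoint emits newline-delimited JSON chunks when `stream` is true; the function name is illustrative.

```python
import json
import requests

def stream_wexbide(prompt: str):
    """Yield answer fragments as the local Ollama server generates them."""
    with requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "dolphin-mixtral", "prompt": prompt, "stream": True},
        stream=True,  # keep the HTTP connection open for chunked output
        timeout=120,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            if chunk.get("done"):
                break
            yield chunk.get("response", "")
```

Each yielded fragment could then be forwarded to the browser (e.g. via server-sent events), which is exactly what the 10-second window currently rules out.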
Future Improvements
Planned Enhancements
- Self-Hosting Solution
  - Moving away from Vercel's free tier
  - Implementation of streaming responses
  - Improved response times
- Hardware Upgrades
  - Exploring options for increased VRAM
  - Potential cloud computing integration
  - Performance optimization
- Model Optimization
  - Testing different model variants
  - Exploring quantization options (see the sketch below)
  - Balancing performance against resource usage
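As a starting point for the quantization item above: Ollama publishes multiple quantized tags for most models, and a smaller quantization trades some answer quality for VRAM headroom. Below is a sketch of pulling one through the REST API; the tag at the bottom is illustrative, so check the model's page in the Ollama library for the tags that actually exist.

```python
import json
import requests

def pull_model(name: str) -> None:
    """Ask the local Ollama server to download a model tag, printing progress."""
    with requests.post(
        "http://localhost:11434/api/pull",
        json={"name": name},
        stream=True,   # Ollama streams download progress as JSON lines
        timeout=None,  # large models can take a long time to download
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if line:
                print(json.loads(line).get("status", ""))

# Illustrative tag: a 4-bit quantization of dolphin-mixtral, if published.
pull_model("dolphin-mixtral:8x7b-v2.7-q4_0")
```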
Alternative Approaches Under Consideration
- Hybrid hosting solutions
- Load balancing strategies
- Caching mechanisms for frequent queries (see the sketch after this list)
- Progressive response delivery methods
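Of these, exact-match caching is the simplest to prototype. Here is a minimal sketch, assuming a Python backend and the same local Ollama endpoint as above; an in-memory dict stands in for whatever store the site would actually use.

```python
import hashlib
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama address
_answer_cache: dict[str, str] = {}

def ask_cached(prompt: str, model: str = "dolphin-mixtral") -> str:
    """Return a cached answer for repeated prompts; call Ollama only on a miss."""
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key not in _answer_cache:
        resp = requests.post(
            OLLAMA_URL,
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=120,
        )
        resp.raise_for_status()
        _answer_cache[key] = resp.json()["response"]
    return _answer_cache[key]
```

This only helps when prompts repeat verbatim; catching paraphrased questions would require semantic (embedding-based) matching, which is a much bigger project.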
Contributing
Contributions and suggestions are welcome! If you have ideas for improvements or want to help develop Wexbide further, please feel free to:
- Open an issue in the repository
- Submit a pull request
- Contact me through my website
Last updated: 11/16/2024