Overview
Wexbide is a custom AI implementation powered by the dolphin-mixtral model running on Ollama. This project represents the evolution from a simple device name to a fully functional AI assistant integrated into my personal website.
Background
The name "Wexbide" originated as the identifier for my Flipper Zero device. As my interest in AI and machine learning grew, I decided to bring this identity to life in the form of an interactive AI assistant. This implementation fulfills a long-standing goal of having a custom AI bot integrated directly into my personal website.
Technical Implementation
Core Components
- Model: dolphin-mixtral (Ollama implementation)
- Backend: Linux-based server
- Hardware: Dual GPU setup
  - GPU 1: 4GB VRAM
  - GPU 2: 8GB VRAM
  - Total available VRAM: 12GB
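To make the setup concrete, here is a minimal sketch of how a backend can query this stack through Ollama's REST API. It assumes a Python backend using the `requests` library and Ollama's default local address; the function name is illustrative, not the site's actual code.

```python
import requests

def ask_wexbide(prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the complete answer."""
    resp = requests.post(
        "http://localhost:11434/api/generate",  # Ollama's default local endpoint
        json={
            "model": "dolphin-mixtral",
            "prompt": prompt,
            "stream": False,  # wait for the full answer; see Limitations below
        },
        timeout=120,  # generation on limited VRAM can take a while
    )
    resp.raise_for_status()
    return resp.json()["response"]
```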
Development Journey
My introduction to Ollama came through collaborating with my older brother on the Pink Santa App project (which will be available on my website soon). Ollama's versatility and ease of use quickly made it my go-to solution for various personal projects, including:
- Discord bots
- Study tools
- Website integration
Current Limitations and Challenges
Hardware Constraints
The primary limitation is the available hardware:
- Limited VRAM (12GB total) restricts model size
- Only a minimized version of dolphin-mixtral can run within that budget
- Performance suffers due to these hardware limits
- Response times are slower than cloud-based solutions
Implementation Compromises
Response Generation
The current implementation requires the full answer to be generated before it is sent, leading to:
- Longer perceived response times
- No real-time text streaming
- A different user experience compared to mainstream AI chatbots
Hosting Limitations
The current setup on Vercel's free tier presents several challenges:
- 10-second communication window per request
- Unable to implement streaming responses (`stream=True`); see the sketch below
- Potential for incomplete answers on longer responses
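For comparison, here is a sketch of what streaming could look like once the platform allows long-lived responses. Ollama's `/api/generate` endpoint emits newline-delimited JSON chunks when `stream` is true; the function name is illustrative.

```python
import json
import requests

def stream_wexbide(prompt: str):
    """Yield answer fragments as the local Ollama server generates them."""
    with requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "dolphin-mixtral", "prompt": prompt, "stream": True},
        stream=True,  # keep the HTTP connection open for chunked output
        timeout=120,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            if chunk.get("done"):
                break
            yield chunk.get("response", "")
```

Each yielded fragment could then be forwarded to the browser (e.g. via server-sent events), which is exactly what the 10-second window currently rules out.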
Future Improvements
Planned Enhancements
- Self-Hosting Solution
  - Moving away from Vercel's free tier
  - Implementation of streaming responses
  - Improved response times
- Hardware Upgrades
  - Exploring options for increased VRAM
  - Potential cloud computing integration
  - Performance optimization
- Model Optimization
  - Testing different model variants
  - Exploring quantization options (see the sketch below)
  - Balancing performance against resource usage
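As a starting point for the quantization item above: Ollama publishes multiple quantized tags for most models, and a smaller quantization trades some answer quality for VRAM headroom. Below is a sketch of pulling one through the REST API; the tag at the bottom is illustrative, so check the model's page in the Ollama library for the tags that actually exist.

```python
import json
import requests

def pull_model(name: str) -> None:
    """Ask the local Ollama server to download a model tag, printing progress."""
    with requests.post(
        "http://localhost:11434/api/pull",
        json={"name": name},
        stream=True,   # Ollama streams download progress as JSON lines
        timeout=None,  # large models can take a long time to download
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if line:
                print(json.loads(line).get("status", ""))

# Illustrative tag: a 4-bit quantization of dolphin-mixtral, if published.
pull_model("dolphin-mixtral:8x7b-v2.7-q4_0")
```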
Alternative Approaches Under Consideration
- Hybrid hosting solutions
- Load balancing strategies
- Caching mechanisms for frequent queries (see the sketch after this list)
- Progressive response delivery methods
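Of these, exact-match caching is the simplest to prototype. Here is a minimal sketch, assuming a Python backend and the same local Ollama endpoint as above; an in-memory dict stands in for whatever store the site would actually use.

```python
import hashlib
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama address
_answer_cache: dict[str, str] = {}

def ask_cached(prompt: str, model: str = "dolphin-mixtral") -> str:
    """Return a cached answer for repeated prompts; call Ollama only on a miss."""
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key not in _answer_cache:
        resp = requests.post(
            OLLAMA_URL,
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=120,
        )
        resp.raise_for_status()
        _answer_cache[key] = resp.json()["response"]
    return _answer_cache[key]
```

This only helps when prompts repeat verbatim; catching paraphrased questions would require semantic (embedding-based) matching, which is a much bigger project.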
Contributing
Contributions and suggestions are welcome! If you have ideas for improvements or want to help develop Wexbide further, please feel free to:
- Open an issue in the repository
- Submit a pull request
- Contact me through my website
Last updated: 11/16/2024