Wexbide

Wexbide was originally the name given to my Flipper Zero, but it has evolved into an AI companion powered by open-source LLMs.

Overview

Wexbide is a custom AI implementation powered by the dolphin-mixtral model running on Ollama. This project represents the evolution from a simple device name to a fully functional AI assistant integrated into my personal website.

Background

The name "Wexbide" originated as the identifier for my Flipper Zero device. As my interest in AI and machine learning grew, I decided to bring this identity to life in the form of an interactive AI assistant. This implementation fulfills a long-standing goal of having a custom AI bot integrated directly into my personal website.

Technical Implementation

Core Components

  • Model: dolphin-mixtral (Ollama implementation)
  • Backend: Linux-based server
  • Hardware:
    • Dual GPU setup
    • GPU 1: 4GB VRAM
    • GPU 2: 8GB VRAM
    • Total Available VRAM: 12GB
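
As a quick illustration of how these pieces connect, below is a minimal TypeScript sketch (assuming Node 18+ with built-in fetch) that checks whether dolphin-mixtral is available on the local Ollama server. The port 11434 and the /api/tags endpoint are Ollama's defaults; the exact tag string is an assumption and depends on which variant was pulled.

    // Minimal check: is dolphin-mixtral pulled on the local Ollama server?
    // Assumes Ollama's default port (11434).
    const OLLAMA_URL = "http://localhost:11434";

    async function modelAvailable(name: string): Promise<boolean> {
      const res = await fetch(`${OLLAMA_URL}/api/tags`); // lists locally pulled models
      const data = (await res.json()) as { models: { name: string }[] };
      return data.models.some((m) => m.name.startsWith(name));
    }

    modelAvailable("dolphin-mixtral").then((ok) =>
      console.log(ok ? "model ready" : "model not pulled yet"),
    );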

Development Journey

My introduction to Ollama came through collaborating with my older brother on the Pink Santa App project (which will be available on my website soon). Ollama's versatility and ease of use quickly made it my go-to solution for various personal projects, including:

  • Discord bots
  • Study tools
  • Website integration

Current Limitations and Challenges

Hardware Constraints

The primary limitation is the available hardware resources:

  • Limited VRAM (12GB total) restricts model size
  • Running a reduced, quantized variant of dolphin-mixtral rather than the full model
  • Performance impacts due to hardware limitations
  • Slower response times compared to cloud-based solutions

Implementation Compromises

Response Generation

The current implementation generates the complete answer before sending anything to the user, leading to:

  • Longer perceived response times
  • Lack of real-time text streaming
  • Different user experience compared to mainstream AI chatbots
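
For illustration, here is a minimal sketch of that blocking flow in TypeScript (Node 18+): with stream set to false, Ollama's /api/generate endpoint returns a single JSON object only after the full completion has been produced, which is exactly why nothing appears until the answer is finished. The host and model tag shown are assumptions.

    // Non-streaming call: Ollama generates the entire answer before
    // returning one JSON object, so the caller waits the whole time.
    async function askWexbide(prompt: string): Promise<string> {
      const res = await fetch("http://localhost:11434/api/generate", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          model: "dolphin-mixtral",
          prompt,
          stream: false, // wait for the complete answer in one response
        }),
      });
      const data = (await res.json()) as { response: string };
      return data.response; // full answer, delivered all at once
    }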

Hosting Limitations

The current setup on Vercel's free tier presents several challenges:

  • 10-second execution limit on serverless function calls
  • Unable to implement streaming responses (stream=True)
  • Potential for incomplete answers on longer responses
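
One way to live within the 10-second window is to cut the upstream call off just before Vercel does. The sketch below is a hypothetical pages-style API route, not the actual Wexbide code: the host name, 9-second budget, and response shape are all assumptions. It uses AbortSignal.timeout so the function can return a clear error instead of being killed mid-request.

    import type { VercelRequest, VercelResponse } from "@vercel/node";

    // Hypothetical route: forward the prompt to the Ollama backend, but
    // abort at ~9s so we respond before Vercel's 10s limit cuts us off.
    export default async function handler(req: VercelRequest, res: VercelResponse) {
      const { prompt } = req.body as { prompt: string };
      try {
        const upstream = await fetch("http://ollama-backend.example:11434/api/generate", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ model: "dolphin-mixtral", prompt, stream: false }),
          signal: AbortSignal.timeout(9000), // leave ~1s of headroom (assumed budget)
        });
        const data = (await upstream.json()) as { response: string };
        res.status(200).json({ answer: data.response });
      } catch {
        // Timed out or failed: a clear error beats a silently dropped request.
        res.status(504).json({ error: "Response took too long; try a shorter question." });
      }
    }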

Future Improvements

Planned Enhancements

  1. Self-Hosting Solution

    • Moving away from Vercel's free tier
    • Implementation of streaming responses (see the sketch after this list)
    • Improved response times
  2. Hardware Upgrades

    • Exploring options for increased VRAM
    • Potential cloud computing integration
    • Performance optimization
  3. Model Optimization

    • Testing different model variants
    • Exploring quantization options
    • Balance between performance and resource usage
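
Once off the 10-second cap, streaming (item 1 above) becomes straightforward. The sketch below shows the general shape: with stream set to true, Ollama emits newline-delimited JSON chunks, each carrying a response fragment and a done flag. The endpoint and fields follow Ollama's HTTP API; the rest is illustrative.

    // Streaming sketch: with stream: true, Ollama sends one JSON object
    // per line as tokens are generated, so text can be shown immediately.
    async function streamAnswer(prompt: string, onToken: (t: string) => void) {
      const res = await fetch("http://localhost:11434/api/generate", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ model: "dolphin-mixtral", prompt, stream: true }),
      });
      const reader = res.body!.getReader();
      const decoder = new TextDecoder();
      let buffered = "";
      for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        buffered += decoder.decode(value, { stream: true });
        let nl: number;
        while ((nl = buffered.indexOf("\n")) >= 0) {
          const line = buffered.slice(0, nl).trim();
          buffered = buffered.slice(nl + 1);
          if (!line) continue;
          const chunk = JSON.parse(line) as { response: string; done: boolean };
          onToken(chunk.response); // forward each fragment as it arrives
          if (chunk.done) return;
        }
      }
    }

    // Usage: print fragments to the console as they stream in.
    streamAnswer("Who is Wexbide?", (t) => process.stdout.write(t));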

Alternative Approaches Under Consideration

  • Hybrid hosting solutions
  • Load balancing strategies
  • Caching mechanisms for frequent queries (sketched below)
  • Progressive response delivery methods
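
As one illustration of the caching idea, a minimal in-memory sketch could look like the following; the key normalization, TTL, and function names are all hypothetical. Paired with a non-streaming call like the askWexbide sketch above, a repeated question would return instantly.

    // Hypothetical in-memory cache: identical prompts within the TTL
    // are answered from memory instead of re-querying the model.
    const TTL_MS = 10 * 60 * 1000; // 10-minute freshness window (assumed)
    const cache = new Map<string, { answer: string; at: number }>();

    async function cachedAsk(prompt: string, ask: (p: string) => Promise<string>) {
      const key = prompt.trim().toLowerCase(); // naive normalization (assumed)
      const hit = cache.get(key);
      if (hit && Date.now() - hit.at < TTL_MS) return hit.answer; // cache hit
      const answer = await ask(prompt); // cache miss: query the model
      cache.set(key, { answer, at: Date.now() });
      return answer;
    }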

Contributing

Contributions and suggestions are welcome! If you have ideas for improvements or want to help develop Wexbide further, please feel free to:

  1. Open an issue in the repository
  2. Submit a pull request
  3. Contact me through my website

Last updated: 11/16/2024