Technical details behind deploying LLMs
This will be an ongoing thread of cool ideas I come across.
- Using fly.io to distribute inference geographically, so each request is served from (or replayed to) the region closest to the user (see the first sketch below).
- Having multiple backends for chatbots/LLMs (e.g. an openai -> cohere fallback pipeline) so that if one provider fails, its server is down, or a request errors out, the next backend takes over (see the second sketch below).
- Using Supabase with pgvector to store and query vector embeddings (see the third sketch below).
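
A minimal sketch of the fly.io idea, assuming FastAPI as the server framework. `FLY_REGION` is an environment variable Fly.io sets on each machine, and `fly-replay` is the response header its proxy honors to re-run a request in another region; the region list and `run_model` are illustrative assumptions, not part of the original note.

```python
import os
from fastapi import FastAPI, Response

app = FastAPI()

# Hypothetical: the regions where we actually host GPU inference machines.
GPU_REGIONS = {"ord", "ams", "syd"}

def run_model(payload: dict) -> str:
    # Placeholder for the real inference call.
    return f"generated text for {payload!r}"

@app.post("/infer")
async def infer(payload: dict):
    region = os.environ.get("FLY_REGION", "")
    if region not in GPU_REGIONS:
        # Ask Fly's proxy to replay this request in a GPU-equipped region.
        return Response(headers={"fly-replay": "region=ord"})
    return {"served_from": region, "output": run_model(payload)}
```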
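A minimal sketch of the fallback pipeline, assuming the official `openai` and `cohere` Python SDKs with API keys in the environment; the model name is illustrative. Any exception (timeout, rate limit, outage) just moves the request to the next backend.

```python
import os
import cohere
from openai import OpenAI

def ask_openai(prompt: str) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def ask_cohere(prompt: str) -> str:
    co = cohere.Client(api_key=os.environ["COHERE_API_KEY"])
    return co.chat(message=prompt).text

def ask_with_fallback(prompt: str) -> str:
    # Try each backend in order; if one raises, fall through to the next
    # so a single provider failure isn't fatal to the request.
    for backend in (ask_openai, ask_cohere):
        try:
            return backend(prompt)
        except Exception:
            continue
    raise RuntimeError("all LLM backends failed")

print(ask_with_fallback("Summarize what a fallback pipeline is."))
```

A reasonable refinement is to order the chain by cost or latency, so the cheaper/faster provider handles traffic and the others only absorb failures.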
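A minimal sketch of the Supabase idea, assuming a `documents` table with a pgvector column and a user-defined `match_documents` SQL function (the pattern from Supabase's pgvector guide). The table/function names and the embedding model are assumptions.

```python
import os
from openai import OpenAI
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list[float]:
    # Turn text into a vector suitable for the pgvector column.
    resp = openai_client.embeddings.create(
        model="text-embedding-3-small", input=text
    )
    return resp.data[0].embedding

def insert_document(content: str) -> None:
    supabase.table("documents").insert(
        {"content": content, "embedding": embed(content)}
    ).execute()

def search(query: str, k: int = 5):
    # Calls the hypothetical match_documents Postgres function via RPC,
    # which does the similarity search inside the database.
    return supabase.rpc(
        "match_documents",
        {"query_embedding": embed(query), "match_count": k},
    ).execute().data

insert_document("fly.io lets you run apps close to users.")
print(search("geographically distributed deployment"))
```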