Technical details behind deploying LLMs

This will be an ongoing thread of cool ideas I come across.

  • Using fly.io to distribute inference geographically, so each request is served from a region close to the user (see the first sketch after this list).
  • Having multiple backends for chatbots/LLMs (e.g. an OpenAI -> Cohere fallback pipeline) so a request can be retried elsewhere when one provider fails, is down, or rejects the request (see the fallback sketch below).
  • Using Supabase (Postgres with the pgvector extension) to store and query vector embeddings (see the last sketch below).
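
A minimal sketch of the fly.io idea, assuming the app is deployed to multiple Fly regions: Fly's anycast routing sends each request to the nearest machine, and the FLY_REGION environment variable tells the process where it is running. `run_model` is a hypothetical placeholder for the actual inference call.

```python
# Minimal sketch: a region-aware inference endpoint on Fly.io.
# Fly sets FLY_REGION on each machine; anycast routing already
# delivers each user to the nearest region, so the app just serves.
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

REGION = os.environ.get("FLY_REGION", "local")

def run_model(prompt: str) -> str:
    # hypothetical stand-in for the actual model inference call
    return f"[{REGION}] echo: {prompt}"

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        prompt = self.rfile.read(length).decode()
        reply = run_model(prompt)
        self.send_response(200)
        self.send_header("X-Served-From", REGION)  # surface which region answered
        self.end_headers()
        self.wfile.write(reply.encode())

if __name__ == "__main__":
    HTTPServer(("", 8080), InferenceHandler).serve_forever()
```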
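
A minimal sketch of the multi-backend fallback idea: try each provider in order and move to the next when a call raises. `call_openai` and `call_cohere` are hypothetical wrappers; in practice they would hold the real OpenAI and Cohere SDK calls.

```python
# Minimal fallback sketch: try each backend in order until one succeeds.
from typing import Callable

def call_openai(prompt: str) -> str:
    # hypothetical wrapper; simulate an outage to show the fallback path
    raise ConnectionError("pretend the OpenAI API is down")

def call_cohere(prompt: str) -> str:
    # hypothetical wrapper around a working second provider
    return f"cohere handled: {prompt!r}"

BACKENDS: list[tuple[str, Callable[[str], str]]] = [
    ("openai", call_openai),
    ("cohere", call_cohere),
]

def complete(prompt: str) -> str:
    errors = []
    for name, call in BACKENDS:
        try:
            return call(prompt)
        except Exception as exc:  # timeout, 5xx, rate limit, network error...
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))

print(complete("hello"))  # falls through to cohere
```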
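
And a minimal sketch of the Supabase idea, assuming embeddings live in a `documents` table with a pgvector `embedding` column, and similarity search is exposed through a user-defined `match_documents` SQL function called over RPC (the common Supabase pattern). All table, column, and function names here are illustrative, and `embed` is a hypothetical stand-in for whatever embedding model is used.

```python
# Minimal sketch: similarity search over embeddings stored in Supabase
# (Postgres + pgvector), queried through a user-defined RPC function.
import os
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def embed(text: str) -> list[float]:
    # hypothetical stand-in: call your embedding model of choice here
    raise NotImplementedError

def search(query: str, k: int = 5):
    # match_documents is an assumed user-defined SQL function that orders
    # rows of the documents table by vector distance to query_embedding
    response = supabase.rpc(
        "match_documents",
        {"query_embedding": embed(query), "match_count": k},
    ).execute()
    return response.data
```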