🔥Building a Slack Bot with AI Capabilities - Part 7, Streaming Token Responses from AWS Bedrock to Slack using converse_stream()🔥
aka: AI, what are you doing while I wait? ⏳
This blog series focuses on presenting complex DevOps projects in simple, approachable terms, with plain language and lots of pictures. You can do it!
This article is part of a series, because one article would be absolutely massive.
Part 1: Covers how to build a Slack bot in WebSocket mode and connect to it with Python
Part 4: How to convert your local script to an event-driven, serverless, cloud-based app in AWS Lambda
Part 7 (this article!): Streaming token responses from AWS Bedrock to your AI Slack bot using converse_stream()
Part 8 (coming soon!): ReRanking knowledge base responses to improve AI model response efficacy
Part 9 (coming soon!): Adding a Lambda Receiver tier to reduce cost and improve Slack response time
Hey all!
So far in this series we’ve created a Slack bot, connected it to a Python-based Lambda, and taught that Lambda to walk an entire Slack thread to build a conversation history. That conversation is flattened and sent to the knowledge base to pull relevant context from Confluence, and the results are added back as conversation turns. The whole conversation payload is then sent to Bedrock to get a response, and that response is posted back to the user in Slack.
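If it helps to see that end to end, here’s a minimal sketch of the flow, assuming the boto3 and slack_sdk clients. The helper names (build_conversation_from_thread, query_knowledge_base, flatten) are hypothetical stand-ins for the real functions in the repo:

```python
import boto3
from slack_sdk import WebClient

MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # assumed model ID

def handle_slack_event(event: dict, slack: WebClient, bedrock) -> None:
    # Walk the entire Slack thread to rebuild the conversation so far
    # (build_conversation_from_thread is a hypothetical helper)
    messages = build_conversation_from_thread(slack, event)

    # Flatten the conversation and query the knowledge base for Confluence
    # context (query_knowledge_base and flatten are hypothetical helpers)
    for doc in query_knowledge_base(flatten(messages)):
        messages.append({"role": "user", "content": [{"text": doc}]})

    # Send the whole conversation payload to Bedrock for a response
    response = bedrock.converse(modelId=MODEL_ID, messages=messages)
    answer = response["output"]["message"]["content"][0]["text"]

    # Post the answer back to the user in the Slack thread
    slack.chat_postMessage(channel=event["channel"], thread_ts=event["ts"], text=answer)
```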
That’s a ton of steps, and sometimes it takes a little while (5-10 seconds) for all of that to happen. In the meantime, the user is waiting in Slack, sure hoping the bot will eventually respond.
Wouldn’t it be cooler if the bot immediately responded that it got the message, and kept you updated as it cycled through the stages of what it was doing? And then we could even have it stream the tokens from the AI in case you get a long response.
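The trick behind the status updates is that Slack’s chat.postMessage returns a ts, and chat.update lets you edit that same message in place. Here’s a rough sketch of the “acknowledge immediately, then keep updating” pattern, assuming the slack_sdk WebClient; the stage messages are just examples:

```python
from slack_sdk import WebClient

def post_with_status(slack: WebClient, channel: str, thread_ts: str) -> str:
    # Post a placeholder right away so the user knows the bot heard them
    ack = slack.chat_postMessage(
        channel=channel, thread_ts=thread_ts, text="Got it, thinking... :hourglass:"
    )
    msg_ts = ack["ts"]  # a message's ts doubles as its ID for later edits

    # ...walk the Slack thread...
    slack.chat_update(channel=channel, ts=msg_ts, text="Reading the conversation... :book:")

    # ...query the knowledge base...
    slack.chat_update(channel=channel, ts=msg_ts, text="Searching the knowledge base... :mag:")

    # Hand the ts off so the streaming step can keep editing the same message
    return msg_ts
```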
Here’s what it looks like:
When you’re streaming a longer response, the effect is even more pronounced. When we’re not streaming, our Lambda waits until the AI has entirely finished building the response, so all the time we’d otherwise spend watching updates flow onto the screen would just be blank, with no response at all.
I built this for a very simple reason: on complicated questions, I kept thinking the AI had broken. When it’s streaming the response, I know it’s working through things.
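Here’s the core of that streaming loop as a sketch, assuming boto3’s bedrock-runtime client and slack_sdk. converse_stream() returns an event stream that emits contentBlockDelta events as tokens arrive; we accumulate them into a buffer and only edit the Slack message every batch of chunks, since chat.update is rate limited:

```python
import boto3
from slack_sdk import WebClient

MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # assumed model ID
CHUNKS_PER_UPDATE = 20  # tune to stay under Slack's chat.update rate limit

def stream_reply(slack: WebClient, channel: str, msg_ts: str, messages: list) -> None:
    bedrock = boto3.client("bedrock-runtime")
    response = bedrock.converse_stream(modelId=MODEL_ID, messages=messages)

    buffer, pending = "", 0
    for event in response["stream"]:
        # Each generated token arrives as a contentBlockDelta event
        if "contentBlockDelta" in event:
            buffer += event["contentBlockDelta"]["delta"].get("text", "")
            pending += 1
            # Edit the placeholder message in batches, not on every token
            if pending >= CHUNKS_PER_UPDATE:
                slack.chat_update(channel=channel, ts=msg_ts, text=buffer)
                pending = 0

    # One final edit with the complete response
    slack.chat_update(channel=channel, ts=msg_ts, text=buffer)
```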
This feature is already implemented in the codebase, and all the code is MIT open source so you can implement it yourself. The code is here:
With no further ado, let’s walk through it.