Building a Slack Bot with AI Capabilities - Part 6, Adding Support for DOC/X, XLS/X, PDF, and More to Chat with Your Data
aka, please don't make me create a pivot table, I'd rather just ask you
This blog series focuses on presenting complex DevOps projects as simple and approachable via plain language and lots of pictures. You can do it!
This article is part of a series of articles, because 1 article would be absolutely massive.
Part 1: Covers how to build a Slack bot in websocket mode and connect to it with Python
Part 4: How to convert your local script to an event-driven, serverless, cloud-based app in AWS Lambda
Part 6 (this article): Switching to the .converse() API to support DOC/DOCX, XLS/XLSX, PDF, and others to chat with your data
Part 7: Streaming token responses from AWS Bedrock to your AI Slack bot using converse_stream()
Part 8: ReRanking knowledge base responses to improve AI model response efficacy
Part 9: Adding a Lambda Receiver tier to reduce cost and improve Slack response time
Hey all!
So far we've built a functional chatbot using AWS Lambda and Bedrock, and had it read our entire Confluence so it can talk to us about it. That's so cool!
But on the user-facing side, we can only talk to it via image or text. We did include Slack "snippets", which are sort of like code blocks, but what if I want the AI to:
Read a PDF contract of mine to summarize what I'm agreeing to?
Look at the sales data spreadsheet from the past quarter and identify whether there are any clients that aren't spending quite as much as last quarter?
Read my project proposal document and give me some tips?
Read the resume of an upcoming interviewee and let me know some questions to test their knowledge level versus a job description?
The bot doesn't yet have the ability to read documents, but as you can probably guess from this article's title and my build-up, we're going to teach it!!
Here's an example of analyzing a resume:
Since we can have long conversations with Vera, we can give her a job description and a candidate's resume, and get feedback on what we should ask the candidate about to probe their perceived gaps.
Vera can also analyze contracts and explain what you're agreeing to, which is a really useful talent, particularly for private AI use cases.
Without further ado, let's do it!!
If you'd rather skip the walk-through and read the code yourself, the repo at github/kymidd/SlackAIBotServerless is public and MIT licensed for free use ;)
The .Converse() API
Models can be called directly via Bedrock using the .invoke_model() API. This API forwards your payload straight to the model. That has some great benefits, like supporting any cool bespoke thing the model can do, but it has some drawbacks too - each model's API is a bit different, so if you want to steer your questions to different models, or might one day switch, it can be a pain to do so.
To combat this, AWS built something better - a meta-API called .Converse(). This is an API endpoint that goes directly to Bedrock; it receives your request in a standard structure, formats it for whichever model you're forwarding to, and then facilitates the connection through.
That is really powerful in terms of standardization, but we're looking at it for an entirely different reason.
The .converse() API supports document reads that the Claude Sonnet model doesn't support directly!
How does it do that? I have absolutely no idea. But I know that it does, which is super rad.
How Do We .Converse?
There are no permissions changes, since the .invoke_model() and .converse() APIs use the same IAM permission - bedrock:InvokeModel.
So we can jump right to code. Let's update our previous code to support this new API.
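For reference, both the old and the new call go through the same Bedrock runtime client. Here's a minimal sketch of the client setup this article assumes - the variable name bedrock_client matches the snippets below, and the region is just an example:

import boto3

# Both invoke_model() and converse() live on the bedrock-runtime client,
# and both are authorized by the bedrock:InvokeModel IAM permission
bedrock_client = boto3.client("bedrock-runtime", region_name="us-west-2")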
First step, we need to define a top_k, which we didn't need to do before.
Top-K limits the model's next-token choices to the `n` most likely options (the 25 most likely, in this example), whereas temperature controls how much randomness goes into picking among those options.
This is a global setting we'll define right at the top with the other globals, right here.
top_k = 25
The .converse() API uses a standardized format for conversations, which is similar to what the Claude Sonnet model expects in the .invoke_model() implementation from earlier in the series, but not quite the same.
The previous one looks like this:
response = bedrock_client.invoke_model(
    modelId=model_id,
    guardrailIdentifier=guardrailIdentifier,
    guardrailVersion=guardrailVersion,
    body=json.dumps(
        {
            "anthropic_version": anthropic_version,
            "max_tokens": 1024,
            "messages": messages,
            "temperature": temperature,
            "system": model_guidance,
        }
    ),
)
And the new .converse() call looks like this:
response = bedrock_client.converse(
    modelId=model_id,
    guardrailConfig={
        "guardrailIdentifier": guardrailIdentifier,
        "guardrailVersion": guardrailVersion,
    },
    messages=messages,
    system=model_prompt,
    inferenceConfig=inference_config,
    additionalModelRequestFields=additional_model_fields,
)
We have to prep a few things to support the conversation call above, namely:
model_prompt - the "system prompt" or "system instructions", which (heavily) guide how the model behaves. We don't pass it directly as a heredoc string like before; instead we pass it as a conversation-turn-esque structure, so we construct it here.
inference_config - the inference settings block, which right now just contains the temperature configuration.
additional_model_fields - the extra, model-specific settings (here, top_k) to pass along.
additional_model_fields is the structure that's passed directly to the model, and isn't used by the .converse() API itself.
# Format model system prompt for the request
model_prompt = [
    {
        "text": model_guidance
    }
]

# Base inference parameters to use.
inference_config = {
    "temperature": temperature
}

# Additional inference parameters to use.
additional_model_fields = {
    "top_k": top_k
}
We also add some error handling around the request. Instead of just making the request and writing any failure to the CloudWatch logs, we catch the error and return it as the "response". That means that if the request fails for any reason, we send the failure back to the user.
That is extra helpful here with document support, because not all document types are supported, each is limited to a particular size, only a handful of files (roughly 5-10) can be attached to a single conversation, and so on.
Writing all that validation myself is bound to be complex and failure-prone, and even if I get it perfect, those limitations might change in the future as the model and .converse() improve.
The easiest solution is to rely on the .converse() API to do its own validation, and if it throws an error (like "too many files attached"), return that error to the user. They can often fix the problem themselves (like deleting an attached file) without involving me (or you!) to dig through the logs and resolve it for them.
# Try to make the request
try:
    response = bedrock_client.converse(
        # ... excluded call
    )
    # Find the response text
    response = response["output"]["message"]["content"][0]["text"]
except Exception as error:
    # If the request fails, print the error
    print(f"Error making request to Bedrock: {error}")

    # Clean up error message, grab everything after the first :
    error = str(error).split(":", 1)[1]

    # Return error as response
    response = "Error with request: " + str(error)
Support Attaching Files
Okay, next up: attaching files. We've previously implemented attaching files, but we were pretty limited - we could only attach a few types of images. Now the world is opened up, and we can attach way more!
So let's go revisit that logic. We have a function called build_conversation_content() that we call for each Slack message in the thread as we iterate over the messages. We check for the user's information, whether there is any text in the message (unintuitively, Slack doesn't require text to accompany a file - you can send a document with no message at all), and also whether there are any files referenced from the message.
Note I said "referenced", not attached. Neither the webhook nor the thread message history contains the actual binary of the files attached, just URLs where we can go get them. So we have to crawl over the list of every file attached to every message in the entire thread, find their download URLs, download them, and attach them in the right format. It's a lot! Let's walk through how.
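To make that concrete, here's a rough sketch of the overall flow - fetching the thread's messages from Slack and handing each one to build_conversation_content(). The helper name build_thread_conversation and the exact arguments are illustrative, not the repo's actual code:

import os
from slack_sdk import WebClient

slack_client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])

def build_thread_conversation(channel_id, thread_ts):
    # The webhook only hands us the newest message, so fetch the whole thread
    replies = slack_client.conversations_replies(channel=channel_id, ts=thread_ts)

    messages = []
    for message in replies["messages"]:
        # build_conversation_content() returns the content blocks (text, images,
        # documents) for one Slack message, downloading any referenced files
        content = build_conversation_content(message, os.environ["SLACK_BOT_TOKEN"])
        if content:
            # A real implementation would mark the bot's own replies as "assistant"
            messages.append({"role": "user", "content": content})
    return messages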
Let's start in the build_conversation_content() function. We're handed a single "message" payload, and we check whether it has a key called "files". If not, we skip all this logic and move on - it's probably just a text message.
If it does, we start iterating over the files in the payload, since Slack permits us to attach A LOT of files.
For each file, we look up its name and strip everything from the first `.` period onward - basically the file extension. Bedrock doesn't permit periods in document names, so we keep only the part of the name that comes before the period.
Weird/fun fact: The Bedrock API requires a name for each file, and doesn't permit the same name to be attached twice in a single conversation. No clue why.
Next, we find the "private URL download" link - the HTTP address where the file lives - and save it as file_url.
Then we go fetch the object using a bearer token that the app uses to authenticate with Slack. This stores the binary copy of the file in memory as file_object.
Finally, we extract the binary bytes of the file from the file_object blob as file_content.
# If the payload contains files, iterate through them
if "files" in payload:
    # Append the payload files to the content array
    for file in payload["files"]:
        # Debug
        if os.environ.get("VERA_DEBUG", "False") == "True":
            print("File found in payload:", file)
        # Isolate the file name, keeping only what comes before the first period
        file_name = file["name"].split(".")[0]
        # Find the private download URL for the file
        file_url = file["url_private_download"]
        # Fetch the file using the bot token to authenticate
        file_object = requests.get(
            file_url, headers={"Authorization": "Bearer " + token}
        )
        # Extract the binary content of the downloaded file
        file_content = file_object.content
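Since Bedrock won't accept two documents with the same name in one conversation (the fun fact above), one way to sidestep that is to make each name unique as you build the content. This is a small sketch of that idea, not something the original code does:

# Track document names we've already used in this conversation (hypothetical)
seen_names = {}

def unique_file_name(raw_name):
    # Keep only the part of the name before the first period, as above
    base_name = raw_name.split(".")[0]
    # If we've seen this name before, append a counter: "report", "report-2", ...
    count = seen_names.get(base_name, 0) + 1
    seen_names[base_name] = count
    return base_name if count == 1 else f"{base_name}-{count}"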
Next, we check the file's "mimetype". That's a metadata field Slack attaches to each file that identifies its format, like "image/png". We use it to decide which path to go down when encoding the file.
Images are encoded for the Bedrock conversation differently than document-type files, so we separate the logic.
First, we identify the file type by reading the mimetype and keeping only what comes after the forward slash, so "image/png" becomes "png".
Slack stores file types as "image/png", and Bedrock expects just "png" - that's why we do this.
Then we append the image to the content block we're building.
Note that we're literally appending the raw bytes to the message. We're not base64 encoding them or otherwise packaging them up. Literally copy-paste into the payload. Cool.
# Check the mime type of the file is a supported image file type
if file["mimetype"] in [
    "image/png",  # png
    "image/jpeg",  # jpeg
    "image/gif",  # gif
    "image/webp",  # webp
]:
    # Isolate the file type based on the mimetype
    file_type = file["mimetype"].split("/")[1]
    # Append the file to the content array
    content.append(
        {
            "image": {
                "format": file_type,
                "source": {
                    "bytes": file_content,
                },
            }
        }
    )
Then we do the whole thing again, except this time we're checking whether the file's mimetype is one of the sweet new document types we support: PDF, CSV, Word docs, Excel spreadsheets, HTML, and Markdown.
If it is, we map the mimetype to the file type Bedrock expects. Most of them follow the same "take what comes after the slash" pattern, but not all - for example, the "application/msword" mimetype maps to file type "docx", and "application/vnd.ms-excel" maps to "xlsx".
# Check if file is a supported document type
elif file["mimetype"] in [
    "application/pdf",
    "application/csv",
    "application/msword",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.ms-excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "text/html",
    "text/markdown",
]:
    # Isolate the file type based on the mimetype
    if file["mimetype"] in ["application/pdf"]:
        file_type = "pdf"
    elif file["mimetype"] in ["application/csv"]:
        file_type = "csv"
    elif file["mimetype"] in ["application/msword", "application/vnd.openxmlformats-officedocument.wordprocessingml.document"]:
        file_type = "docx"
    elif file["mimetype"] in ["application/vnd.ms-excel", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"]:
        file_type = "xlsx"
    elif file["mimetype"] in ["text/html"]:
        file_type = "html"
    elif file["mimetype"] in ["text/markdown"]:
        file_type = "markdown"
Then we append the file to the content array, much like we did for images.
Right after that, we append something again. That's weird, right? The Bedrock API requires some accompanying text (a description) whenever a document is attached. Slack doesn't always provide any, so what do we do?
Well, I made the super duper clever choice to append a hard-coded text block of "file" to the conversation whenever a document is attached.
You can see in the commented-out line below that I played with adding something smarter to the thread, but I'm too worried about muddying the waters for what Bedrock should be doing - answering our questions about the document, rather than commenting on the description we're providing.
If anyone has a clever way to actually do something useful with this field, I'm curious what you do there.
# Append the file to the content array
content.append(
    {
        "document": {
            "format": file_type,
            "name": file_name,
            "source": {
                "bytes": file_content,
            },
        }
    }
)
# Append the required text to the content array
content.append(
    {
        # "text": "This file is named " + file_name + " and is a " + file_type + " document.",
        "text": "file",
    }
)
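For completeness, here's roughly how the content array we just built ends up in the messages list that .converse() receives. This is a hedged sketch - the exact assembly code lives elsewhere in the bot:

# Each conversation turn pairs a role with a list of content blocks
messages = [
    {
        "role": "user",
        "content": content,  # the text, image, and document blocks built above
    }
]

# Guardrail config omitted here for brevity; see the full call earlier in the article
response = bedrock_client.converse(
    modelId=model_id,
    messages=messages,
    system=model_prompt,
    inferenceConfig=inference_config,
    additionalModelRequestFields=additional_model_fields,
)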
Summary
And that's pretty much it! This took a great deal of trial and error, but in the end it only took about 90 minutes to switch from the direct .invoke_model() method to the .converse() method.
Here's the PR where I implemented it on the public codebase - it's MIT licensed, go use it! It's free.
Next up, we'll be streaming tokens back to Slack and adding some immediate feedback to the user that we're on it. We can even tell some jokes while they're waiting.
Thanks all folks. Good luck out there.
kyler