Video: StreamableHTTP in depth

1. Video: StreamableHTTP in depth

Let's do a quick recap of the last couple of videos just to make sure we are on the same page. The first thing we understood was that some MCP functionality, namely things like sampling and notifications and logging, relies upon the server making a request to the client. Shortly after that, we learned that when we are making use of HTTP in general, it's kind of hard to allow a server to make a request to a client. So these first two points are kind of butting heads. They are at odds. We need a server to be able to make a request to a client to get full functionality, but at the same time, that is hard to do when using HTTP. Now as I mentioned, streamable HTTP has a workaround to fix this, and I'm going to show you exactly what that workaround is and how the transport works in general in this video. So we're going to finally get going to get clarity on exactly what's going on. Lastly, keep in mind that in some scenarios, we are going to set those two flags to true and we'll discuss why we'd ever set them to true. you do so, that breaks the workaround. And that's kind of why streamable HTTP as a transport is a little bit tough to wrap your head around, because we've got this limitation around HTTP, we've got the workaround, but in some cases we want to not use that workaround per se. That's why it's a tricky subject. Well, without further ado, let's get to it. Let's understand how this transport actually works. I'm going to begin by showing you a couple of different diagrams to help you understand the behind the scenes process. And I'm going to show you a short demonstration where you're going to see all these requests actually made and the entire system come together in a very small demo application. So here we go. First thing I want you to recall is what happens when a client initially connects to a server. According to the MCP spec, the client must send a initialized request to the server. Server replies with a result and then the client has to make a follow-up request with the initialized notification. At that point in time, the server then considers the client to be connected and everything is good to go. Now as soon as we start to use the HTTP transport, this flow changes ever so slightly. So as soon as you start using this transport, the initialized result that gets sent back from the server is going to have a header inside the HTTP response. This header is the MCP session ID. This is a random string of numbers and letters that gets assigned to our client. It's essentially an identifier for our connection to the server. As soon as we receive that header, we are required to include it in all follow-up requests we make to the server. This allows the server to identify our client. Now after we have done this initialization and gotten that session ID, here's where the magic part begins. So this is kind of the big workaround that allows the server to make requests to the client. As soon as we finish that initialization, the client optionally, it doesn't have to do this, optionally it can make a get request to the MCP server and include that session ID inside the request. The response we get back is very special. It is a SSE response. SSE is short for Server Sent Events. This is a kind of response that can be held open for an arbitrary amount of time. Once this response has been established with our client, the server can then stream back little bits of information, essentially individual messages, down to our client. So once this connection has been made, now at this point in time, This is the real critical part. This now allows the server to send down messages to the client at any point in time. And this connection can be used essentially to allow the server to make a request to the client. So this is the trick. This is the workaround that the HTTP transport uses. It makes use of this long lived SSE response, and it's going to stream down messages that it wants to send it to the client. So now let's go a little bit further and walk through what happens now whenever the client might want to call a tool because the complexity is not quite done just yet. There's one other critical part for you to understand. So now at the top of this diagram, I've got that SSE response that we just created. And remember that response, that open connection, it has a session ID tied to it. So the MCP server knows exactly what that connects to, which client that connection belongs to. So then later on at some future point in time, while that response is still running, our client might decide to make a call tool request to the server. When it makes this request, it's going to include its session ID as a header. Then the MCP server is going to open up a second SSE response. So now at this point in time, we have two separate SSE responses. The first one at the top, it is intended to be used for requests that are going from the server down to the client. The second brand new SSE response is intended to be used for messages that are related to this call tool request. And importantly, this is the critical part, the second SSE response, it is closed automatically as soon as the call tool result message is sent. So the bottom response closed automatically very shortly as soon as we get a result. And the upper response is meant to be held open for an arbitrary amount of time. Then remember, our ultimate goal here was to actually call a tool. So maybe we call the add tool that we've seen many times. And the implementation of that that I've shown you a couple of times, it has some logging messages inside of it and some progress notifications, and eventually it returns a result, which would be our call tool result. Now here's another kind of little tricky part here. Technically, both the logging message and the progress notification are really tied to the call tool request. So you would kind of think that both these first messages would be sent back in the bottom response because they're both really tied to this call to a request. Well, that's not quite how many SDKs work right now. The progress notification ends up being considered to be separate from the incoming call to a request. So the progress notification actually gets sent in the first SSE response. So that's the one that is meant to be held open for a long time to allow the server to make a request down to the client. And then the logging message and the actual call to a result is sent back in the response to the post request or the get tool request that was made. So that is the entire flow. Now to make sure that all this is crystal clear, I want to walk you through a very quick demonstration where you can see all these different requests being made in a very visual way. So I put together a little demonstration. First I've got a simple MCP server put together once again with a simple add tool. The add tool is going to log some information, wait for two seconds, make a progress report, and then return a result. I've connected to this MCP server with this client that is running inside the browser. This client will allow us to make a handful of different requests to the server and then show the exact response that we get. Now in total, this client is going to go through the flow that you see on the screen. So I know there's a lot here, but in essence, we're going to first go through the entire initialization process with the stuff up here at the top. We're then going to form the git sse connection, which allows the server to send a request to the client. And then we'll do a call tool request that's going to form a second sse connection that's going to be responsible for handling messages related to this particular call to a request. So let's run this and see what happens. First, I'm going to make the initialize request. So I'll send that off. That's going to get me my session ID, which uniquely identifies my client to the server on any future request. Remember, this session ID is only provided when you are using the streamable HTTP transport. It is not used with the standard IO transport at all. You'll notice it begins with EAA. This header is automatically taken and applied to all future requests. So if I scroll down to the next request, you'll notice the MCP session ID has automatically been applied with the correct session ID that I got from the request right above. So now that we have sent the initialized request, that one right there, our second request is initialized notification. So I'm going to send that off. Okay, so now at this point in time, we've taken care of the first three boxes on here. So now we're going to make our get request to the server. This get request is going to be held open for a long time. This is what allows the server to send a request down to our client at any point in time that it wishes. It's going to be used for things like sampling or any kind of server initiated request. To make that request, I'm going to use this little box down here on the bottom right hand side. So I'm going to click on start Git SSE. This request is also going to include that session ID as well. And now I have formed that connection. So now in theory, at any point in time, the server can send down a message to my client for things like sampling, or logging, or progress notifications, and so on. And if I receive any of those, they should appear inside of this box. So now here's the last step. I'm going to call the add function. I've once again got the correct session ID inside there. Now to remind you, when I click on this, in theory, we would kind of want to see the log statement, progress statement, and the final call to a result all to appear in this box right here, which represents that part of the response. But in reality, just because the way that the MCP SDK for Python is set up, the progress notification is actually going to be sent as a part of this response up here. So we should see a log statement and a call to a result right here. And we should see the progress statement down here on the bottom right in that box. So let's run it and see what happens. Okay, so I'm going to run it. And then after a little delay, because remember, there's the two second pause inside there, we're then going to get our results. So sure enough, down here on the bottom right hand side, I've got a notifications progress. So that is the progress statement. I'm at 80 out of 100% completion. So that was sent in that response channel right there. And then the actual response to my call to request has a login statement in the form of notification slash message. And then after that, I've got the actual result of the call tool request. So this is my call to result. And the answer to adding those two numbers together is eight. And then remember, as soon as we get back that call to result, this connection is automatically closed. So I see right there, yes, the connection was closed, and I can no longer receive any messages along this SSC response we got. All right, so that's it. That's the entire flow. So now you understand what's going on behind the scenes with the streamable HTTP transport, but we're not quite done just yet. Remember, I said many times that there are scenarios where you're going to want to set these flags to be true, and it's going to break portions of this flow. So that's the last thing we need to understand. Why would we ever set those to true? Because it is something that you probably are going to want to do at some point, and exactly what does it break? So that's the last thing we really need to figure out.

2. Let's practice!

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.