Video: Sampling

1. Video: Sampling

Sampling allows a server to access a language model like Claude through a connected MCP client. To help you understand what that means and what sampling is all about, I want to show you a quick demo application. So here we go. This is a chat application that is powered by Claude, so I can ask it, what's 1 plus 1, and I'll get a response back. This chat application is also connected to a single MCP server that implements one tool called the Research Tool. I can use this tool to ask Claude to maybe write a report on archaeology. And when I do so, Claude is going to decide to use that research tool. This time I'm going to go and do some research, synthesize all the results, and then present a report back to me. Now that you have seen this research tool run, I'd like to show you what is going on behind the scenes in this application and how the research tool actually works. Alright, so here's a high-level overview of what's going on. First, at the very top left, I've got a Next.js application, that's what's actually rendering some content inside the browser, and that's got an integrated MCP client inside of it. A user at some point in time is going to ask us to research a topic like archaeology. We're going to send that query off to Claude, and Claude is going to decide to run the research tool that has been provided ahead of time. Our MCP client on the next JS application is going to translate that into a tool call request and send it over to the MCP server. Then on the MCP server, the research tool is going to run. It's going to make a number of requests off to Wikipedia and try to find some articles around archaeology and some related content and so on. Once the research tool has fetched some number of pages from Wikipedia, we need to take all that content and synthesize it into a final report that can be displayed to the user. And there's more than one way we could do that. Let me show you two different options we have. In option number one, we could have the MCP server reach out directly to Claude or some similar LLM, and ask it to summarize all the results that found from the Wikipedia searches. So in this case, we would be giving our MCP server direct access to a language model. This would absolutely work, it is definitely something we could do. However, this adds a lot of complexity to our MCP server. We would have to write in some amount of code to allow the MCP server to make a request off to Claude, get the response back, extract the generated text, and so on. We would also have to make sure that the MCP server has an API key to access Claude or whatever other language model we were using. So again, option number one would work, but it adds a decent amount of complexity. So on to option two. In option two, we're going to have our MCP server make use of a technique referred to as sampling. With sampling, we're going to have our MCP server ask the client to run a prompt on its behalf. So in short, the MCP server is going to write up a prompt, send off to the MCP client and say, Hey, could you feed this into Claude for me? The MCP client that we've put together is then going to take that prompt and send it off to Claude. It will get a response back and then send the result of that text generation back to the MCP server where it can then do whatever it wants to do with that generated text. This approach moves the complexity over to the MCP client, and in this case it's kind of okay because as you saw our MCP client or that Next.js application already has an existing connection to Claude. It is already making requests off to Claude, so it's not that much more to just run one additional request on behalf of the MCP server. The other obvious benefit here is that the MCP server does not need an API key to access Claude at all, and if this is a publicly accessible MCP server, now it doesn't need to worry about paying for tokens generated on behalf of someone else making use of the server. So this technique is referred to as sampling. It allows a MCP server to ask the client to run Claude or any other language model to generate some amount of text. You can really think of sampling as moving the responsibility of calling Claude from a server off to the client. Sampling is most useful anytime you are putting together a publicly accessible MCP server. You don't want to have a MCP server that is publicly available for anyone to use that is just going to allow anyone to come and generate some text. For your publicly accessible server, you would want to make use of sampling and move the responsibility of generating text from your server off to whatever client is connecting to it. Making use of sampling requires a little bit of setup on both the server and the client. First off, on the server, inside of a tool function you will call the createMessage function, and you're going to pass into that a list of messages that you want to be eventually handed off to Claude. This list of messages is going to be formulated into a request that gets sent down to the client. Then on the client you have to write out a little bit more code, specifically something called a a sampling callback. I've got an example sampling callback at the top of this code snippet on the right-hand side. Your sampling callback is going to receive the messages that were sent from the server. Then inside this callback, it is up to you to call Claude or whatever other language model you are making use of, generate some text, and then return a create message result. All right, so that is sampling. As you can see, it's not terribly complicated. It's really just this concept or this idea of shifting the burden of generating some text from the server back over to the client. And again, this is a technique you're definitely going to want to look at anytime you're putting together a publicly accessible MCP server.

2. Let's practice!

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.