Connecting to your data

1. Connecting to your data

Welcome back! In the exercises so far, you've been uploading files directly into the chat. Now we're going to look at how data connections actually work — from flat files through to full integrations — and how to give AI the context it needs to analyze your data properly.

2. The flat file approach... and its limits

When you upload a file into an AI assistant, the tool typically loads the contents into its context window, which acts as its working memory. Every platform has size limits, and free tiers have the lowest. For a few thousand rows this works fine, and it's great for ad-hoc exploration. But it's not scalable, it's not reproducible, and for real organizational data, you'll hit those limits fast. It's a starting point, not a workflow.
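To get a feel for where that limit sits, here is a minimal sketch that estimates whether a file's text would fit in a context window. The four-characters-per-token ratio is only a common rule of thumb, not any platform's real tokenizer, and the 8,000-token limit and the function names are assumptions made up for this example.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. Real tokenizers vary by model and by content."""
    return int(len(text) / chars_per_token)


def fits_in_context(text: str, context_limit_tokens: int, reserve_tokens: int = 2000) -> bool:
    """Compare the estimate to a limit, reserving room for the conversation itself."""
    return estimate_tokens(text) + reserve_tokens <= context_limit_tokens


# Build a small in-memory CSV so the example runs without a file on disk.
header = "order_id,customer_id,amount\n"
rows = "".join(f"{i},{i % 50},{i * 1.5:.2f}\n" for i in range(1000))
csv_text = header + rows

# The limit below is a hypothetical figure, not a specific platform's quota.
print("Estimated tokens:", estimate_tokens(csv_text))
print("Fits in an 8,000-token window:", fits_in_context(csv_text, context_limit_tokens=8000))
```

Treat the result as a rough guide only. Check the actual limits your platform documents.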

3. Integrations: connecting to your real data

For anything beyond ad-hoc work, you need integrations. MCP — Model Context Protocol — is one emerging standard for connecting AI assistants directly to data platforms, and it's being adopted broadly. Snowflake's MCP server, for example, lets a compatible AI assistant see your Snowflake tables, explore schemas, and run queries against them, with results returned right into the chat.
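Under the hood, MCP messages are JSON-RPC 2.0, and a client asks a server to run a tool with the `tools/call` method. The sketch below only builds such a request as text so you can see its shape. The tool name `run_query` and its `sql` argument are hypothetical and are not taken from Snowflake's MCP server, so check your server's own tool list for the real names.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build an MCP-style JSON-RPC 2.0 request that asks a server to run a tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message, indent=2)


# "run_query" and the "sql" argument are hypothetical, for illustration only.
request = build_tool_call(1, "run_query", {"sql": "SELECT COUNT(*) FROM orders"})
print(request)
```

In practice the AI assistant builds and sends these messages for you. You configure the connection once and then ask questions in plain language.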

4. Meeting your data for the first time

Once you're connected, you need to understand what you actually have. Traditionally, that was laborious. I used to spend hours dragging and dropping in Tableau, but AI changes this completely. Ask for an overview and you'll get field types, sample values, early patterns, and potential quality issues — a starting point that used to take hours and now takes seconds.
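If you're curious what that kind of overview involves, here is a minimal sketch of a first-pass profile using only Python's standard library. It is not how any particular AI assistant works internally. The sample data, the field names, and the simple type-guessing rules are all assumptions made for this example.

```python
import csv
import io


def infer_type(values):
    """Guess a column type from its non-empty values."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "empty"
    # Try the strictest type first, then fall back to looser ones.
    for caster, name in ((int, "integer"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "text"


def profile(csv_text, sample_size=3):
    """Return the type, missing-value count, and sample values for each field."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    report = {}
    for field in reader.fieldnames or []:
        values = [(row.get(field) or "") for row in rows]
        non_empty = [v for v in values if v.strip()]
        report[field] = {
            "type": infer_type(values),
            "missing": len(values) - len(non_empty),
            "samples": non_empty[:sample_size],
        }
    return report


# Hypothetical sample data with a couple of deliberate gaps.
sample = """order_id,region,amount
1,EMEA,120.50
2,,80
3,APAC,
"""

for field, summary in profile(sample).items():
    print(field, summary)
```

The missing-value counts are the "potential quality issues" part of the overview: they tell you where to look before you trust any analysis.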

5. The context problem

Here's the catch. AI assistants have no inherent context about what your data actually means in your business. They make educated guesses based on field names and sample rows — and honestly, they're often surprisingly good at it. But if your field is called something like Rev_Q4_Adj, the AI doesn't know whether that's revenue adjusted for returns, for currency, or something entirely specific to your organization. Guessing isn't good enough for serious analysis.

6. Data dictionaries and semantic layers

The fix is to give AI that context explicitly, right at the start of the conversation. A data dictionary describes the structure of the data: data types, allowed values, and relationships. That's a good start.

7. Data dictionaries and semantic layers

A semantic layer goes further: it describes how your data is actually used in your business — how key metrics are defined, what "customer" means in your organization. Together, they close the gap between what the AI can infer and what it actually needs to know.

8. Giving AI context: the ad-hoc approach

For ad-hoc analysis in a chatbot, you include your data dictionary in your opening message, before asking any questions. A markdown table or YAML block works well for this. Once you've shared that context, the AI carries it throughout the conversation. You can also ask the AI to help you draft the dictionary in the first place, but since you know your business far better than it does, always review and refine what it produces.
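Here is a minimal sketch of what that opening message could look like when you build it from a small data dictionary. The field names, types, and descriptions are invented for illustration, including the meaning given to Rev_Q4_Adj, and the helper names are hypothetical.

```python
# A tiny, made-up data dictionary. Replace it with your organization's definitions.
DATA_DICTIONARY = [
    {"field": "Rev_Q4_Adj", "type": "decimal",
     "description": "Q4 revenue adjusted for returns (assumed meaning for this example)"},
    {"field": "customer_id", "type": "integer",
     "description": "Unique identifier for a paying account"},
    {"field": "region", "type": "text",
     "description": "Sales region; allowed values: EMEA, APAC, AMER"},
]


def to_markdown_table(entries):
    """Render the dictionary as a markdown table, one row per field."""
    columns = ["field", "type", "description"]
    lines = [
        "| " + " | ".join(columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for entry in entries:
        lines.append("| " + " | ".join(str(entry[c]) for c in columns) + " |")
    return "\n".join(lines)


def build_opening_message(entries):
    """Put the data dictionary first, before any analysis question is asked."""
    return (
        "Before I ask any questions, here is the data dictionary for my dataset. "
        "Please use these definitions in every answer.\n\n"
        + to_markdown_table(entries)
    )


print(build_opening_message(DATA_DICTIONARY))
```

Paste the printed message as the first thing in the chat, then start asking questions.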

9. Giving AI context: the production approach

For production work, you need something more robust. Tools like Snowflake Cortex Analyst let you define your business metrics and logic once in a structured format — and the AI queries against those definitions every time. The context isn't typed into a prompt; it's built into the integration. This is the same principle we saw with MCP: the more you connect AI directly to your data infrastructure, the more reliable and scalable your analysis becomes.
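To make "define once, query every time" concrete, here is a toy sketch of a metric registry. It is not the Snowflake Cortex Analyst format or API. The metric, table, and column names are hypothetical, and a real semantic layer handles far more, such as joins, filters, and access control.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A business metric defined once and reused by every query."""
    name: str
    description: str
    sql_expression: str
    table: str


# Hypothetical definitions. In a real setup these live in the integration, not in a prompt.
METRICS = {
    "net_revenue": Metric(
        name="net_revenue",
        description="Revenue after refunds",
        sql_expression="SUM(amount - refund_amount)",
        table="orders",
    ),
}


def metric_query(metric_name, group_by=None):
    """Build SQL from the shared definition so the logic is never retyped."""
    metric = METRICS[metric_name]
    select = f"{metric.sql_expression} AS {metric.name}"
    if group_by:
        return f"SELECT {group_by}, {select} FROM {metric.table} GROUP BY {group_by}"
    return f"SELECT {select} FROM {metric.table}"


print(metric_query("net_revenue"))
print(metric_query("net_revenue", group_by="region"))
```

Because every question about net revenue resolves to the same expression, two people asking the same thing get the same number.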

10. Let's practice!

Time to connect to some data and start exploring — with the right context in place.
