Traces in Snowflake

1. Traces in Snowflake

Capturing traces is a little different from capturing logs in Snowflake. Traces can carry much more information about how events in a system occurred, whereas a log records what occurred, based on a message that you write.

The practical implementation is different, too. You used a common Python logging library in the last exercise, but to capture traces, you'll need to use Snowflake-specific libraries designed for the language of your handler code. These libraries are available for Java, JavaScript, Python, Scala, and Snowflake Scripting. We're going to use them shortly in a Python environment.

Before we get hands-on, let's learn a little bit more about traces. Because they can carry so much more information than a log, it's important to understand how to capture and use them effectively.

Recall that a trace is a record of the entire journey of a transaction or request as it moves through a system. You can think of it as following a package from sender to recipient, tracking every stop and handoff along the way. This tracking can include the timing of operations, as well as the dependencies between them. A trace is essentially a record of a sequence of many events that all belong to a single larger event. For example, while a log might summarize that a connection from system A to system B was successfully established, a trace might hold all of the detailed steps that were necessary to make that connection.

Let's break things down even more. Traces are made up of three core components: spans, trace events, and attributes. You can think of a span as the basic unit of work in a trace. It represents a single operation and can have a parent-child relationship with other spans. Nested within spans are trace events. Yes, I know, we're really going down the rabbit hole now. Trace events are discrete events that occur within a span.
They provide even more context about what happened during a span's lifetime. The hierarchy looks something like this: a trace is made up of spans, and inside of spans, there are trace events. But wait, there's more. Because so much information can be captured in this hierarchy, you can also use what's known as an attribute. Attributes allow you to tag a span, or even a log. These attributes can help you down the road when you're filtering or analyzing telemetry data. With this context, you can see why traces are so important in observability.

Let's get hands-on and start capturing traces using Python. Now is a good time to pause the video if you need to log into your Snowflake account. Let's open up our SQL worksheet from earlier where we added logging. If you don't have it handy, you can find the code in the Solution folder of the Module 2 folder. The file is called SolutionsSprocLogs.sql. Copy the contents of the file and paste them into a new SQL worksheet. Run the first few lines of code to set your context.

Start by enabling tracing for our session. At the top, type ALTER SESSION SET TRACE_LEVEL = ALWAYS. Run the command. We're now capturing traces. Next, add the snowflake-telemetry-python package to the packages listed at the top of the stored procedure. This is the library we'll need to capture traces in our event table.

On this line, you'll notice that we set a variable called traceID. It's set to a UUID value. A UUID is a universally unique identifier. Every time this stored procedure runs, a new UUID is generated, and it's used throughout the traces generated in the code. This ties our trace telemetry to a specific run of the stored procedure and makes it easy to analyze our telemetry when needed. Snowflake automatically creates a root span for us when this procedure is run, so we don't need to explicitly open or close spans. All we need to do is set span attributes.
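Before adding those attributes, it may help to see the pieces described so far in one place. The sketch below is a plain-Python model of the hierarchy, not Snowflake's API: the Span and TraceEvent classes, their method names, and the span and event names are all hypothetical and exist only to illustrate how a trace ID, spans, trace events, and attributes relate to each other.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional
import uuid


@dataclass
class TraceEvent:
    """A discrete, timestamped moment inside a span (illustrative only)."""
    name: str
    attributes: dict[str, Any] = field(default_factory=dict)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class Span:
    """One unit of work: carries attributes, trace events, and child spans."""
    name: str
    trace_id: str
    attributes: dict[str, Any] = field(default_factory=dict)
    events: list[TraceEvent] = field(default_factory=list)
    children: list["Span"] = field(default_factory=list)

    def set_attribute(self, key: str, value: Any) -> None:
        # Attributes are key-value tags used later for filtering.
        self.attributes[key] = value

    def add_event(self, name: str, attributes: Optional[dict[str, Any]] = None) -> TraceEvent:
        # Events mark a specific moment within this span's lifetime.
        event = TraceEvent(name, dict(attributes or {}))
        self.events.append(event)
        return event

    def start_child(self, name: str) -> "Span":
        # Child spans share the parent's trace ID, so the whole run stays linked.
        child = Span(name, self.trace_id)
        self.children.append(child)
        return child


# One trace ID per run, shared by every span in that run.
trace_id = str(uuid.uuid4())
root = Span("process_order_headers_stream", trace_id)
root.set_attribute("process_step", "query_stream")

child = root.start_child("query_stream")
child.add_event("query_begin", {"description": "starting the query"})
child.add_event("query_complete", {"description": "query finished"})
```

In the stored procedure itself, none of this bookkeeping is needed, because Snowflake creates the root span for us. The model is only a way to picture what ends up in the event table.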
Span attributes let us pass in a key-value pair that tags a span with information like, at this point, we're in the query stream step of the process. So let's add a couple of attributes. On this line, type telemetry.set_span_attribute("procedure", "processOrderHeadersStream"). This attribute essentially says, we're starting the procedure called processOrderHeadersStream. In the try block, we're now in the query stream step of the process. Let's add an attribute by typing telemetry.set_span_attribute("processStep", "queryStream").

On the next line, let's add telemetry.add_event("queryBegin", {"description": "starting to query OrderHeaderStream"}). add_event allows us to mark a specific moment in time within our span. This line says that the queryBegin event occurs at this precise moment in time. The second argument, the dictionary, allows us to add more context to this event. The context that we're adding is a description of what is happening.

Okay, let's close this out. On this line, add telemetry.add_event("queryComplete", {"description": "completed query of OrderHeaderStream"}). This time, we mark that the query has completed, along with some context in the description.

Okay, let's rerun our entire worksheet and see how traces are captured in the event table. You can see from this last line, we're filtering on record types that have the word span in them. And yes, there they are. You can see SPAN and SPAN_EVENT records. A SPAN record looks like a record of a run, whereas a SPAN_EVENT record looks like one of the individual events happening in a run. If you look to the left under the TRACE column, you can see the trace ID. This is set to a generated UUID that the spans and span events belong to.

Great job. If something in your pipeline goes wrong, well-implemented traces will provide you with the entire journey of footsteps necessary to track down the culprit.
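Putting the edits from this walkthrough together, the tracing calls in the handler might look roughly like the sketch below. It's written so it can run outside Snowflake: RecordingTelemetry is a hypothetical local stand-in that mimics the two functions used from the snowflake.telemetry module (set_span_attribute and add_event), and run_query is a placeholder for whatever actually reads the stream. The queryFailed event and the way traceID is attached to events are assumptions, not something shown in the video.

```python
import uuid


class RecordingTelemetry:
    """Hypothetical stand-in for snowflake.telemetry, for running locally."""

    def __init__(self):
        self.span_attributes = {}
        self.events = []

    def set_span_attribute(self, key, value):
        # Tags the current span with a key-value pair.
        self.span_attributes[key] = value

    def add_event(self, name, attributes=None):
        # Records a named moment in time, with optional context.
        self.events.append((name, dict(attributes or {})))


def process_order_headers_stream(telemetry, run_query):
    """Sketch of the handler: tag the span, then mark events around the query."""
    # One UUID per run. Assumption: it is attached to each event as an attribute.
    trace_id = str(uuid.uuid4())

    telemetry.set_span_attribute("procedure", "processOrderHeadersStream")
    try:
        telemetry.set_span_attribute("processStep", "queryStream")
        telemetry.add_event(
            "queryBegin",
            {"description": "starting to query OrderHeaderStream", "traceID": trace_id},
        )
        rows = run_query()  # placeholder for the real stream query
        telemetry.add_event(
            "queryComplete",
            {"description": "completed query of OrderHeaderStream", "traceID": trace_id},
        )
        return rows
    except Exception as exc:
        # Hypothetical failure event, not part of the video's code.
        telemetry.add_event("queryFailed", {"description": str(exc), "traceID": trace_id})
        raise


fake = RecordingTelemetry()
rows = process_order_headers_stream(fake, lambda: [("order-1",), ("order-2",)])
print([name for name, _ in fake.events])  # ['queryBegin', 'queryComplete']
```

Inside a Snowflake stored procedure, you would drop the stand-in and use from snowflake import telemetry directly, calling the same two functions on the module.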
Join me in the next video as we step out of logs and traces and start learning about how to use alerts in Snowflake.
