Get startedGet started for free

Computations with user-defined functions

1. Computations with user-defined functions

Naturally, it's not very scalable to perform data transformations manually every single time. And as I mentioned earlier, there's a good chance that in practice, you'll work with a very large number of tables, so it's a lot more practical to find ways to capture logic in reusable chunks of code. This approach makes it easy to manage chunks of logic in a centralized fashion, meaning you can ensure consistency in the pipelines that you build. One way of doing this is with user-defined functions, also known as UDFs. With a user-defined function, you can write your own custom logic and reuse it over and over in your queries and data transformations. Typically, UDFs are used to capture logic that perform a specific computation. Say I need to perform a calculation that returns a value given some inputs, or if I need to format a value like a telephone number in a certain way, for example. A user-defined function would be the perfect snowflake primitive to capture that logic, so that I can reliably reuse it throughout my data transformations. And if I ever need to update the logic, I can easily update it in one place versus several different places. Here are the main things to know about user-defined functions. You can write UDFs in one of many supported languages, including SQL, JavaScript, Python, Java, and Scala. UDFs are typically intended to return a single value. This makes them really useful for performing very specific tasks in your transformation logic, like formatting a raw date-time value, doing some math using some inputs, and much more. For more complex logic, you would consider a stored procedure, which I'll cover in the next video. The common pattern for writing a user-defined function is to first define the function and its associated logic, and then, once created, invoke the UDF during transformations or other data processing. Let's get hands-on and put these concepts to use. It's a good time now to pause the video to make sure you're logged into your Snowflake account. Okay, so we know Hamburg, Germany can be prone to some pretty intense wind forces, and we've created a view to track that. But let's expand the view that we've created to also track temperature and precipitation data. Currently, temperatures are recorded in Fahrenheit, and precipitation is tracked in inches. Let's create some UDFs that will help us add columns to our view for tracking temperature in Celsius and precipitation in millimeters. We are tracking weather for a country in Europe, after all. Open the UDF temp-length SQL file in the repo, copy its contents, and paste them into a new SQL worksheet. Don't run the file just yet. Let's take a look at what we have. We have two UDFs. The first converts Fahrenheit values to Celsius. I create the UDF by specifying the function keyword, the schema where it will live, and the name of the function. I also specify that the function will accept an argument of type number, and also return a value of type number. If we were going to write the logic in another language, like Python for example, this is where you would make the corresponding changes, along with setting some other parameters. The Snowflake documentation has a lot of great detail on this in case you're interested. We'll write our logic in SQL. Okay, the magic happens right here after the as keyword. The as keyword points to the definition of the UDF, and the two sets of dollar signs act as opening and closing delimiters. And the entirety of the function's logic should be encapsulated between these two sets of delimiters. The UDF below does something similar, but converts inches to millimeters, of course. Let's run all of the SQL. You can refresh on the left to see that they were indeed created. Okay, so we haven't invoked the UDFs. We've simply defined them. To use them, we have to invoke them in our SQL. To use them, we have to invoke them in our SQL. So let's go ahead and use these UDFs to expand our existing view that tracks Hamburg. Navigate to the Hamburg sales expanded SQL file and copy its contents. Paste them into a new SQL worksheet. Don't run the file just yet. The first main block of SQL returns the wind speeds for the city of Hamburg, but it also returns some new columns. We now return temperature in Fahrenheit and precipitation in inches. We apply our UDFs to return temperatures in Celsius and precipitation in millimeters. Highlight and run the SQL that sets your context and then execute this first large block of SQL. Okay, let's browse the results. And there you have it. This view contains a lot more data about the weather in Hamburg on the days where no sales were recorded. Let's create a new view from this by adding the following at the top of the first SQL block. Create or replace view harmonized weather Hamburg as. Run only this block of SQL again and note on the left that we have a new view. Okay, great. But let's go one step further. We now have the logic necessary to track this weather data for all cities in our Tasty Bytes dataset. In fact, let's use this logic to create a new view that will allow analysts on the team to quickly answer weather related questions about any city. For example, they could use a view to answer questions like what cities in the United States have had sales on days with temperatures below freezing? Having this view would allow them to quickly discover insights without needing to ping the data teams for new pipelines or views. Run the last block of SQL. And note that the view was created on the left in the analytic schema. In just a few lines of code, you created two UDFs in SQL and use them to derive new columns based on existing columns. By capturing logic in a UDF, you can reliably reuse it throughout the transformations you use to build your pipeline and easily manage it from a centralized place. The UDF definition. And of course, you could imagine having several different UDFs all likely tailored to do something different and then invoking them as needed throughout your data transformations. Let's recap what you learned. You learned what a user defined function is and what they're typically used for. You learned how to define a UDF in SQL and how to invoke them. In the next video, we'll talk about how to process changes to data in an incremental fashion using streams.

2. Let's practice!

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.