User-Defined Functions (UDFs) - Part II

1. User-Defined Functions (UDFs) - Part II

Welcome back! In the last video, we created a SQL UDF that didn’t take in any arguments. In this video, we’ll get a little more sophisticated and create a UDF that takes in an argument, and we’ll also create a simple Python UDF. It’s going to be great. Let’s get to it. So you might recall from the last video that we made a UDF called max_menu_price that always gives a result in USD. But what if we want the answer in another currency? Let’s create a max_menu_price_converted function that has the same basic bones as max_menu_price, but it lets you hand in an exchange rate – we’ll call this USD_to_new – and then it spits out the max menu price in whatever new currency you’ve specified. The function could look like this: CREATE FUNCTION max_menu_price_converted(USD_to_new NUMBER) RETURNS NUMBER(5,2) AS $$ SELECT USD_TO_NEW*MAX(SALE_PRICE_USD) FROM FROSTBYTE_TASTY_BYTES.RAW_POS.MENU $$ ; Some things to notice – Instead of just having an open parenthesis and a closed parenthesis with nothing in them, we want this to take in an input, so to do that we put in a new variable name (USD_to_new) and we specify the data type (NUMBER). Then inside the SQL query, we multiply the max by this new conversion factor. Okay, so if we want the max menu price in Canadian dollars – and let’s just say the conversion rate between USD and Canadian dollars is 1 USD to 1.35 Canadian dollars – then we can simply run the following: SELECT max_menu_price_converted(1.35); And this tells us the max menu price, converted to Canadian dollars, is 28.35. Pretty cool. Okay, now I want to cover one last thing – If UDFs were just something you could write in SQL, that alone would make them very useful. But they’re even better than that – In addition to SQL, you can write them in Python, Javascript, Java, and Scala. So let’s take a moment and write a Python UDF that we can then call from inside a SQL query. Here’s how it works – the experience of creating and using a Python UDF is a lot like the experience of creating and using a SQL UDF. The differences are, first, that you specify the language – in this case, “python”. Second, you specify the run_time version, so the version of Python you want to use. Third, you specify a handler. And fourth, you write the inside of your function in Python. Other than that, Python UDFs look like SQL UDFs. Same basic beast. CREATE FUNCTION winsorize (val NUMERIC, up_bound NUMERIC, low_bound NUMERIC) returns NUMERIC language python runtime_version \= '3.9' handler \= 'winsorize_py' AS $$ def winsorize_py(val, up_bound, low_bound): if val > up_bound: return up_bound elif val \< low_bound: return low_bound else: return val $$; Here, this function (which I’m calling “winsorize”) basically puts a ceiling and a floor on the values your data can take. (In statistics, you winsorize when you have outliers that you don’t want to drop, but whose influence you do want to reduce – for example, maybe you believe some of the values *are* truly large, just not as large as what’s recorded.) So we’re taking in three arguments this time – the value we want to winsorize, and the upper bound and lower bound we want that value to fall between. (Usually you specify these bounds in percentiles, but here we’ll make it simple and just hand over the values.) The logic inside is straightforward – it’s just an if, elif, else statement. If the value is too high, set it to the upper bound. If it’s too low, set it to the lower bound. Otherwise, keep it the same. There’s nothing fancy here – no error handling – and in fact this would be easy to do with SQL. But let’s ignore that and see if this worked. Cool, if we run: SELECT winsorize(12.0, 11.0, 4.0); We see that the output is 11! So we handed in a value of 12, and it saw that this was higher than the upper bound of 11, so it set it to 11, which is what we wanted. There’s a lot you can do with UDFs that we didn’t get into here – for one thing, you can specify the Python packages you want to use inside the UDF, like NumPy or pandas. You can also create vectorized UDFs which take in batches of rows as inputs and give out batches of rows as outputs. Lots of exciting things to learn about. But even if there’s a lot more we could learn, I don’t want to discount the power of what we *did* cover. To recap, we learned about four things: First, how to see a list of all of Snowflake’s functions with the SHOW FUNCTIONS command. Second, how to create a SQL UDF using CREATE FUNCTION. Third, how to create a Python UDF using CREATE FUNCTION (specify language, runtime, handler). And fourth, calling UDFs using SELECT. Now it’s time to take our user-defined function skills to the next level by learning about user-defined table functions.

2. Let's practice!

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.