TextIO & FileIO
1. TextIO & FileIO
TextIO and FileIO are used for sources and sinks when you need to work with text or files, respectively. Let's dive into an example in Java for TextIO. This is a simple read using TextIO. Several other read methods are available for TextIO.
Now let's take a look at a couple of Python examples. In the first example, you provide a file name in the first step and pass it to the second step, which uses that file name to read from a file. This shows that a source does not have to be the first step of a pipeline; here, the second step is the source. In the second example, you read from a file by passing in the file name directly. Use the method that best suits your needs.
Here's an example of FileIO in Java using a file match pattern. The match method and its filepattern argument let you search for files that match a pattern, as seen here. You can use the file name and other metadata as part of the read. In the Python example shown here, you map each matched file to the variable x and use that object to access the contents and the metadata. File pattern matching is useful when you need to grab a range of files.
Beam FileIO can also continuously monitor a location for a particular pattern. This Java example checks the location every 30 seconds for an hour to see whether new files match the pattern provided. This approach is useful when files are flowing in and you know the window of time in which they will arrive.
Here's another example, but rather than providing the file names, you read them off a message queue such as Pub/Sub using PubsubIO. Certain systems, like Cloud Storage, can natively send a message to Pub/Sub when metadata changes. The message is then parsed to return the file name for the subsequent step, which can now target that file for a read. This method lets you read a stream of files.
Contextual I/O provides many mechanisms to enhance the behavior of text reading.
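Going back to the FileIO pattern match described above, a minimal Python sketch might look like the following. It is not the code from the slides: the helper name path_and_contents, the run function, and the file pattern you pass in are assumptions made for illustration. The Beam import sits inside run so that the helper can be tried on its own without Beam installed.

```python
from typing import Tuple


def path_and_contents(readable_file) -> Tuple[str, str]:
    """Return the file path (metadata) and the decoded file contents.

    `readable_file` is expected to behave like Beam's ReadableFile, which
    exposes `.metadata.path` and `.read_utf8()`.
    """
    return readable_file.metadata.path, readable_file.read_utf8()


def run(file_pattern: str) -> None:
    # Imported here so the helper above can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.io import fileio

    with beam.Pipeline() as pipeline:
        (
            pipeline
            # Find every file whose name matches the pattern.
            | "Match" >> fileio.MatchFiles(file_pattern)
            # Turn each match into a readable file object.
            | "Open" >> fileio.ReadMatches()
            # Use both the metadata and the contents of each file.
            | "Extract" >> beam.Map(path_and_contents)
            | "Print" >> beam.Map(print)
        )
```

Calling run with a glob-style pattern (a hypothetical value such as a local directory followed by *.csv) would print one (path, contents) pair per matched file.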
Historically, when you needed more complicated text reads, you were relegated to using FileIO. Contextual I/O lets you use TextIO for those use cases. For example, you can return things such as the ordinal position of a record, or read multi-line CSV records.
This is an example of a sink written in Java using TextIO to write your output to a file. You can write to many different file systems and object stores. Here is another example of a simple write using TextIO, this time in Python.
Dynamic destinations allow you to determine the sink destination at run time. You can invoke this with the writeDynamic function in Java. This example uses the transaction type to determine the file name. In this Python example, you take dynamic destinations a step further by writing to different sinks depending on the characteristics of the data. In this example, the record type determines the destination file type. Dynamic destinations are useful when you do not know the specific destination before run time. In these examples, you can expand the range of your destinations without altering the code.
2. Let's practice!
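To practice the dynamic-destination write described above, here is a minimal Python sketch. It is a simplified stand-in for the slide example, not a copy of it: the records are assumed to be plain CSV lines whose first field is the transaction type, the names destination_for and run are hypothetical, and every destination uses the same text sink rather than a different file type per record type.

```python
def destination_for(line: str) -> str:
    """Pick a destination label from the first CSV field.

    The record layout ("type,amount") and the known types are assumptions
    made for this sketch.
    """
    record_type = line.split(",", 1)[0].strip().lower()
    return record_type if record_type in ("credit", "debit") else "other"


def run(output_dir: str) -> None:
    # Imported here so destination_for can be tried without Beam installed.
    import apache_beam as beam
    from apache_beam.io import fileio

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Create" >> beam.Create(["credit,100", "debit,40", "refund,5"])
            | "Write" >> fileio.WriteToFiles(
                path=output_dir,
                # Called once per record to decide where it goes.
                destination=destination_for,
                # Called once per destination to decide how it is written.
                sink=lambda dest: fileio.TextSink(),
                # Output file names start with the destination label.
                file_naming=fileio.destination_prefix_naming(suffix=".csv"),
            )
        )
```

Because the destination is computed from each record, adding a new transaction type to the set in destination_for is enough to produce a new group of output files; the pipeline itself stays the same.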