
Working with the Firehose delivery stream

1. Working with the Firehose delivery stream

Well hello there! Welcome to Lesson 3! In the last lesson, we got everything ready to create a Firehose stream. In this lesson, we'll make one - and write to it!

2. Ready to create stream

After the last lesson, everything is in place.

3. Ready to create stream

The S3 bucket has been created.

4. Ready to create stream

dcUser has proper permissions for Firehose.

5. Ready to create stream

And we created a role for the Firehose stream to assume.

6. Ready to create stream

All that's left is to create our stream!

7. Get Role ARN

We will need to specify a role ARN - an Amazon Resource Name, AWS's unique identifier for a resource - when creating our stream. Go to IAM, then Roles, and select FirehoseDeliveryRole. The ARN is shown at the top.

8. Initialize boto3 client

Initialize the boto3 firehose client.
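A minimal sketch of that setup follows; the credentials and region are placeholders you would replace with your own.

    import boto3

    # Placeholder credentials and region - substitute your own values
    AWS_KEY_ID = "YOUR_ACCESS_KEY_ID"
    AWS_SECRET = "YOUR_SECRET_ACCESS_KEY"

    firehose = boto3.client(
        "firehose",
        aws_access_key_id=AWS_KEY_ID,
        aws_secret_access_key=AWS_SECRET,
        region_name="us-east-1",
    )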

9. Create the stream!

Create the stream using the client's create_delivery_stream() method, passing gps-delivery-stream as the DeliveryStreamName and DirectPut as the DeliveryStreamType. DirectPut means we will write to the stream directly (as opposed to sourcing it from another stream). Next, pass an S3DestinationConfiguration object containing the RoleARN you copied, followed by the S3 bucket ARN. S3 ARNs always follow the format arn:aws:s3:::BUCKET-NAME.
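A sketch of that call, continuing from the client above; the role ARN and bucket name here are hypothetical stand-ins for the values from your own account.

    # Role ARN and bucket name are hypothetical - use your own values
    res = firehose.create_delivery_stream(
        DeliveryStreamName="gps-delivery-stream",
        DeliveryStreamType="DirectPut",
        S3DestinationConfiguration={
            "RoleARN": "arn:aws:iam::123456789012:role/FirehoseDeliveryRole",
            "BucketARN": "arn:aws:s3:::vehicle-data-bucket",
        },
    )

    # The response contains the new stream's ARN
    print(res["DeliveryStreamARN"])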

10. Create stream response

The response object comes back with a DeliveryStreamARN key.

11. Stream is ready

We have a stream that ingests data and writes it to an S3 bucket. Our components are ready!

12. Writing to stream

Let's write to the stream!

13. Telematics hardware

We will install an OBD2 reader device in each vehicle. These devices will collect vehicle data.

14. Telematics data send

The code that we write for these devices will take the sensor data and write it to the stream.

15. Single record

Let's examine a single record from a vehicle sensor. There's a record ID, a capture timestamp, the VIN (the vehicle identification number), the vehicle's latitude and longitude, and its speed.
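As an illustration, a record with those fields might look like the dictionary below; the field names and values are made up for the example.

    # Hypothetical sensor record - field names and values are illustrative
    record = {
        "record_id": "56f2c7bd-9c39-4f86-a58e-0c9f2f1b8f10",
        "timestamp": "2023-06-01T14:32:05",
        "vin": "1HGCM82633A004352",
        "lat": 32.7157,
        "lon": -117.1611,
        "speed": 42.5,
    }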

16. Records coming in

Each of these records arrives from a different vehicle at a different time - a great use case for streams.

17. Another use case

You would use the same pattern for collecting clickstream data from multiple browsers.

18. Another use case

Or collecting live logs from servers.

19. Patterns

Data engineering is about understanding the pattern and finding the right time to apply it.

20. Sending a record

To send a record, use the Firehose client's put_record() method, passing the DeliveryStreamName as an argument. The Record argument takes a dictionary with a Data key, which holds the record itself.

21. Sending a record

What we send to the stream in the Data key needs to be a string.

22. Sending a record

We convert our record to a string where each field is separated by a space, making it easier to read into Pandas later.

23. Sending a record

We use Python's string join() method to combine the dictionary's values into a single string, converting each value to a string along the way.
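A sketch of that conversion, continuing with the hypothetical record from earlier:

    # Convert every value to a string and join them with spaces
    data_str = " ".join(str(value) for value in record.values())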

24. Putting it together

Putting it together, we take our record and convert it to a string.

25. Putting it together

Then, we push it to the stream. We also attach a line break at the end so that each record is a row.
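Putting those pieces together, again using the hypothetical record and the client sketched above:

    # Space-separate the values and append a newline so each record
    # becomes its own row in the output file
    data_str = " ".join(str(value) for value in record.values()) + "\n"

    firehose.put_record(
        DeliveryStreamName="gps-delivery-stream",
        Record={"Data": data_str},
    )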

26. Created files

Firehose doesn't write immediately because it buffers incoming records. After about five minutes, we see the data in our S3 bucket. Notice how Firehose organizes the data into folders based on a year/month/day pattern.

27. Sample data

On closer examination, the file looks like a CSV with spaces instead of commas. We can read it with pandas!

28. Created files

This part should be familiar from the previous course. Click on one of the objects and copy its key.

29. Create S3 client

Create the boto3 S3 client.
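A quick sketch, reusing the same placeholder credentials as before:

    # Same placeholder credentials and region as earlier
    s3 = boto3.client(
        "s3",
        aws_access_key_id=AWS_KEY_ID,
        aws_secret_access_key=AWS_SECRET,
        region_name="us-east-1",
    )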

30. Read data into DataFrame

Use the get_object() method to read the object data. Then use read_csv() to read it into pandas, assigning column names.
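A sketch of those two steps; the bucket name and object key are placeholders, and the column names mirror the hypothetical record fields from earlier.

    import pandas as pd

    # Bucket name and object key are placeholders - use your bucket and
    # the key you copied from the console
    obj = s3.get_object(
        Bucket="vehicle-data-bucket",
        Key="2023/06/01/14/gps-delivery-stream-1-2023-06-01-14-37-00-example",
    )

    # The file is space-delimited; supply the separator and column names
    vehicle_data = pd.read_csv(
        obj["Body"],
        delimiter=" ",
        names=["record_id", "timestamp", "vin", "lat", "lon", "speed"],
    )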

31. vehicle_data

Now we can use Pandas to analyze the data that streamed in!

32. Review

In this lesson you learned to create a Firehose delivery stream.

33. Review

You assigned it a role to assume and a bucket to write to.

34. Review

You learned how to write code that will put data into the stream.

35. Review

And how this pattern could apply to other use cases.

36. Review

Finally, you reviewed how to load streamed data from S3 into a DataFrame for further analysis.

37. Let's practice!

Lots of skills to practice. Let's put them all together!