1. Working with the Firehose delivery stream
Well hello there! Welcome to Lesson 3!
In the last lesson, we got everything ready to create a Firehose stream. In this lesson, we'll make one - and write to it!
2. Ready to create stream
After the last lesson, everything is in place.
3. Ready to create stream
The S3 bucket has been created.
4. Ready to create stream
dcUser has the proper permissions for Firehose.
5. Ready to create stream
And we created a role for the Firehose stream to assume.
6. Ready to create stream
All that's left is to create our stream!
7. Get Role ARN
We will need to specify a role ARN (Amazon Resource Name - AWS's unique identifier for a resource) when creating our stream.
In the console, go to IAM, then Roles, then FirehoseDeliveryRole. The ARN is at the top of the page.
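If you'd rather stay in code, something like this sketch should fetch the same ARN with the IAM client (assuming the role name from the last lesson and that your AWS credentials are already configured):

```python
import boto3

# Look up the role's ARN programmatically instead of copying it from
# the console. Assumes the role is named 'FirehoseDeliveryRole'.
iam = boto3.client('iam')
role_arn = iam.get_role(RoleName='FirehoseDeliveryRole')['Role']['Arn']
```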
8. Initialize boto3 client
Initialize the boto3 firehose client.
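A minimal sketch, assuming your credentials are configured and using a placeholder region:

```python
import boto3

# Create the Firehose client; the region name here is an assumption.
firehose = boto3.client('firehose', region_name='us-east-1')
```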
9. Create the stream!
Create the stream using the client's create_delivery_stream() method, passing gps-delivery-stream as the DeliveryStreamName and DirectPut as the DeliveryStreamType.
DirectPut means we will be writing to the stream directly (as opposed to writing from another stream).
Next, pass an S3DestinationConfiguration object containing the RoleARN you copied, followed by the S3 bucket ARN.
S3 ARNs are always in the format of arn:aws:s3:::BUCKET-NAME.
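Put together, the call might look like this sketch. The bucket name gps-data-bucket is an assumption - substitute the bucket you created in the last lesson:

```python
# Create the delivery stream, writing to an assumed bucket name.
response = firehose.create_delivery_stream(
    DeliveryStreamName='gps-delivery-stream',
    DeliveryStreamType='DirectPut',
    S3DestinationConfiguration={
        'RoleARN': role_arn,
        'BucketARN': 'arn:aws:s3:::gps-data-bucket'
    }
)
```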
10. Create stream response
The response object comes back with a DeliveryStreamARN key.
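For example:

```python
# The new stream's ARN comes back in the response.
print(response['DeliveryStreamARN'])
```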
11. Stream is ready
We have a stream that ingests data and writes it to an S3 bucket. Our components are ready!
12. Writing to stream
Let's write to the stream!
13. Telematics hardware
We will install an OBD2 reader device in each vehicle. These devices will collect vehicle data.
14. Telematics data send
The code that we write for these devices will take the sensor data and write it to the stream.
15. Single record
Let's examine a single record from a vehicle sensor.
There's a record ID.
A capture timestamp.
A VIN, or vehicle identification number.
The vehicle's latitude and longitude.
And its speed.
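As a sketch, a record with those fields might look like this - the field names and values here are illustrative assumptions:

```python
# An illustrative sensor record; names and values are made up.
record = {
    'record_id': 'd6c59816-c523-4c15-98a7-2ba39b3242a0',
    'timestamp': '2025-01-15T09:31:04',
    'vin': '1HGCM82633A004352',
    'lat': 44.9778,
    'lon': -93.2650,
    'speed': 63.4
}
```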
16. Records coming in
Each of these records comes in from a different vehicle at a different time - a great use case for streams.
17. Another use case
You would use the same pattern for collecting clickstream data from multiple browsers.
18. Another use case
Or collecting live logs from servers.
19. Patterns
Data engineering is about understanding the pattern and finding the right time to apply it.
20. Sending a record
To send a record, use the Firehose client's put_record() method, passing the DeliveryStreamName as an argument. We pass a dictionary to the Record argument; its Data key holds the record itself.
21. Sending a record
What we send to the stream in the Data key needs to be a string.
22. Sending a record
We convert our record to a string where each field is separated by a space, making it easier to read into pandas later.
23. Sending a record
We use Python's string join() method to bring the values of a dictionary together into a single string, converting each value to a string along the way.
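One way to do this:

```python
# Convert each value to a string, then join them with spaces.
record_string = ' '.join(str(value) for value in record.values())
```

Since Python 3.7, dictionaries preserve insertion order, so the fields come out in a consistent order every time.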
24. Putting it together
Putting it together: we have our record, and we convert it to a string.
25. Putting it together
Then, we push it to the stream. We also append a newline at the end so that each record becomes its own row.
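A sketch of the final call, reusing the record_string built above:

```python
# Push the record to the stream; the trailing newline makes each
# record its own row in the output file.
firehose.put_record(
    DeliveryStreamName='gps-delivery-stream',
    Record={'Data': record_string + '\n'}
)
```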
26. Created files
Firehose doesn't write immediately because of buffering. After about five minutes, we see the data in our S3 bucket.
Notice how Firehose arranges the data into folders based on a year/month/day pattern.
27. Sample data
On closer examination, the file looks like a CSV with spaces instead of commas. We can read it with pandas!
28. Created files
This part should be familiar from the previous course. Click on one of the objects and copy its key.
29. Create S3 client
Create the boto3 s3 client.
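Same pattern as the Firehose client:

```python
# Create the S3 client; the region name is again an assumption.
s3 = boto3.client('s3', region_name='us-east-1')
```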
30. Read data into DataFrame
Use the get_object() method to read the object data.
Then use read_csv() to read it into pandas, assigning column names.
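A sketch of the read, assuming the same bucket name as before. Paste the object key you copied in place of the placeholder, and note the column names are assumptions matching the record fields; since the fields are space-separated, we pass sep=' ':

```python
import pandas as pd

# Fetch the object; 'YOUR-OBJECT-KEY' is a placeholder for the key you copied.
obj = s3.get_object(Bucket='gps-data-bucket', Key='YOUR-OBJECT-KEY')

# The response body is a file-like stream that read_csv() can consume.
vehicle_data = pd.read_csv(
    obj['Body'],
    sep=' ',
    names=['record_id', 'timestamp', 'vin', 'lat', 'lon', 'speed']
)
```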
31. vehicle_data
Now we can use pandas to analyze the data that streamed in!
32. Review
In this lesson you learned to create a Firehose delivery stream.
33. Review
You assigned it a role to assume and a bucket to write to.
34. Review
You learned how to write code that will put data into the stream.
35. Review
And how this pattern could apply to other use cases.
36. Review
Finally, you reviewed how to load streamed data from S3 into a DataFrame for further analysis.
37. Let's practice!
Lots of skills to practice. Let's put them all together!