Streaming Direct PUT Firehose Records Into S3 With Newline Characters
28 Oct 2023
Recently, I found myself working with AWS Kinesis Firehose and S3. All I wanted was to write custom JSON records programmatically into a Firehose stream, and have it output to an S3 location. However, by default, the records would get written without any newline separators. Searching for how to insert newline characters generally got me to complex solutions for complex data input sources.
Neither the AWS documentation nor existing Stack Overflow answers pointed me towards the actual syntax, which is incredibly simple. I chased down complex solutions using Firehose features until I figured out you can just include the newline character directly after your message.
Note that my solution is for:
- Python
- Boto3
- Direct PUT writes, aka programmatically writing custom records into the stream
It does not apply if your Firehose data source is another AWS service, such as an SQS queue or a DynamoDB stream.
In case you, like me, are looking into more complex Firehose features just to insert newline characters… stop. They’re not necessary if you’re writing custom data straight into the stream using an AWS API or SDK.
This assumes that you have the infrastructure set up already. All you need is:
- An S3 bucket
- A simple Kinesis Firehose delivery stream with deliveryStreamType set to "DirectPut" and an extendedS3DestinationConfiguration containing minimal configuration
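If you still need to create the stream, the two settings above can be sketched with Boto3 roughly as follows. This assumes you set things up through Boto3's create_delivery_stream, where the parameter names are capitalised differently from the ones listed above. The helper name, stream name, and ARNs are placeholders of mine, not values from a real setup.

```python
def build_delivery_stream_args(stream_name, role_arn, bucket_arn):
    """Build keyword arguments for Firehose's create_delivery_stream.

    Hypothetical helper: only the minimal Direct PUT + S3 settings are
    included. The role must allow Firehose to write to the bucket.
    """
    return {
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "DirectPut",
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
        },
    }


# Usage (needs AWS credentials, so it is left commented out):
#   import boto3
#   firehose = boto3.client("firehose")
#   firehose.create_delivery_stream(
#       **build_delivery_stream_args(
#           "my-stream",                                  # placeholder
#           "arn:aws:iam::123456789012:role/my-role",     # placeholder
#           "arn:aws:s3:::my-bucket",                     # placeholder
#       )
#   )
```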
Now for the code.
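Here is a minimal sketch of the idea: serialise the record to JSON, append "\n", encode it to bytes, and hand it to put_record. The function names and the stream name are illustrative. The Firehose client is passed in so the helper can be tried without AWS access.

```python
import json


def to_firehose_record(payload):
    """Serialise a dict to JSON and append the newline separator."""
    line = json.dumps(payload) + "\n"  # the newline is the whole trick
    return {"Data": line.encode("utf-8")}


def put_json_record(firehose_client, stream_name, payload):
    """Write one newline-terminated JSON record via Direct PUT."""
    return firehose_client.put_record(
        DeliveryStreamName=stream_name,
        Record=to_firehose_record(payload),
    )


# Usage (needs AWS credentials, so it is left commented out):
#   import boto3
#   firehose = boto3.client("firehose")
#   put_json_record(firehose, "my-stream", {"event": "signup", "user_id": 42})
```

With the trailing newline in place, each record should land on its own line in the S3 object.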
For some reason, I kept following suggestions to encode the data in different ways (base64, etc.), until I figured out that the solution was actually very simple. It took me a bit of trial and error to structure it correctly.
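The same structure should carry over to batches with put_record_batch. The sketch below passes plain UTF-8 bytes, since Boto3 takes care of base64-encoding the Data blob on the wire. It splits the input into chunks of 500, the per-call record limit for PutRecordBatch. The helper names are mine, and retrying failed records is left out for brevity.

```python
import json

MAX_BATCH_SIZE = 500  # PutRecordBatch accepts at most 500 records per call


def to_newline_records(payloads):
    """Turn dicts into Firehose records, each ending in a newline."""
    return [
        {"Data": (json.dumps(payload) + "\n").encode("utf-8")}
        for payload in payloads
    ]


def put_json_batch(firehose_client, stream_name, payloads):
    """Send payloads in chunks and return the total FailedPutCount."""
    records = to_newline_records(payloads)
    failed = 0
    for start in range(0, len(records), MAX_BATCH_SIZE):
        response = firehose_client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=records[start:start + MAX_BATCH_SIZE],
        )
        # A non-zero count means some records were rejected and need a retry.
        failed += response.get("FailedPutCount", 0)
    return failed
```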
It’s so straightforward, it’s silly. If you happen to be stuck like I was, these snippets are tried and tested. I hope they help!