AWS S3 Select Demo
This project showcases the rich AWS S3 Select
feature to stream a large data file in a paginated style
.
Currently, S3 Select
does not support OFFSET
and hence we cannot paginate the results of the query. Hence, we use scanrange
feature to stream the contents of the S3 file.
Background
Importing (reading) a large file leads Out of Memory
error. It can also lead to a system crash event. There are libraries viz. Pandas, Dask, etc. which are very good at processing large files but again the file is to be present locally i.e. we will have to import it from S3 to our local machine. But what if we do not want to fetch and store the whole S3 file locally at once?
Well, we can make use of AWS S3 Select
to stream a large file via it's ScanRange
parameter. This approach…