MongoDB GridFS with Example, Modules, Indexes and Sharding


The previous session covered RockMongo in MongoDB. In this GridFS tutorial, we will study what MongoDB GridFS is, along with its modules, indexes, and sharding, with examples. We will also look at the two collections GridFS uses, chunks and files, and the fields each one holds.

What is MongoDB GridFS?

MongoDB GridFS is used to store and retrieve files that exceed the BSON document size limit of 16 MB. Instead of storing a file in a single document, GridFS divides it into small parts called chunks and stores each chunk as a separate document.

The default chunk size is 255 kB. Every chunk has this size except the last one, which is only as large as necessary.
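To make the chunking arithmetic concrete, here is a minimal sketch that computes how many chunks a file needs and how large the last chunk is. The function name chunkLayout is our own and not part of any MongoDB API, and we assume that an empty file produces no chunks.

```javascript
// Default GridFS chunk size: 255 kB, i.e. 255 * 1024 = 261120 bytes.
const DEFAULT_CHUNK_SIZE = 255 * 1024;

// Hypothetical helper: number of chunks for a file of `length` bytes
// and the size of the final (possibly shorter) chunk.
function chunkLayout(length, chunkSize = DEFAULT_CHUNK_SIZE) {
  if (length === 0) return { chunkCount: 0, lastChunkSize: 0 }; // assumption: no chunks for empty files
  const chunkCount = Math.ceil(length / chunkSize);
  const lastChunkSize = length - (chunkCount - 1) * chunkSize;
  return { chunkCount, lastChunkSize };
}

// A 1 MB file (1048576 bytes): four full chunks plus one 4096-byte chunk.
console.log(chunkLayout(1024 * 1024)); // { chunkCount: 5, lastChunkSize: 4096 }
```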

MongoDB GridFS uses two collections to store files. One stores the file chunks and the other stores the file metadata. When we query GridFS for a file, the driver reassembles the chunks as needed. With MongoDB GridFS, we can also perform range queries on stored files, reading only the section of a file that we need.

GridFS is not only useful to store files that exceed 16 MB but also for storing files that you want to access without loading the entire file into memory.
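The reason a partial read does not need the whole file in memory is that a byte range maps to a small set of chunk sequence numbers. The sketch below shows that mapping. The helper chunksForRange is hypothetical, and it treats the range as start-inclusive and end-exclusive, which is our own assumption for the illustration.

```javascript
// Hypothetical helper: which chunk sequence numbers (the `n` field)
// hold the bytes in the range [start, end) of a file?
function chunksForRange(start, end, chunkSize = 255 * 1024) {
  if (end <= start) return [];
  const first = Math.floor(start / chunkSize);
  const last = Math.floor((end - 1) / chunkSize);
  const result = [];
  for (let n = first; n <= last; n++) result.push(n);
  return result;
}

// Bytes 300000..599999 live entirely in chunks 1 and 2,
// so chunk 0 and every chunk after 2 never has to be fetched.
console.log(chunksForRange(300000, 600000)); // [ 1, 2 ]
```

Drivers do this bookkeeping for you; the sketch only illustrates why only a few chunk documents are touched.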

GridFS places the collections in a common bucket by prefixing each with the bucket name. By default, it uses two collections with a bucket named fs:

  1. fs.files
  2. fs.chunks

You can choose any name for a bucket, and you can create multiple buckets in a single database.
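The naming rule is simple enough to sketch in a few lines. The helper bucketCollections and the bucket name photos are made up for this illustration.

```javascript
// Hypothetical helper: derive the two collection names GridFS uses
// for a given bucket name (the default bucket is "fs").
function bucketCollections(bucketName = "fs") {
  return {
    files: `${bucketName}.files`,
    chunks: `${bucketName}.chunks`,
  };
}

console.log(bucketCollections());         // { files: 'fs.files', chunks: 'fs.chunks' }
console.log(bucketCollections("photos")); // { files: 'photos.files', chunks: 'photos.chunks' }
```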

MongoDB GridFS Modules

GridFS is built from two modules, each of which is a collection. Let's discuss them along with their fields:

  1. The chunks collection
  2. The files collection

i. The chunks collection

Documents in this collection have the following form:

{
   "_id" : <ObjectId>,
   "files_id" : <ObjectId>,
   "n" : <num>,
   "data" : <binary>
}

A document from the chunks collection contains the following fields:

  • chunks._id

Unique ObjectId of the chunk.

  • chunks.files_id

The _id of the parent document, as specified in the files collection.

  • chunks.n

Sequence number of the chunk. The numbering starts from 0.

  • chunks.data

The chunk’s payload as a BSON Binary type.
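To see how these fields fit together, the following sketch splits an in-memory Buffer into chunk-shaped documents and then reassembles them by sorting on n. It is a simplified illustration, not how a driver is implemented: the helper names are ours, the _id field is omitted, the files_id value "file-1" is a placeholder instead of a real ObjectId, and the tiny chunk size of 8 bytes is chosen only to keep the example readable.

```javascript
// Hypothetical helper: cut a Buffer into documents shaped like fs.chunks entries.
function toChunkDocuments(fileBuffer, filesId, chunkSize = 255 * 1024) {
  const docs = [];
  for (let offset = 0, n = 0; offset < fileBuffer.length; offset += chunkSize, n++) {
    docs.push({
      files_id: filesId, // points at the parent document in the files collection
      n,                 // sequence number, starting from 0
      data: fileBuffer.subarray(offset, offset + chunkSize),
    });
  }
  return docs;
}

// Hypothetical helper: rebuild the file by ordering the chunks on `n`.
function reassemble(chunkDocs) {
  const ordered = [...chunkDocs].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((doc) => doc.data));
}

const original = Buffer.from("GridFS splits files into chunks"); // 31 bytes
const chunks = toChunkDocuments(original, "file-1", 8);
console.log(chunks.length); // 4 (three chunks of 8 bytes, one of 7)

// Even if the chunks arrive out of order, sorting on `n` restores the file.
console.log(reassemble(chunks.reverse()).equals(original)); // true
```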

ii. The files collection

Documents in this collection have the following form:

{
    "_id" : <ObjectId>,
    "length" : <num>,
    "chunkSize" : <num>,
    "uploadDate" : <timestamp>,
    "md5" : <hash>,
    "filename" : <string>,
    "contentType" : <string>,
    "aliases" : <string array>,
    "metadata" : <any>
}

A document from the files collection contains the following fields:

  • files._id

A unique identifier for this document. The _id is of the data type you chose for the original document. The default type for MongoDB documents is BSON ObjectId.

  • files.length

Size of the document in bytes.

  • files.chunkSize

The size of each chunk in bytes. The default size is 255 kB.

  • files.uploadDate

It gives the date the document was first stored by GridFS.

  • files.md5

An MD5 hash of the complete file, as returned by the filemd5 command. The value is a string. Recent drivers deprecate this field, so it may be absent from newly uploaded files.

  • files.filename

It is optional. A human-readable name for the GridFS file.

  • files.contentType

It is optional. A valid MIME type for the GridFS file.

  • files.aliases

It is optional. An array of alias strings.

  • files.metadata

It is optional. Here, we store any additional information we want to keep with the file. The metadata field can be of any data type.
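As a rough illustration of how the required and optional fields combine, the sketch below builds a files-shaped document for an in-memory Buffer. The helper buildFilesDocument and its options object are hypothetical, the _id is passed in by the caller instead of being generated as an ObjectId, and the md5, contentType, and aliases fields are left out for brevity.

```javascript
// Hypothetical helper: assemble a document shaped like an fs.files entry.
function buildFilesDocument(id, fileBuffer, options = {}) {
  const doc = {
    _id: id,
    length: fileBuffer.length,                   // size of the file in bytes
    chunkSize: options.chunkSize ?? 255 * 1024,  // default 255 kB
    uploadDate: new Date(),                      // when the file was stored
  };
  // Optional fields are only added when the caller supplies them.
  if (options.filename !== undefined) doc.filename = options.filename;
  if (options.metadata !== undefined) doc.metadata = options.metadata;
  return doc;
}

const fileDoc = buildFilesDocument(1, Buffer.from("hello"), {
  filename: "hello.txt",
  metadata: { owner: "demo" },
});
console.log(fileDoc.length, fileDoc.chunkSize, fileDoc.filename); // 5 261120 hello.txt
```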

MongoDB GridFS Indexes

MongoDB GridFS uses indexes on each of the chunks and files collections for efficiency. You can create any additional indexes as desired to suit your application’s needs.

  1. The chunks index
  2. The files index

i. The chunks index

GridFS uses a unique, compound index on the chunks collection using the files_id and n fields. It allows efficient retrieval of chunks, as shown in the example below:

db.fs.chunks.find( { files_id: myFileID } ).sort( { n: 1 } )

If by any chance this index does not exist, you can issue this operation to create it using mongo shell:

db.fs.chunks.createIndex( { files_id: 1, n: 1 }, { unique: true } );

ii. The files index

GridFS uses an index on the files collection using the filename and uploadDate fields. It allows for efficient retrieval of files, as shown in the example below:

db.fs.files.find( { filename: myFileName } ).sort( { uploadDate: 1 } )

If by any chance this index does not exist, you can issue this operation to create it using mongo shell:

db.fs.files.createIndex( { filename: 1, uploadDate: 1 } );
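Before creating either index, you may want to check whether it already exists. The sketch below does that check on an array of index descriptions shaped like the output of getIndexes() in the shell. The helper hasIndex is hypothetical, and the sample array is hand-written for illustration rather than taken from a real server.

```javascript
// Hypothetical helper: does `indexes` contain an index on exactly `keySpec`?
// Key order matters for compound indexes, and JSON.stringify preserves it.
function hasIndex(indexes, keySpec, { unique = false } = {}) {
  const wanted = JSON.stringify(keySpec);
  return indexes.some(
    (idx) => JSON.stringify(idx.key) === wanted && (!unique || idx.unique === true)
  );
}

// Hand-written sample resembling what getIndexes() returns for fs.chunks.
const chunkIndexes = [
  { v: 2, key: { _id: 1 }, name: "_id_" },
  { v: 2, key: { files_id: 1, n: 1 }, name: "files_id_1_n_1", unique: true },
];

console.log(hasIndex(chunkIndexes, { files_id: 1, n: 1 }, { unique: true })); // true
console.log(hasIndex(chunkIndexes, { filename: 1, uploadDate: 1 }));          // false
```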

MongoDB GridFS Sharding

As discussed earlier, GridFS consists of two collections: files and chunks. Each is treated differently when sharding.

i. The chunks collection

To shard the chunks collection, use either {files_id : 1, n : 1} or {files_id : 1} as the shard key index.

ii. The files collection

The files collection is small and only contains metadata. None of the required keys for GridFS lend themselves to an even distribution in a sharded environment, so the files collection is usually left unsharded. Leaving it unsharded allows all the file metadata documents to live on the primary shard.
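As a sketch of what sharding the chunks collection looks like in practice, the helper below builds the document for the shardCollection admin command using one of the two shard keys mentioned above. The function chunksShardCommand is our own, and the database name filesDB is borrowed from the connection string used later in this tutorial. The resulting document would be passed to db.adminCommand() on a cluster where sharding is already enabled for the database.

```javascript
// Hypothetical helper: build the shardCollection command document for
// the chunks collection of a bucket. It does not contact any server.
function chunksShardCommand(dbName, bucketName = "fs", includeN = true) {
  const key = includeN ? { files_id: 1, n: 1 } : { files_id: 1 };
  return {
    shardCollection: `${dbName}.${bucketName}.chunks`, // full namespace: <db>.<collection>
    key,
  };
}

// In the shell this document could be run as: db.adminCommand(<document>)
console.log(chunksShardCommand("filesDB"));
console.log(chunksShardCommand("filesDB", "fs", false));
```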

How to Read and Write files in MongoDB GridFS?

To write a file to MongoDB GridFS, use the following Node.js code (for example, in VS Code).

//1. Load the mongoose driver
var mongooseDrv = require("mongoose");
//2. Connect to MongoDB and its database
//   (useMongoClient is a Mongoose 4.x option; gridfs-stream targets that generation of drivers)
mongooseDrv.connect('mongodb://localhost/filesDB', { useMongoClient: true });
//3. The Connection Object
var connection = mongooseDrv.connection;
if (connection) {
    console.log(connection.readyState.toString());
    //4. The Path object
    var path = require("path");
    //5. The grid-stream
    var grid = require("gridfs-stream");
    //6. The File-System module
    var fs = require("fs");
    //7. Path of the image file in the filestoread folder
    var filesrc = path.join(__dirname, "./filestoread/example.png");
    //8. Establish connection between Mongo and GridFS
    grid.mongo = mongooseDrv.mongo;
    //9. Open the connection and write file
    connection.once("open", () => {
        console.log("Connection Open");
        // pass the native database handle of the open connection
        var gridfs = grid(connection.db);
        if (gridfs) {
            //9a. create a stream, this will be
            //used to store file in database
            var streamwrite = gridfs.createWriteStream({
                //the file will be stored with the name
                filename: "example.png"
            });
            //9b. create a readstream to read the file
            //from the filestoread folder
            //and pipe into the database
            fs.createReadStream(filesrc).pipe(streamwrite);
            //9c. Complete the write operation
            streamwrite.on("close", function (file) {
                console.log("File written successfully in database");
            });
        } else {
            console.log("Sorry No Grid FS Object");
        }
    });
} else {
    console.log('Sorry not connected');
}
// printed right away, before the asynchronous write has finished
console.log("done");

Here we load the mongoose driver, which is used to connect to the MongoDB database, and then pipe a file read stream into a GridFS write stream.

To read a file from MongoDB GridFS, use the following Node.js code (for example, in VS Code).

var mongooseDrv = require("mongoose");
// useMongoClient is a Mongoose 4.x option; gridfs-stream targets that generation of drivers
mongooseDrv.connect('mongodb://localhost/filesDB', { useMongoClient: true });
var connection = mongooseDrv.connection;
if (connection) {
    console.log(connection.readyState.toString());
    var path = require("path");
    var grid = require("gridfs-stream");
    var fs = require("fs");
    // tell gridfs-stream which mongo driver to use
    grid.mongo = mongooseDrv.mongo;
    connection.once("open", () => {
        console.log("Connection Open");
        // pass the native database handle of the open connection
        var gridfs = grid(connection.db);
        if (gridfs) {
            // local file that will receive the data read from GridFS
            var fsstreamwrite = fs.createWriteStream(
                path.join(__dirname, "./filestowrite/example.png")
            );
            // read the stored file by its name
            var readstream = gridfs.createReadStream({
                filename: "example.png"
            });
            readstream.on("error", function (err) {
                console.log("Could not read file from database", err);
            });
            readstream.pipe(fsstreamwrite);
            // "finish" fires once the local file has been fully written
            fsstreamwrite.on("finish", function () {
                console.log("File read successfully from database");
            });
        } else {
            console.log("Sorry No Grid FS Object");
        }
    });
} else {
    console.log('Sorry not connected');
}
// printed right away, before the asynchronous read has finished
console.log("done");
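The same read and write flow can also be sketched with the GridFSBucket class that ships with the official mongodb Node.js driver, which newer Mongoose versions expose as mongoose.mongo.GridFSBucket. The two helper functions below are our own names, not driver API. They take the bucket as a parameter and assume it provides openUploadStream and openDownloadStreamByName, as GridFSBucket does. Treat this as a sketch under those assumptions rather than a drop-in replacement for the code above.

```javascript
// Hypothetical helper: stream `source` (any Readable) into GridFS as `filename`.
// `bucket` is assumed to be a GridFSBucket instance.
function uploadToGridFS(bucket, source, filename) {
  const upload = bucket.openUploadStream(filename);
  source.pipe(upload);
  return upload; // a writable stream; it emits "finish" when the upload is done
}

// Hypothetical helper: stream the stored file `filename` into `destination`.
function downloadFromGridFS(bucket, filename, destination) {
  const download = bucket.openDownloadStreamByName(filename);
  download.pipe(destination);
  return download; // a readable stream; listen for "error" on it
}

// Possible usage once a Mongoose connection is open (not executed here):
//   const bucket = new mongooseDrv.mongo.GridFSBucket(connection.db);
//   uploadToGridFS(bucket, fs.createReadStream(filesrc), "example.png")
//     .on("finish", () => console.log("File written successfully in database"));
```

Passing the bucket in as an argument keeps the helpers independent of how the connection is created.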

MongoDB GridFS Limitations

There are some limitations of MongoDB GridFS and they are as follows:

  1. Working Set
  2. Performance
  3. Atomic Update

i. Working Set

Serving files alongside your database content can significantly churn your memory working set. If you do not want to disturb the working set, you can serve your files from a separate MongoDB server.

ii. Performance

Serving files from GridFS will be slower than serving them natively from a web server and filesystem. However, the management benefits may justify the slowdown.

iii. Atomic Update

GridFS does not provide an atomic update of a file. If you need this, you will have to store multiple versions of each file and then pick the right version, for example the most recently uploaded one.
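One possible way to pick the right version is to keep every upload under the same filename and treat the document with the newest uploadDate as current. The sketch below applies that rule to an array of files-shaped documents. The helper latestVersion and the sample documents are made up for this illustration, and other schemes, such as a flag inside metadata, would work as well.

```javascript
// Hypothetical helper: among files documents sharing `filename`,
// return the one with the most recent uploadDate (or null if none match).
function latestVersion(fileDocs, filename) {
  return fileDocs
    .filter((doc) => doc.filename === filename)
    .reduce(
      (latest, doc) =>
        latest === null || doc.uploadDate > latest.uploadDate ? doc : latest,
      null
    );
}

// Made-up sample: two versions of report.pdf and one unrelated file.
const storedFiles = [
  { _id: 1, filename: "report.pdf", uploadDate: new Date("2023-01-10") },
  { _id: 2, filename: "report.pdf", uploadDate: new Date("2023-03-05") },
  { _id: 3, filename: "notes.txt", uploadDate: new Date("2023-04-01") },
];

console.log(latestVersion(storedFiles, "report.pdf")._id); // 2
```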

Summary

We have now studied GridFS in MongoDB: its two modules (the files and chunks collections), its indexes, and sharding, with examples. We also saw how to read and write files and looked at the limitations of GridFS.



DataFlair Team

The DataFlair Team provides industry-driven content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Our expert educators focus on delivering value-packed, easy-to-follow resources for tech enthusiasts and professionals.
