MongoDB GridFS with Example, Modules, Indexes and Sharding
The last session covered Rockmongo in MongoDB. In this GridFS tutorial, we will study what MongoDB GridFS is, along with its modules, indexes and sharding, with examples. We will also look at the two collections that GridFS uses, chunks and files, and the fields of each.
What is MongoDB GridFS?
MongoDB GridFS is used to store and retrieve files that exceed the BSON document size limit of 16 MB. Instead of storing a file in a single document, GridFS divides it into small parts called chunks.
The default chunk size is 255 kB. Every chunk has this size except the last one, which is only as large as necessary.
MongoDB GridFS uses two collections to store files: one stores the file chunks and the other stores the file metadata. When we query GridFS for a file, the driver reassembles the chunks as needed. With MongoDB GridFS, we can also perform range queries on the stored files.
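The chunk arithmetic described above can be sketched in a few lines of JavaScript. The helper name chunkLayout is hypothetical and is not part of any MongoDB driver; it only illustrates how a file length maps to a chunk count and a last-chunk size, assuming the default chunk size of 255 kB (255 * 1024 bytes).

```javascript
// Default GridFS chunk size: 255 kB = 255 * 1024 bytes.
const DEFAULT_CHUNK_SIZE = 255 * 1024;

// Hypothetical helper (not a driver API): returns how many chunks a file
// of the given length needs and how large the last chunk is.
function chunkLayout(fileLength, chunkSize = DEFAULT_CHUNK_SIZE) {
  if (fileLength === 0) {
    return { chunkCount: 0, lastChunkSize: 0 };
  }
  const chunkCount = Math.ceil(fileLength / chunkSize);
  // All chunks but the last are full; the last holds the remainder.
  const lastChunkSize = fileLength - (chunkCount - 1) * chunkSize;
  return { chunkCount, lastChunkSize };
}

console.log(chunkLayout(1000000));
// prints { chunkCount: 4, lastChunkSize: 216640 }
```

A 1,000,000-byte file therefore needs three full chunks of 261,120 bytes and one final chunk of 216,640 bytes.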
GridFS is not only useful to store files that exceed 16 MB but also for storing files that you want to access without loading the entire file into memory.
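To see why a file can be read without loading all of it, it helps to work out which chunks cover a given byte range. The sketch below is only an illustration: chunksForRange is a hypothetical helper, and it assumes the default 255 kB chunk size and a range given as a start offset (inclusive) and an end offset (exclusive).

```javascript
const DEFAULT_CHUNK_SIZE = 255 * 1024;

// Hypothetical helper: returns the sequence numbers (n) of the first and
// last chunk needed to serve bytes [start, end) of a file.
function chunksForRange(start, end, chunkSize = DEFAULT_CHUNK_SIZE) {
  if (start < 0 || end <= start) {
    throw new RangeError("expected 0 <= start < end");
  }
  return {
    firstChunk: Math.floor(start / chunkSize),
    lastChunk: Math.floor((end - 1) / chunkSize),
  };
}

console.log(chunksForRange(300000, 600000));
// prints { firstChunk: 1, lastChunk: 2 }
```

Only chunks 1 and 2 have to be fetched for that range, no matter how large the whole file is.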
GridFS places the collections in a common bucket by prefixing each with the bucket name. By default, it uses two collections with a bucket named fs:
- fs.files
- fs.chunks
You can choose any name for a bucket, and you can create multiple buckets in a single database.
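The naming rule is simple enough to express as a small function. This is a sketch of the convention only; bucketCollections is a hypothetical helper, and the bucket name photos is an example chosen for illustration.

```javascript
// Hypothetical helper: derives the two collection names for a bucket.
// The default bucket name is "fs".
function bucketCollections(bucketName = "fs") {
  return {
    files: `${bucketName}.files`,
    chunks: `${bucketName}.chunks`,
  };
}

console.log(bucketCollections());
// prints { files: 'fs.files', chunks: 'fs.chunks' }
console.log(bucketCollections("photos"));
// prints { files: 'photos.files', chunks: 'photos.chunks' }
```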
MongoDB GridFS Modules
GridFS stores files in two collections, referred to here as modules. Let’s discuss each of them with its fields:
- The chunks collection
- The files collection
i. The chunks collection
Documents in this collection have the following form:
{ "_id" : <ObjectId>, "files_id" : <ObjectId>, "n" : <num>, "data" : <binary> }
A document from the chunks collection contains the following fields:
- chunks._id
Unique ObjectId of the chunk.
- chunks.files_id
The _id of the parent document, as specified in the files collection.
- chunks.n
Sequence number of the chunk. The numbering starts from 0.
- chunks.data
The chunk’s payload as a BSON Binary type.
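As a rough illustration of these fields, the sketch below splits an in-memory Buffer into chunk documents. It is not how a driver is implemented: splitIntoChunks is a hypothetical helper, the string "file-1" stands in for a real ObjectId, a tiny chunk size of 4 bytes is used so the output is easy to read, and the _id field is left out because the database would normally assign it.

```javascript
// Hypothetical helper: builds chunk documents (without _id) for a buffer.
function splitIntoChunks(filesId, buffer, chunkSize = 255 * 1024) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < buffer.length; offset += chunkSize, n += 1) {
    chunks.push({
      files_id: filesId, // _id of the parent document in the files collection
      n,                 // sequence number, starting at 0
      data: buffer.subarray(offset, offset + chunkSize), // chunk payload
    });
  }
  return chunks;
}

const chunks = splitIntoChunks("file-1", Buffer.from("abcdefghij"), 4);
console.log(chunks.map((c) => [c.n, c.data.toString()]));
// prints [ [ 0, 'abcd' ], [ 1, 'efgh' ], [ 2, 'ij' ] ]
```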
ii. The files collection
Documents in this collection have the following form:
{ "_id" : <ObjectId>, "length" : <num>, "chunkSize" : <num>, "uploadDate" : <timestamp>, "md5" : <hash>, "filename" : <string>, "contentType" : <string>, "aliases" : <string array>, "metadata" : <any> }
A document from the files collection contains the following fields:
- files._id
It is a unique identifier for this document. The _id is of the data type that you chose for the original document.
- files.length
Size of the file in bytes.
- files.chunkSize
The size of each chunk in bytes. The default size is 255 kB.
- files.uploadDate
It gives the date the document was first stored by GridFS.
- files.md5
An MD5 hash of the complete file returned by the filemd5 command. It is of string type.
- files.filename
It is optional. A human-readable name for the GridFS file.
- files.contentType
It is optional. A valid MIME type for the GridFS file.
- files.aliases
It is optional. An array of alias strings.
- files.metadata
It is optional. Here, we store any additional information. The metadata field can be of any data type.
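To make the field list concrete, here is a minimal sketch that builds a files document for an in-memory Buffer. The helper buildFilesDoc, the id "file-1", the filename hello.txt and the metadata content are all hypothetical; a real driver creates this document for you during an upload.

```javascript
// Hypothetical helper: builds a files document for a buffer.
function buildFilesDoc(id, buffer, filename, metadata = {}, chunkSize = 255 * 1024) {
  return {
    _id: id,                // unique identifier of the file
    length: buffer.length,  // size of the file in bytes
    chunkSize,              // size of each chunk in bytes
    uploadDate: new Date(), // when the file was first stored
    filename,               // optional human-readable name
    metadata,               // optional, any additional information
  };
}

const doc = buildFilesDoc("file-1", Buffer.from("hello"), "hello.txt", { owner: "demo" });
console.log(doc.length, doc.chunkSize, doc.filename);
// prints 5 261120 hello.txt
```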
MongoDB GridFS Indexes
MongoDB GridFS uses an index on each of the chunks and files collections for efficiency. You can create additional indexes as needed to suit your application.
- The chunks index
- The files index
i. The chunks index
GridFS uses a unique, compound index on the chunks collection over the files_id and n fields. It allows efficient retrieval of chunks, as shown in the example below:
db.fs.chunks.find( { files_id: myFileID } ).sort( { n: 1 } )
If this index does not exist, you can create it with the following operation in the mongo shell:
db.fs.chunks.createIndex( { files_id: 1, n: 1 }, { unique: true } );
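The query above returns the chunks of one file ordered by n, which is exactly what is needed to rebuild the file. The sketch below imitates that step on plain in-memory objects, so it runs without a database. The function reassemble and the sample data are hypothetical, and string ids stand in for ObjectIds.

```javascript
// Hypothetical helper: rebuilds a file from chunk documents held in memory.
function reassemble(chunkDocs, filesId) {
  const ordered = chunkDocs
    .filter((chunk) => chunk.files_id === filesId) // like find({ files_id })
    .sort((a, b) => a.n - b.n);                    // like sort({ n: 1 })
  // The unique (files_id, n) index guarantees no duplicates in the database;
  // here we check that the sequence 0, 1, 2, ... has no gaps or repeats.
  ordered.forEach((chunk, position) => {
    if (chunk.n !== position) {
      throw new Error(`missing or duplicate chunk at position ${position}`);
    }
  });
  return Buffer.concat(ordered.map((chunk) => chunk.data));
}

const stored = [
  { files_id: "file-1", n: 2, data: Buffer.from("ij") },
  { files_id: "file-1", n: 0, data: Buffer.from("abcd") },
  { files_id: "file-2", n: 0, data: Buffer.from("zzzz") },
  { files_id: "file-1", n: 1, data: Buffer.from("efgh") },
];
const restored = reassemble(stored, "file-1").toString();
console.log(restored);
// prints abcdefghij
```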
ii. The files index
GridFS uses an index on the files collection over the filename and uploadDate fields. It allows efficient retrieval of files, as shown in the example below:
db.fs.files.find( { filename: myFileName } ).sort( { uploadDate: 1 } )
If this index does not exist, you can create it with the following operation in the mongo shell:
db.fs.files.createIndex( { filename: 1, uploadDate: 1 } );
MongoDB GridFS Sharding
As discussed earlier, GridFS consists of two collections, files and chunks. Each has its own sharding considerations.
i. The chunks collection
To shard the chunks collection, use either {files_id : 1, n : 1} or {files_id : 1} as the shard key index.
ii. The files collection
The files collection is small and only contains metadata. None of the required GridFS keys lend themselves to an even distribution in a sharded environment. Leaving the files collection unsharded allows all the file metadata documents to live on the primary shard.
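As a sketch of the chunks shard key, the function below builds a document in the shape of the shardCollection admin command. The helper shardChunksCommand is hypothetical, and filesDB is the database name used in the examples of this tutorial. The resulting document is meant to be passed to db.adminCommand() in the mongo shell on a cluster where sharding is already set up; that step is an assumption and is not shown here.

```javascript
// Hypothetical helper: builds a shardCollection command document for the
// chunks collection of a bucket. Set includeN to false to shard on
// { files_id: 1 } only.
function shardChunksCommand(dbName, bucketName = "fs", includeN = true) {
  const key = includeN ? { files_id: 1, n: 1 } : { files_id: 1 };
  return { shardCollection: `${dbName}.${bucketName}.chunks`, key };
}

console.log(shardChunksCommand("filesDB"));
// prints { shardCollection: 'filesDB.fs.chunks', key: { files_id: 1, n: 1 } }
```

Both key choices start with files_id, so all chunks of one file share the same shard key prefix.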
How to Read and Write files in MongoDB GridFS?
To write a file to MongoDB GridFS, use the following code, for example in VS Code:
// 1. Load the mongoose driver
var mongooseDrv = require("mongoose");
// 2. Connect to MongoDB and its database
//    (useMongoClient is a Mongoose 4.x option)
mongooseDrv.connect("mongodb://localhost/filesDB", { useMongoClient: true });
// 3. The connection object
var connection = mongooseDrv.connection;
if (typeof connection !== "undefined") {
    console.log(connection.readyState.toString());
    // 4. The path module
    var path = require("path");
    // 5. The gridfs-stream module
    var grid = require("gridfs-stream");
    // 6. The file-system module
    var fs = require("fs");
    // 7. Path of the image file in the filestoread folder
    var filesrc = path.join(__dirname, "./filestoread/example.png");
    // 8. Tell gridfs-stream which MongoDB driver to use
    grid.mongo = mongooseDrv.mongo;
    // 9. Once the connection is open, write the file
    connection.once("open", () => {
        console.log("Connection Open");
        var gridfs = grid(connection.db);
        if (gridfs) {
            // 9a. Create a write stream; the file will be
            //     stored in the database under this name
            var streamwrite = gridfs.createWriteStream({
                filename: "example.png"
            });
            // 9b. Read the file from the filestoread folder
            //     and pipe it into the database
            fs.createReadStream(filesrc).pipe(streamwrite);
            // 9c. Report when the write operation completes
            streamwrite.on("close", function (file) {
                console.log("File written successfully in database");
            });
        } else {
            console.log("Sorry No Grid FS Object");
        }
    });
} else {
    console.log("Sorry not connected");
}
Here we are loading the mongoose driver. This is used to connect with the MongoDB database.
To read a file from MongoDB GridFS, use the following code, for example in VS Code:
// Load the mongoose driver and connect (useMongoClient is a Mongoose 4.x option)
var mongooseDrv = require("mongoose");
mongooseDrv.connect("mongodb://localhost/filesDB", { useMongoClient: true });
var connection = mongooseDrv.connection;
if (typeof connection !== "undefined") {
    console.log(connection.readyState.toString());
    var path = require("path");
    var grid = require("gridfs-stream");
    var fs = require("fs");
    // Tell gridfs-stream which MongoDB driver to use
    grid.mongo = mongooseDrv.mongo;
    connection.once("open", () => {
        console.log("Connection Open");
        var gridfs = grid(connection.db);
        if (gridfs) {
            // Local file that receives the data read from GridFS
            var fsstreamwrite = fs.createWriteStream(
                path.join(__dirname, "./filestowrite/example.png")
            );
            // Stream of the file stored in GridFS under this name
            var readstream = gridfs.createReadStream({
                filename: "example.png"
            });
            readstream.pipe(fsstreamwrite);
            // "finish" fires once all data has been written to the local file
            fsstreamwrite.on("finish", function () {
                console.log("File Read successfully from database");
            });
        } else {
            console.log("Sorry No Grid FS Object");
        }
    });
} else {
    console.log("Sorry not connected");
}
MongoDB GridFS Limitations
There are some limitations of MongoDB GridFS and they are as follows:
- Working Set
- Performance
- Atomic Update
i. Working Set
Serving files from the database can significantly churn your memory working set. If you do not want to disturb the working set, serve your files from a separate MongoDB server.
ii. Performance
Serving files from GridFS will be slower than serving them natively from a web server and filesystem. However, the management benefits may justify the slowdown.
iii. Atomic Update
GridFS does not provide an atomic update of a file. If you need this, you will have to maintain multiple versions of your files and select the right version when reading.
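One way to approach the versioning idea is to store every upload as a new file under the same filename and treat the most recent uploadDate as the current version. The sketch below applies that rule to plain in-memory files documents. The helper currentVersion, the ids and the filenames are hypothetical, and this is only one possible convention, not a GridFS feature.

```javascript
// Hypothetical helper: picks the newest files document for a filename.
function currentVersion(filesDocs, filename) {
  const candidates = filesDocs.filter((doc) => doc.filename === filename);
  if (candidates.length === 0) {
    return null;
  }
  // Keep whichever candidate has the latest uploadDate.
  return candidates.reduce((latest, doc) =>
    doc.uploadDate > latest.uploadDate ? doc : latest
  );
}

const versions = [
  { _id: "a", filename: "report.pdf", uploadDate: new Date("2024-01-01T00:00:00Z") },
  { _id: "b", filename: "report.pdf", uploadDate: new Date("2024-03-01T00:00:00Z") },
  { _id: "c", filename: "other.pdf", uploadDate: new Date("2024-06-01T00:00:00Z") },
];
console.log(currentVersion(versions, "report.pdf")._id);
// prints b
```

Older versions stay available under their own _id until the application decides to delete them.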
Summary
In this tutorial, we studied GridFS in MongoDB: its modules, indexes and sharding, with examples. For each of these, we looked at both the files collection and the chunks collection.
We hope you enjoyed it!
Still have a question? Feel free to ask in the comment section.