The Hadoop MapReduce framework provides the Distributed Cache to cache files needed by applications. It can cache read-only text files, archives, jar files, and so on.
An application that wants to distribute a file through the distributed cache must first make sure the file is available at a URL, which can be either an hdfs:// or http:// URL. Once the file is present at such a URL, the user declares it a cache file to be distributed. The framework then copies the cache file to every node before starting tasks on those nodes. The files are copied only once per job, and applications should not modify them.
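As a minimal illustrative sketch of this flow using the newer Job API, the driver registers the cache file before submission and the mapper reads it in setup(); the HDFS path, class names, and lookup-file usage here are assumptions for illustration, not part of the original text.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheFileExample {

  // Driver side: register a read-only lookup file before submitting the job.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "distributed cache example");
    job.setJarByClass(CacheFileExample.class);
    // Hypothetical HDFS URL; the file must already exist at this location.
    job.addCacheFile(new URI("hdfs://namenode:8020/cache/lookup.txt"));
    // Archives (zip/tar/jar) can be registered with job.addCacheArchive(...).
    // ... set mapper class, input/output paths, then submit the job ...
  }

  public static class CacheAwareMapper
      extends Mapper<LongWritable, Text, LongWritable, Text> {

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
      // By the time setup() runs, the framework has already copied the cache
      // file to the local node; getCacheFiles() returns the registered URIs.
      URI[] cacheFiles = context.getCacheFiles();
      if (cacheFiles != null && cacheFiles.length > 0) {
        // The localized copy is available in the task's working directory
        // under the file's base name.
        String localName = new Path(cacheFiles[0].getPath()).getName();
        try (BufferedReader reader = new BufferedReader(new FileReader(localName))) {
          // ... load the read-only lookup data into memory for use in map() ...
        }
      }
    }
  }
}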
By default, the size of the distributed cache is 10 GB. We can adjust this size with the local.cache.size property.
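A minimal sketch of adjusting that limit programmatically, assuming local.cache.size takes a value in bytes (so the 10 GB default corresponds to 10737418240 bytes); the 20 GB figure below is just an example value.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Raise the local cache size limit before creating the job.
Configuration conf = new Configuration();
conf.setLong("local.cache.size", 20L * 1024L * 1024L * 1024L); // roughly 20 GB, in bytes
Job job = Job.getInstance(conf, "bigger cache example");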
To learn in detail how to distribute extra files to a MapReduce job at run time, follow: Distributed Cache