In this HDFS tutorial, we will learn the remaining important and frequently used HDFS commands via the CLI. With these commands we can perform HDFS file operations such as copying a file, changing file permissions, viewing file contents, changing file ownership, and creating directories. To learn more about the world's most reliable storage layer, follow this HDFS introductory guide.
Before interacting with HDFS, you need to deploy Hadoop; follow this detailed tutorial to install and configure Hadoop.
2. HDFS Commands using CLI
HDFS-CLI is an interactive command-line shell that simplifies interacting with the Hadoop Distributed File System (HDFS). Hadoop file system shell commands are used to perform various HDFS operations and to manage the files present on HDFS clusters. The frequently used HDFS commands are given below in this section with their usage, description, and example. All Hadoop file system shell commands are invoked by the bin/hdfs script.
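As a quick sketch of the general invocation pattern (assuming a running HDFS cluster; the /user/dataflair path is illustrative):

```shell
# List the contents of an HDFS directory with the hadoop wrapper script
hadoop fs -ls /user/dataflair

# The same file system shell can also be invoked through the hdfs script
hdfs dfs -ls /user/dataflair
```

Both forms run the same file system shell; `hdfs dfs` is the form specific to HDFS, while `hadoop fs` works with any file system Hadoop supports.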
hadoop fs -find <path> ... <expression> ...
hadoop fs -find /user/dataflair/dir1/ -name sample -print
Finds all files that match the specified expression and applies the selected actions to them. If no path is specified, it defaults to the current working directory. If no expression is specified, it defaults to -print.
hadoop fs -help
hadoop fs -help ls
It displays usage information for the command entered by the user. The user should exclude the leading '-' character in the command name (for example, hadoop fs -help ls, not hadoop fs -help -ls).
hadoop fs -setfattr -n name [-v value] | -x name <path>
hdfs dfs -setfattr -n user.myAttr -v myValue /user/dataflair/dir2/purchases.txt
hdfs dfs -setfattr -n user.noValue /user/dataflair/dir2/purchases.txt
hdfs dfs -setfattr -x user.myAttr /user/dataflair/dir2/purchases.txt
Sets an extended attribute name and value for a file or directory.
-n name: The name of the extended attribute to set.
-v value: The value of the extended attribute. There are three encoding methods for the value: if the argument is enclosed in double quotes, the value is the string inside the quotes; if the argument is prefixed with 0x or 0X, it is taken as a hexadecimal number; if the argument is prefixed with 0s or 0S, it is taken as a base64 encoding.
-x name: It removes the extended attribute.
path: The file or directory.
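The three value encodings described above can be sketched as follows (assuming a running cluster; the attribute names and the /user/dataflair path are illustrative):

```shell
# String value: the quoted text is stored as-is
hdfs dfs -setfattr -n user.owner -v "dataflair" /user/dataflair/dir2/purchases.txt

# Hexadecimal value: the 0x prefix marks the bytes as hex
hdfs dfs -setfattr -n user.checksum -v 0xdeadbeef /user/dataflair/dir2/purchases.txt

# Base64 value: the 0s prefix marks the value as base64-encoded
hdfs dfs -setfattr -n user.token -v 0sRGF0YUZsYWly /user/dataflair/dir2/purchases.txt

# Read the attributes back; -d dumps all extended attributes of the path
hdfs dfs -getfattr -d /user/dataflair/dir2/purchases.txt
```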
hadoop fs -truncate [-w] <length> <paths>
hadoop fs -truncate 55 /user/dataflair/dir2/purchases.txt /user/dataflair/dir1/purchases.txt
hadoop fs -truncate -w 127 /user/dataflair/dir2/purchases.txt
It truncates all files that match the specified file pattern to the specified length.
The -w flag requests that the command wait for block recovery to complete, if necessary. Without the -w flag, the file may remain unclosed while the recovery is in progress; during this time the file cannot be reopened for append.
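A minimal sketch of the difference (the paths and lengths are illustrative):

```shell
# Without -w: the command returns immediately, but the file
# may stay unclosed while block recovery runs in the background
hadoop fs -truncate 55 /user/dataflair/dir2/purchases.txt

# With -w: wait until block recovery completes, so the file
# can be safely reopened for append afterwards
hadoop fs -truncate -w 127 /user/dataflair/dir2/purchases.txt

# Verify the new file size (in bytes) with -du
hadoop fs -du /user/dataflair/dir2/purchases.txt
```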
hadoop fs -usage command
hadoop fs -usage mkdir
Returns the help for an individual command.
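The difference between -usage and -help can be sketched as follows (the output shapes are indicative, not verbatim):

```shell
# -usage prints only the one-line syntax summary:
#   Usage: hadoop fs [generic options] -mkdir [-p] <path> ...
hadoop fs -usage mkdir

# -help prints the syntax summary plus a description of each option
hadoop fs -help mkdir
```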
3. What’s Next
- Internals of HDFS Data Write Pipeline and File write execution flow
- Top 10 Frequently used HDFS Commands
- Configure Hive Metastore from derby to MySQL