
HDFS Command Examples

This article covers simple and useful HDFS command examples. Apache Hadoop is a data processing tool built for web scale. Hadoop has three main modules:

HDFS        ---   Data storage system

MapReduce   ---   Data processing framework

YARN        ---   Resource management system for Hadoop


We will go through some commands to learn how to interact with the Hadoop Distributed File System (HDFS). All HDFS file system commands start with hdfs dfs. Most Hadoop distributions (CDH, HDP) come with a standard hdfs user.

1).

Switch from the root user to the hdfs user. The hdfs user is usually a passwordless account.
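A minimal sketch of the switch, assuming you are logged in as root and the distribution created a local account named hdfs. Interactively you would simply type `su - hdfs`; the version below runs a single command as that user so it does not open a new shell.

```shell
# Assumption: the local account is named "hdfs" (common on CDH and HDP installs).
HDFS_USER="hdfs"

# Only attempt the switch when running as root and the account exists,
# so that su never stops to ask for a password.
if [ "$(id -u)" -eq 0 ] && id "$HDFS_USER" >/dev/null 2>&1; then
  # Interactive equivalent: su - hdfs
  su - "$HDFS_USER" -c 'whoami'
else
  echo "Not root, or no local user named $HDFS_USER" >&2
fi
```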









2).



Create a file on the local file system using the cat command.
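For example, the following writes one line of text to a local file named helloworld. The file content is only an assumption for illustration. At a terminal you could instead type `cat > helloworld`, enter the text, and finish with Ctrl-D.

```shell
# Create the local file "helloworld" with one line of sample text.
# The here-document replaces typing the text and pressing Ctrl-D.
cat > helloworld <<'EOF'
Hello World
EOF

# Print the file to confirm what was written.
cat helloworld
```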




3).


Create a directory in HDFS using the mkdir command.
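A sketch of this step, assuming the directory is called helloworld as in the steps that follow. Because the path is relative, HDFS resolves it under the current user's home directory (/user/<username>).

```shell
# Assumed directory name, taken from the later steps of this article.
HDFS_DIR="helloworld"

# -p creates missing parent directories and does not fail if the directory exists.
hdfs dfs -mkdir -p "$HDFS_DIR" || echo "mkdir failed; is an HDFS client configured?" >&2
```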





4).


Upload the local file helloworld to the HDFS directory helloworld using the put command.
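The upload could look like the following, assuming the local file sits in the current working directory and the HDFS directory already exists.

```shell
LOCAL_FILE="helloworld"   # file on the local file system
HDFS_DIR="helloworld"     # target directory in HDFS

# The first argument is the local source, the second the HDFS destination.
# put refuses to overwrite an existing file; add -f if you want to replace it.
hdfs dfs -put "$LOCAL_FILE" "$HDFS_DIR/" || echo "put failed; check the file and the HDFS client" >&2
```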



5).


Check that the file was loaded into the HDFS directory helloworld using the ls command.
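One way to run the check, again assuming the relative directory name helloworld:

```shell
HDFS_DIR="helloworld"

# Lists the directory; each entry shows permissions, replication factor,
# owner, group, size, modification date and time, and the path.
hdfs dfs -ls "$HDFS_DIR" || echo "ls failed; does $HDFS_DIR exist in HDFS?" >&2
```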







6).


Read the content of the HDFS file using the cat command.
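As a sketch, assuming the file uploaded earlier is stored at helloworld/helloworld:

```shell
HDFS_FILE="helloworld/helloworld"   # assumed path: directory/file from the upload step

# Streams the file content to standard output, like the local cat command.
hdfs dfs -cat "$HDFS_FILE" || echo "cat failed; does $HDFS_FILE exist in HDFS?" >&2
```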







7).


Rename the HDFS file helloworld to helloworldfile using the mv command.
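The rename might be done as follows. Keeping the file inside the same helloworld directory is an assumption; mv also moves files between HDFS directories.

```shell
SRC="helloworld/helloworld"       # current name (assumed path)
DST="helloworld/helloworldfile"   # new name in the same directory

# Source and destination are both HDFS paths; nothing is copied locally.
hdfs dfs -mv "$SRC" "$DST" || echo "mv failed; check that $SRC exists" >&2
```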









8).

Copy helloworldfile to another HDFS directory using the cp command.
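A possible version of the copy. The target directory name helloworld_copy is hypothetical and is created first so the copy has somewhere to land.

```shell
SRC="helloworld/helloworldfile"
DST_DIR="helloworld_copy"   # hypothetical target directory, not from the article

# Create the target directory, copy the file into it, then list the result.
if hdfs dfs -mkdir -p "$DST_DIR" && hdfs dfs -cp "$SRC" "$DST_DIR/"; then
  hdfs dfs -ls "$DST_DIR"
else
  echo "copy failed; check $SRC and the HDFS client" >&2
fi
```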












9).

Check the size of a file in HDFS using the du command.
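For instance, assuming the renamed file from the earlier steps:

```shell
HDFS_FILE="helloworld/helloworldfile"

# The first column of the output is the file size.
# -h prints it in human-readable units instead of raw bytes.
hdfs dfs -du -h "$HDFS_FILE" || echo "du failed; does $HDFS_FILE exist in HDFS?" >&2
```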





10).


Check the replication factor of a file in HDFS using the ls command. The replication factor is the number in the second column of the listing, right after the permissions.
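A short sketch, assuming the same file path as before. The awk filter is an optional addition that prints only the second column of the listing.

```shell
HDFS_FILE="helloworld/helloworldfile"

# Full listing: permissions, replication factor, owner, group, size, date, time, path.
hdfs dfs -ls "$HDFS_FILE" || echo "ls failed; does $HDFS_FILE exist in HDFS?" >&2

# Optional: keep only the second column, which holds the replication factor.
hdfs dfs -ls "$HDFS_FILE" 2>/dev/null | awk '{print $2}' || true
```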



11).

Change the replication factor of a file in HDFS using the setrep command. The example below changes the replication factor from 3 to 2.
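This could be written as follows, assuming the file currently has a replication factor of 3 and using the path from the earlier steps.

```shell
HDFS_FILE="helloworld/helloworldfile"
NEW_REPLICATION=2

# The first argument is the target replication factor, the second the HDFS path.
# -w waits until the change has completed, which can take a while on large files.
hdfs dfs -setrep -w "$NEW_REPLICATION" "$HDFS_FILE" || echo "setrep failed; check $HDFS_FILE" >&2
```

Running the ls command from the previous step again should then show 2 in the second column.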







1 comment:

  1. Could anyone please let me know what the "hdfs dfs -get" command does?