HDFS Skills Without Java
In the world of Hadoop and Big Data, HDFS is king. Data Engineers looking to boost their administrative skills first learn to navigate the Hadoop Distributed File System (HDFS) before jumping to more complex tasks. If Hadoop is written in Java, do you need to know Java programming to work with HDFS? In this video I break down what HDFS is and how to learn it without needing to know Java. Find out more by watching this episode of Big Data Big Questions.
Transcript – Learn HDFS Without Java?
Hi folks, Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today’s question came in from a live session. If you’re not familiar, I do a live session sometimes on the weekends, and I’m thinking about adding another one. If you’d like to be a part of those, make sure you check them out. I’ll post those. Also, let me know if there’s a better time for me to do these. If you’d like to see maybe a Wednesday night or a Tuesday night episode, let me know in the comments section below. And if you have a question, go ahead and drop it there too, just like this one.
This one came out of my live session. It was one of the last questions, and it came in just as I dropped off, so I wanted to make sure I got it answered and out there. The question is: can you learn HDFS without Java? It’s similar to some of the other ones I’ve answered around Hadoop and MapReduce: can you do Hadoop, or MapReduce, or Spark without Java? This one puts a twist on it, because it’s more on the administrator side. We’ve discussed the difference between the big data developer and the big data administrator, and this one is more about administration. On the other questions, can you do Spark, can you do MapReduce without Java, my answer was always, hey, it depends. You absolutely can, but there might be an instance where you need to import something, or you have somebody that’s already using it. For this one, no Java.
You’re cleared. You don’t have to worry about that. One of the reasons is that, from an administrator perspective, what we’re really trying to do is move data around and understand some of the other tasks, like updates. I did a whole course on HDFS from the command line, and you can go through that course and never do anything Java-related. It’s pretty cool to be able to go in and do that. It’s more about configuration files and what we’re trying to do from that perspective. No worries. No need for Java to be able to do HDFS.
From a high level, let’s look at some HDFS commands to understand what we’re talking about when we say, “Hey, no need for Java,” and why there’s more of a need for Linux. From the command line, all the commands we’re going to run are hdfs dfs commands. To list out the files you have here, you’ll use hdfs dfs -ls. This command shows you everything that’s in a directory, right? We’re looking at the files that we have in this directory here, and it’s really similar to what you would do if you just logged in to your favorite version of Linux and ran ls from the command line.
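As a rough sketch of the listing command described here, assuming the hdfs client is on your PATH and pointed at a sandbox or cluster. The target path is only an illustration, and the guard around the HDFS call is an addition so the snippet still runs on a machine without Hadoop.

```shell
# Directory to inspect. "/" is the HDFS root; any other path here is an assumption.
TARGET_DIR="/"

# Plain Linux listing of the local file system, for comparison.
ls -l "$TARGET_DIR"

# The HDFS equivalent: same idea, but it lists the distributed file system.
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -ls "$TARGET_DIR"
else
    echo "hdfs client not found; run this inside a Hadoop sandbox or cluster" >&2
fi
```

The two listings look alike on purpose: each HDFS entry shows permissions, owner, group, size, and modification time, much like ls -l does.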
A lot of these commands are going to be the same as their Linux counterparts. I actually have a course, like I said, that digs through all these different commands, but look at this command here, too: hdfs dfs -mkdir. What do you think we’re doing here? If you have a background in Linux, you understand that we’re just making directories.
Lastly, something else you’ll want from a Linux perspective that will help you in HDFS is permissions. How can you ensure that Bob doesn’t have access to a file that he doesn’t need access to, or that the HDFS user is allowing other users to create files? That’s where we talk about permissions. Like I said, this is similar to what we do from a Linux perspective. I have a course that’s all around this, if you’re interested in checking it out, but these are some of the commands and some of the skills that you’ll need to be an HDFS administrator.
I’ve also got some other resources that I’ll put in the description here, with some quick tutorials that you can walk through and start using. All the commands that you need to know, like I said, are nothing that you need to recite. I actually created some of these blog posts just because I couldn’t remember some of the commands. Like I was saying, it’s mostly from a Linux perspective, so no need to worry. No need to think, “Man, how am I going to learn Java if I want to be an HDFS administrator,” or start working in HDFS? You’re totally able to do that, and you can see here how simply we were able to jump in and do it.
If you’re looking to jump in and run some of the commands like we just showed, just go out and download one of the sandboxes or set up a Hadoop environment on your own. This gives you the ability to play with it in your own lab and start building out some of those other requirements. Now, thanks for tuning in. Thanks for the question.
If anybody has a question, make sure you put it in the comments section here below. I’ll try my best to answer these, and I’ll see you on another episode of Big Data Big Questions.
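For anyone following along in a sandbox, the directory and permission commands mentioned in this episode can be sketched roughly like this. The path, user, and group names are hypothetical, the chmod mode is just one reasonable choice, and changing ownership typically requires HDFS superuser rights. The guard around the HDFS calls is an addition so the snippet is harmless on a machine without Hadoop.

```shell
# Hypothetical path, user, and group, used only for illustration.
DEMO_DIR="/user/demo/reports"
OWNER="bob"
GROUP="analysts"

if command -v hdfs >/dev/null 2>&1; then
    # Create the directory and any missing parents, like mkdir -p in Linux.
    hdfs dfs -mkdir -p "$DEMO_DIR"

    # Owner gets full access, the group can read and list, everyone else gets nothing.
    hdfs dfs -chmod 750 "$DEMO_DIR"

    # Hand the directory to a specific user and group (usually needs superuser rights).
    hdfs dfs -chown "$OWNER:$GROUP" "$DEMO_DIR"

    # Check the result; -d shows the directory entry itself instead of its contents.
    hdfs dfs -ls -d "$DEMO_DIR"
else
    echo "hdfs client not found; run this inside a Hadoop sandbox or cluster" >&2
fi
```

Nothing here involves Java: the flags mirror mkdir, chmod, and chown from a regular Linux shell, which is why a Linux background carries over so directly.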