Want to know one of the hardest parts of installing Hadoop with Ambari for me?
Setting up passwordless SSH for all nodes so that Ambari Agent could do the install. Looking back, it might be a trivial thing to get right, but at that time my Linux skills were lacking. Plus, I had been cast into the Hadoop Administrator role after only a month as a Pig Developer.
Having a background in Linux is very beneficial to excelling as a Data Engineer or Hadoop Administrator. However, if you have just been thrown into the role or are looking to build your first cluster from scratch, check out the video below on Setting Up Passwordless SSH for Ambari Agent.
Transcript – Setup Passwordless SSH for Ambari Agent
One step we need to do before installing Ambari is to set up passwordless SSH on our Ambari boxes. So, what we’re going to do is we’re actually going to generate a key on our master node and send those out to our data nodes. I wanted to caution you that this sounds very easy, and if you’re familiar with Linux and you’ve done this a couple of times, you understand that it might be trivial. But if it’s something you haven’t done before or if it’s something you haven’t done in a while, you want to make sure that you walk through this step. One of the reasons that you really want to walk through this before we install anything Ambari-related or Hadoop-related is because this is going to help us troubleshoot problems that we might have with permissions. So, if we know that this piece works, we can eliminate all the other problems.
No problem if you haven’t set it up before. We’re actually going to walk through that in the demo here. But first, let’s just look at it from an architectural perspective. So, what we’re going to do is, on our master node, we’re going to generate both a public and a private key. Then we’re going to share out that public key with all the data nodes, and what this is going to do is allow the master node to log in via SSH with no password into data node 1, 2, and 3. So, since the master node can actually log in to these, we only have to install Ambari on the master node and then allow the master node to run the installation on all the other nodes. You’ll see more of that once we get into installing Ambari and Ambari Agent, but just know that we have to have this public key working in order to have passwordless SSH.
The steps to walk through it are pretty easy. So, what we’re going to do is we’re going to log in to our master node and we’re going to create a key. So, we’ll type in ssh-keygen. From there, it’ll generate the public and private key, and then we will copy the public key to data node 1, 2, and 3. Next, we’re going to add that key to the authorized keys list on all the data nodes. Then we’re going to test from our master node into data node 1, 2, and 3 just to make sure that our passwordless SSH works and that we can log in as root. Now let’s step through that in a demo.
Now we’re ready to set up passwordless SSH in our environment. So, in my environment, I have node 1, which will be my master node, and I’m going to set up passwordless SSH on node 2, 3, and 4. But in this demo, we’re just going to walk through doing it on node 1 and node 2, and then we can just replicate the same process on the other nodes.
So, the first thing we need to do is, on our master node or node 1, we’re going to generate our public key and our private key. So, ssh-keygen. We’re going to keep it defaulted to go into the .ssh folder. I’m not going to enter anything for my passphrase.
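The key generation step can also be done without the interactive prompts. This is a minimal sketch, not the exact command from the video: generate_key is a hypothetical helper name, the RSA key type and 4096-bit size are assumptions, and the directory defaults to ~/.ssh as in the demo.

```shell
# Generate an RSA key pair with an empty passphrase, without interactive prompts.
# generate_key is a hypothetical helper; its argument defaults to ~/.ssh as in the demo.
generate_key() {
    key_dir="${1:-$HOME/.ssh}"
    mkdir -p "$key_dir"
    chmod 700 "$key_dir"
    # Skip generation if a key already exists so it is never overwritten.
    if [ ! -f "$key_dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -b 4096 -N "" -f "$key_dir/id_rsa"
    fi
    # List the directory: expect id_rsa (private) and id_rsa.pub (public).
    ls -l "$key_dir"
}

# Example: generate_key
```

The empty string passed to -N is what "not entering anything for the passphrase" looks like on the command line.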
You can see there’s a randomart image, and we can run an ll on our .ssh directory, and we see that we have both our public and our private key. Now what we need to do is move that public key over to our first data node, node 2, and then we’ll be able to log in without using a password. So, I’m going to clear out the screen, and now what we’re going to do is use scp and just move that public key over to node 2.
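The copy step might look something like the sketch below. The helper name copy_pubkey and the hostname node2 are placeholders rather than values taken from the video, and the key path assumes the default RSA file name.

```shell
# Copy the public key (never the private key) to root's home directory on a data node.
# copy_pubkey and the hostname "node2" are illustrative names, not from the video.
copy_pubkey() {
    host="$1"
    pubkey="${2:-$HOME/.ssh/id_rsa.pub}"
    # scp prompts for root's password here because passwordless SSH is not set up yet.
    # A bare "host:" destination means the remote user's home directory.
    scp "$pubkey" "root@${host}:"
}

# Example: copy_pubkey node2
```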
So, since we haven’t set up our passwordless SSH yet, it will prompt us for a password here. So, we got the transfer complete and now I’m going to log in to node 2. We still have to use that password. If we run a quick ll, we can see we have our public key here, and now all we need to do is set up our .ssh directory and add this public key to the authorized keys. So, we’re going to make that directory, move inside that directory, and we can see nothing is in it. Now it’s time to move that public key into this .ssh directory.
Then we have our public key, and now let’s just cat that file into a new authorized_keys file. And we have two files here, so we have our public key, and then we’ve also written an authorized_keys file, which contains that public key. So, we’re going to exit out. As you can see, I’m back in node 1. So, now we should be able to just SSH in and not be prompted for a password. And you can see now we’re here in node 2.
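The data node side of the demo can be sketched roughly as follows. Two things here are my assumptions and not shown in the video: the key is appended directly to authorized_keys instead of being moved into .ssh first, and the chmod lines are added because sshd commonly refuses keys when the directory or file permissions are too open. authorize_key is a hypothetical helper name.

```shell
# Run on the data node: append a public key to root's authorized_keys.
# authorize_key is a hypothetical helper; the paths default to those used in the demo.
authorize_key() {
    pubkey="${1:-$HOME/id_rsa.pub}"
    ssh_dir="${2:-$HOME/.ssh}"
    mkdir -p "$ssh_dir"
    # Append rather than overwrite so any existing keys are preserved.
    cat "$pubkey" >> "$ssh_dir/authorized_keys"
    # Assumption: restrict permissions, since sshd may reject keys otherwise.
    chmod 700 "$ssh_dir"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Example (on node 2):   authorize_key
# Then, back on node 1:  ssh root@node2 hostname
```

If the final ssh command prints the data node's hostname without asking for a password, the setup worked.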
So that’s how you set up your passwordless SSH. We’ll need to do this for all data nodes that we’re going to add to the cluster. Once we have Ambari installed on our master node, this will allow Ambari to go and make changes to all the data nodes and do all the updates and upgrades at one time, so that you’re not having to manage each individual upgrade and update yourself.
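Repeating the process for every data node can be scripted. The sketch below swaps in ssh-copy-id, a helper that ships with OpenSSH, in place of the manual scp and cat steps from the video; setup_nodes and the hostnames are placeholders, and the key path assumes the default RSA file name.

```shell
# Distribute the key to every data node, then verify passwordless login.
# The hostnames are placeholders; ssh-copy-id is a swapped-in shortcut, not the video's method.
setup_nodes() {
    for host in "$@"; do
        # Prompts once for root's password on each node, then appends the key remotely.
        ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "root@${host}" || return 1
        # BatchMode=yes makes ssh fail instead of falling back to a password prompt.
        ssh -o BatchMode=yes "root@${host}" hostname || return 1
    done
}

# Example: setup_nodes node2 node3 node4
```

Stopping at the first failure is a deliberate choice here: it is easier to fix one node's permissions than to sort through errors from the whole cluster.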