Data Engineers & Big Data Administrators
In today’s episode of Big Data Big Questions we tackle what skills are needed for Big Data Administrators. Data Engineers wear many hats in data analytics workflows: one part software engineer and one part systems administrator. Big Data Administrators are responsible for keeping Hadoop, Kafka, Ambari, and other frameworks running. Find out what other skills Big Data Administrators need in the video below.
Make sure to subscribe to my YouTube channel to never miss an episode of Big Data Big Questions.
Transcript – Skills Needed for Big Data Administrators
Hi, folks! Thomas Henson here, with thomashenson.com, and today is another episode of Big Data Big Questions. Today, I’m going to answer a user question about data administration, or in big data, what is that big data administrator’s role?
What are some of the tools that they use? How can you get involved? Find out more, right after this.
Welcome back. Today’s question is going to revolve all around the big data administrator, what that role is, what are some of the tools that they use? This question came in from my website. You can do Big Data Big Questions, go to thomashenson.com, click on Big Questions, submit a question there. Put them in the comments section here below, and then always, make sure that you’re subscribing to this YouTube channel, so that you’ll never miss an episode. These are great tips. These are great ways for me to answer any questions that you have. If you have those questions, ask them, but also make sure you’re subscribing to the channel.
Today’s question comes in from Jarvis. He says he has a dilemma on Python for big data. We answered a number of questions around Python and big data, and then do you have to know Java? But, this one is a little bit different. It’s going to cover the data administrator.
Hi Thomas, a big fan of yours.
Thanks for watching. Thanks for sending in the question.
I had a question related to IT careers and skills in big data. I wanted to know if Python is required only by data administrators, or can all things done by Java on big data be implemented using Python as well?
This question is really good. Like I said, we’ve talked a little bit about, do you have to know Java in order to be able to be a big data admin, be involved in big data, be a data engineer?
The answer is no. You can do things in Python, but I want to tackle the question from the perspective of, you’re asking about data administration, and so there are two different roles. We’ve talked about the data engineer versus the data scientist. The data engineer is the one who’s setting up the cluster, maybe doing some of the software development, running your Hive jobs, or maybe is just the software developer, if you’re writing Java jobs or writing your Spark jobs. But your data administrator, that’s a different role inside of that. We have two ends of the spectrum. This side over here is more the software development side, and this side over here, let’s say, is more the administrator, or the systems engineer, the person who’s setting up and running the cluster. Maybe not doing the day-to-day coding, but doing the administering and running of the system. Think of that as your full stack developer.
Think about when you split up your systems admin, who’s setting up the stack, making sure the database is running, doing those tasks versus who’s running the… Whether it be PHP code or .NET code. What skills does a data administrator have to have?
I would say that, if we’re talking about being able to be involved in the community, and be involved in big data, you’re going to be keying on HDFS, Ambari, Hive, Flume, and you’re going to need a lot of Linux skills. If you’re asking me, you want to get into data administration, you want to be an awesome data administrator in the big data ecosystem, do you have to know Java? No. Can everything be implemented in Python? Maybe, but you’re probably going to be doing more administrative tasks as far as setting up the cluster, understanding the operating system that Hadoop’s running on.
You’re maintaining more at the Linux level and the Hadoop ecosystem level, so if you’re using Hortonworks or you’re using Cloudera, how all those tools are integrating and talking to each other. I would focus not so much on the coding part, but on being able to set up that cluster. It’s going to vary, too. It’s going to vary in the role.
Some places, especially when you’re just starting out on big data, and you have a small team in your company, you’re going to be the software engineer and the data administrator, right? You might need to have a little more code.
If you’re going to a more seasoned team or a bigger team, you can actually have that role where you’re running the administration. My answer is, I wouldn’t worry so much about Python and Java, if that’s the role that you’re wanting.
For the data administrator, I would worry about being able to integrate the tools. Be familiar with the tools, be familiar with how to set up, how to add nodes, how to take nodes down. How to set up secondary name nodes, so that when one name node goes down, you can flip over to the second name node. Being able to back up the data. Making sure that we’re taking snapshots. All the kinds of tasks that go into running the system, versus being able to write a MapReduce job. If you’re really keen on being a big data administrator, which, those are great roles, those are a lot of fun, you’re still hands on, but you’re not really having to write the jobs.
You’re checking out new tech, checking out new projects, to see, “Hey, am I going to be able to integrate this into our system,” or, “Man, you know, we’ve got two or three more nodes that are going to come online, so let’s make sure that we get those racked and stacked, and then, let’s make sure that we’re adding those to the cluster, too.”
A lot of cool things that you can do in that role. Most of them aren’t going to involve coding, so you’re not really going to have to worry about Java, you’re not going to have to worry about Python, as much as you would in the traditional data engineer, where you’re looking at being more of a software engineer.
I hope I answered your question. If anybody else has any questions, put them in the comments section here below. Make sure to follow me here, so click subscribe, and then I’ll see you next time.