What’s a Data Engineer Career Like in 2019?
Times change, and keeping your skills current while managing day-to-day projects can be exhausting.
Should I learn Hive or TensorFlow?
Which is better, Flink or Spark?
As a Data Engineer, how much should I focus on containers?
Questions like these come up all the time when I'm speaking with aspiring and career-focused Data Engineers. Find out my thoughts on skills and the career outlook for Data Engineers in 2019 in this episode of Big Data Big Questions.
Transcript – Data Engineer in 2019
Hi folks, Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today's question is: what does data engineering look like in 2019? What are some of the trends? What are some of the things that are going on? This question came in from the comments section here on YouTube, so if you have a question, make sure you put it in the comments section below or reach out to me on thomashenson.com. I'll answer questions in the order they come in, as time allows. I've been getting a ton of questions, so I really appreciate it. Thank you to this community.
Before we jump into today's question, I want to give you my top three trends to watch for in 2019. First, though, I want to give credit to an article that lists 10 trends in big data. I talked about it on my YouTube live session, so if you're ever around on Saturday mornings, jump on and throw me a question in the chat. Let's get cracking. I try to answer as many questions as I can there every Saturday morning, so jump in.
You can find all 10 in the comments section below, where I have the link to the [Inaudible 00:01:13] trends. I'm going to read through some of them real quick. The first of their top 10 trends for 2019: data management and [Inaudible 00:01:22]. They're talking a little bit about ETL and how ETL's not going away. I've said that for a while, but we did read an article not too long ago saying, “Hey, you know, there are some tools out there that are really going to make ETL a thing of the past.” We'll see. Hopefully, right?
I'm not a fan of ETL. I started out there, and, man, it seemed like I was never going to get out of it. Number two, data silos continue to proliferate. This goes back to what we saw when Hadoop emerged as this one huge data lake where all the data was supposed to live. We've been talking about it, especially on this channel, over the past few years: data has a lot of gravity to it. There's going to be data out on your edge. There's going to be data in the cloud. There's going to be data still in core data centers.
So the idea of a data lake is a little more fluid than consolidated. You still have those main areas, but you have to do analytics in place in each of them. Number three, streaming analytics has a breakout year. I've talked about streaming analytics on this channel for the last couple of years. I actually did a session on future architectures for streaming analytics at the 2017 Hadoop Summit, I believe. They call it DataWorks Summit now.
Number four, data governance builds steam; we've talked about some of that here. Number five, soft skills start to emerge as tech evolves. That's about the soft skill of understanding the business, which we've talked about here with the book The Big Data MBA. Number six, deep learning gets a little bit deeper. Hm. Have we talked about deep learning on this channel? Number seven, Special K expands its footprint: they're talking about Kubernetes and what's going on with Dockerization. Number eight, clouds are hard to ignore. Number nine, new tech will emerge: they're talking about how a lot of open source and closed source tools have emerged from Silicon Valley, and they don't see that stopping anytime soon. Then, number ten, smart things everywhere. I've talked about those a good bit here, too.
Without further ado, let's jump into my three trends to watch for in 2019. The first one: deep learning and Hadoop. How are these ecosystems going to interact with each other? Last year there was a lot of talk about projects like Project Hydrogen, Submarine, and NVIDIA's RAPIDS. It's all about being able to use GPUs and deep learning libraries with data that's in your Hadoop ecosystem, or even just for some ETL. That's one of the things NVIDIA's RAPIDS… Maybe I should do a video specifically on that. Watch that trend. Start watching what's going on with TensorFlow and how it's being integrated with Spark and some of the other, more traditional tools in the Hadoop ecosystem.
That was number one. Number two? Yep. Number two: containerization of the world overtakes data engineering. This is similar to what they were talking about in [Inaudible 00:04:11] with their trends, with Special K being special. We've seen a lot of containerization announcements lately around cloud native applications and cloud native experiences on the Cloudera side, and you even saw in Hadoop 3.0 where they were laying the groundwork to run containerized workloads under YARN, the scheduling engine, along with some of the other components. We're going to continue to see that, and it's a skill you're going to want. If you're in data engineering right now and you want to know what's coming down the pipe for you, I would get more familiar with containerization. That's actually on my own roadmap for the end of the year: understanding a little more about Docker, Kubernetes, and that whole ecosystem. It's a big trend for data engineering, and it's not going to slow down. It's been picking up a lot of steam lately, and it's going to go full force.
My third trend for data engineers in 2019 is streaming analytics. I was doing some research and looking at some IDC numbers on where we're headed from a data perspective. One of the interesting tidbits was that, by 2025, streaming analytics will account for around 30% of everything going on in the global datasphere. Think about all these different devices bringing in data by 2025. 30% of that is going to have to be streaming analytics. That's a huge number. There are a number of tools out there trying to deal with what's going on from a streaming analytics perspective.
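To make "streaming analytics" a bit more concrete: instead of running one job over a complete dataset, a streaming job produces results continuously as events arrive, usually grouped into time windows. Below is a minimal sketch of one common pattern, a tumbling (fixed, non-overlapping) window count per device. The function name, the `(timestamp, device_id)` event shape, and the 10-second window are hypothetical and chosen only for illustration. It also assumes events arrive in timestamp order, which real streams do not guarantee.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape for illustration: (timestamp_in_seconds, device_id)
Event = Tuple[int, str]


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Count events per device in fixed, non-overlapping time windows.

    Assumes events arrive in timestamp order (no late or out-of-order data).
    Yields (window_start, counts) each time a window closes.
    """
    current_start = None
    counts: Dict[str, int] = defaultdict(int)
    for ts, device in events:
        start = ts - (ts % window_seconds)  # align timestamp to its window boundary
        if current_start is None:
            current_start = start
        elif start != current_start:
            # A newer window has begun, so the previous one is complete: emit it.
            yield current_start, dict(counts)
            counts = defaultdict(int)
            current_start = start
        counts[device] += 1
    if current_start is not None:
        # Flush the last open window once the (finite) input ends.
        yield current_start, dict(counts)


if __name__ == "__main__":
    sample = [(0, "sensor-a"), (3, "sensor-b"), (7, "sensor-a"),
              (12, "sensor-a"), (14, "sensor-c"), (21, "sensor-b")]
    for window_start, per_device in tumbling_window_counts(sample, 10):
        print(window_start, per_device)
```

Engines such as Apache Flink, Spark Structured Streaming, and Kafka Streams provide windowing like this as a built-in feature, and they add what the sketch leaves out: watermarks for late or out-of-order events, state that survives failures, and scale-out across machines. That extra machinery is a big part of why streaming is a different kind of problem than batch ETL.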
We've got , we've got Kafka. On the cloud side, we've got Kinesis. A lot of different tools. We've had [Inaudible 00:05:42] on this channel here, but there are a lot of tools in place and a lot of tools being created, because streaming analytics is a huge beast of data to handle. It's a different kind of problem than what we've seen, and it's only going to get harder as we bring in more data and more devices. Really cool opportunities for you as data engineers. Beyond my own goals for 2019, if you're looking for some things to jump into and some educational paths for yourself as a data engineer in 2019, I would look into those three trends: deep learning, containerization, and streaming analytics.
That's all I have for today. Make sure to subscribe and ring that bell so that you never miss an episode of Big Data Big Questions. Throw a comment in the comments section below if you have any questions. If you liked the video, or if you hated it, just let me know how you feel about it, and I will see you next time on Big Data Big Questions.