New developers in the Hadoop ecosystem often struggle to get involved because they think they need to learn Java. Where do Python and other non-Java developers turn when developing in the Hadoop ecosystem? What are the Python options in Hadoop?
Learn about the Python options in Hadoop and where developers are finding resources to build Hadoop applications. Watch now to find out!
Video – Python Options in Hadoop
Transcript
(Forgive any errors; it was transcribed by a machine.)
Hi, I'm Thomas Henson with thomashenson.com, and today is another episode of Big Data Big Questions. Today's question is about Python options in big data. Find out more right after this.
[Music]
So today's question comes in from YouTube, and if I pronounce your name wrong, I'm sorry, but it comes in from V gene drawn. His question is: "Thanks for the video." And thank you for watching! "For people who are self-learners like me, when we start learning Hadoop or Apache Spark or Kafka, all the examples that are available on the Internet are in Java. Also, there's a lack of material available for these tools using Python."

This question was posed back when we were talking about whether you have to know Java to be a big data developer, and we said that you don't: there are a ton of options out there that allow you to abstract away some of the Java. So I want to break this question down into a couple of different parts.

The way I'm going to look at it first is this: if you're just starting out in Hadoop and the ecosystem, and you're looking at Kafka and Spark and Hadoop, a lot of the examples, as far as the code and the way it's written, are going to be in Java. Now, with Spark there are some different options. Spark lets you write your Spark jobs in Scala or in Python, and they have really good documentation around that. Some of the other tools, not so much. But for a lot of the tools we're talking about, you don't specifically have to know Java unless you want to contribute to those tools or do something outside the box with them.

For example, if you want to use Hadoop out of the box to write your MapReduce jobs, you're probably going to have to do something in Java, unless you're using something like Hive or Pig that's going to abstract that away. When we use something like Hive or Pig, you're able to work in more of a SQL-like syntax, and that helps you abstract away the Java. But if you're going to write, say, some custom functions in Pig and you want to take advantage of that, most of those are going to be done in Java. But there are options out
there for Python as well. There aren't as many examples of those, but there are examples out there that show you how to write those user-defined functions, in Pig for example, in Python. With some of the other tools you do have to dig around, so it is a little bit hard. But for somebody who's just starting out, it's really an awesome opportunity just to be able to jump in and start using Pig or Hive, or even just Hadoop out of the box, and see some of those functions.

Now, as far as Hadoop goes, there are also ways to write MapReduce jobs using Python, and there are a couple of different options out there too. But I will say, and I agree with you, the majority of the examples you're going to find are going to be in Java. You're really going to have to dig around to find the Python ones, but there are options and there are ways to get around it.

Now, I will say that for 90% of what you're doing when you're just starting out, maybe even 100%, you're not really going to need to write those custom functions. You're just trying to learn, to get a feel for how everything's written and how you can start implementing this in your own data center, and just doing your pieces. Once you start getting into it a little bit further, you might need to use some Java, but like I said, there are still some options out there for Python and Scala, especially as we start to look at Spark.

I'm going to come back to the Spark part now and talk about their documentation. All their examples are written in Scala and Java, and the Python side is still being built out: there are a lot of examples in Python, but there are still a couple that need to be worked out. But if you're doing anything with Spark, that's one where, whether you're a beginner with no Java experience or a seasoned Java veteran, you can go in and start using
Spark, look at the examples, look at the documentation, and pretty much never write any Java.

Now, if we're talking about contributing, this is the big caveat. If we're talking about contributing to these projects, that's where you're totally correct: most of them are written in Java or in other JVM languages like Scala, and if you really want to be a part of what Kafka is doing, or work on the source code of Hadoop or Spark, you're going to be behind without Java. That's one area where you're really going to want to know it.

Well, I hope that answered your question. Make sure you subscribe so you never miss an episode. Also, if you have any questions, send them in, and I'll try my best to answer them here on Big Data Big Questions. Thanks again!
[Music]
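The video mentions that you can write MapReduce jobs in Python without touching Java, but it doesn't name a specific tool. One common route (our assumption here, not something stated in the video) is Hadoop Streaming, which runs any executable that reads input lines on stdin and writes tab-separated key/value lines on stdout; Hadoop sorts the mapper output by key before it reaches the reducer. The sketch below is a minimal word count under that assumption. The file name `wordcount.py` and its `map`/`reduce` arguments are hypothetical, and it assumes Python 3 is available on the worker nodes.

```python
#!/usr/bin/env python3
"""Minimal word-count sketch for Hadoop Streaming (hypothetical wordcount.py).

Streaming feeds input lines to the mapper on stdin and expects
tab-separated key/value lines on stdout. The framework sorts mapper
output by key, so the reducer sees all lines for one word together.
"""
import sys
from itertools import groupby


def mapper(lines):
    """Yield (word, 1) for each lowercased, whitespace-separated word."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1


def reducer(pairs):
    """Sum counts per word; assumes pairs arrive sorted by word."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def parse(lines):
    """Turn 'word<TAB>count' lines back into (word, int) tuples."""
    for line in lines:
        word, _, count = line.rstrip("\n").partition("\t")
        yield word, int(count)


def main(role):
    """Run as the mapper or the reducer, depending on the role argument."""
    if role == "map":
        for word, count in mapper(sys.stdin):
            print(f"{word}\t{count}")
    elif role == "reduce":
        for word, total in reducer(parse(sys.stdin)):
            print(f"{word}\t{total}")
    else:
        sys.exit("usage: wordcount.py map|reduce")


# Only act when a role is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

To try it without a cluster, a shell pipeline roughly imitates the map, shuffle, and reduce phases: `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`. On a cluster, the same script would be submitted through the Hadoop Streaming jar, with `"python3 wordcount.py map"` and `"python3 wordcount.py reduce"` as the mapper and reducer commands; where that jar lives depends on your Hadoop distribution.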