Stuck trying to manipulate a string in Hadoop and don’t want to use Java?
No problem: use Pig’s built-in String Functions.
Why Pig for ETL?
Using Apache Pig in Hadoop is a must for ETL work. Pig allows developers to quickly write a Pig script to transform data in Hadoop. The String Functions ship with Pig, and learning them is a time saver for ETL. So whether you are trying to convert the case of a string or use a regular expression to extract data, the Pig String Functions have you covered.
What’s Covered?
In this series I will walk through the String Functions in quick 5-minute tutorials, one per function. Each video builds off the previous function, but it’s not essential to watch them in order. I wanted each video to be able to stand alone as a quick reference for its String Function.
All the source code and files can be found on my Pig Example Github page, so you can follow along through the tutorial or grab the code after watching. Feel free to use and abuse the code. As a developer, sometimes it’s easier to have something to start with rather than a blank screen.
If you already have your Hadoop development environment set up, then you are ready to start.
If you are just starting out with Hadoop and Pig, you might want to start here to learn about Pig. I’ve written a lot of posts and published a couple of videos on getting started with Pig Latin, so you’ll want to be familiar with those as you step through this series.
Pig String Functions
- Pig String Functions #1 – The LOWER function in Pig converts a string or strings to lowercase.
- Pig String Functions #2 – The UPPER function in Pig converts a string or strings to uppercase.
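For a quick feel of the two functions listed above, here is a minimal Pig Latin sketch. It is not the code from the videos. The input file names.txt, the relation names and the field name are all made up for illustration, and the file is assumed to hold one name per line.

```pig
-- Hypothetical input file: names.txt, one name per line (assumption, not from the series)
names = LOAD 'names.txt' AS (name:chararray);

-- LOWER and UPPER are built-in Pig string functions that take a chararray
converted = FOREACH names GENERATE
    name,
    LOWER(name) AS lower_name,
    UPPER(name) AS upper_name;

-- Print the result to the console
DUMP converted;
```

Under those assumptions, a line containing Hadoop should come back as (Hadoop,hadoop,HADOOP). Keeping the original field next to the converted ones makes it easy to check the output by eye while you experiment.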
Hope you enjoyed this series. Let me know what you liked and anything you would like to see in the future. As always, if you need help, just ask.
Bonus Content: For more Pig Functions check out the Pig Eval Function Series.