Post by LEARN BIG DATA ONLINE on May 23, 2014 13:23:51 GMT
WHAT HAS LINUX TO DO WITH BIG DATA?
Different flavors of Linux and Big Data
For those who are new to Linux, there are more than 100 different distributions of it. Some, like RHEL and Ubuntu, are popular, while others fade into obscurity for a variety of reasons. A few are built for a specific purpose; for example, Ubuntu Studio is packaged with applications for multimedia work.
Despite the large number of Linux distributions, almost all of them use the same Linux kernel as their base and package different applications on top of it. Companies and individuals work on the Linux kernel collaboratively in an open source environment and share the benefits. Linux would not have been possible without such collaboration.
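To see the split between the shared kernel and the distribution packaged on top of it, the small Python sketch below reports both on a Linux machine. It assumes the distribution ships an /etc/os-release file (most modern distributions do) and uses a simplified parser that ignores the file's escaping rules; the function names are hypothetical, chosen only for illustration.

```python
import platform
from pathlib import Path


def parse_os_release(text: str) -> dict[str, str]:
    """Parse the KEY=value lines of an os-release file into a dict.

    Simplified: strips surrounding quotes but does not handle escapes.
    """
    info: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in double or single quotes.
        info[key] = value.strip().strip('"').strip("'")
    return info


def describe_system() -> str:
    """Report the distribution (user space) and the kernel underneath it."""
    # The kernel name and release come from the running kernel itself.
    kernel = f"{platform.system()} {platform.release()}"
    distro = "unknown distribution"
    path = Path("/etc/os-release")  # assumption: file present on this system
    if path.exists():
        distro = parse_os_release(path.read_text()).get("PRETTY_NAME", distro)
    return f"Distribution: {distro}; kernel: {kernel}"


if __name__ == "__main__":
    print(describe_system())
```

Running this on, say, an Ubuntu machine and a RHEL machine would show different distribution names but the same kind of Linux kernel underneath. On Python 3.10 and later, the standard library's platform.freedesktop_os_release() can replace the hand-written parser.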
Just as Linux companies like Canonical and Red Hat benefit from collaborative work on the Linux platform, Big Data platform companies like Cloudera, Hortonworks, MapR, Microsoft, DataStax, etc. work in the same fashion. Each of these Big Data companies collaborates through the Apache Software Foundation (ASF). They then take the code from the ASF, customize it with their own improvements, and repackage it.
The code for the projects under the ASF is open source, and anyone with enough curiosity can view it and figure out how it works. The code can also be modified to add new features, fix bugs, etc. The code for Hadoop can be obtained from the Apache Hadoop project.
Companies around Big Data and Linux
As mentioned above, there are more than 100 Linux distributions. The same is true of Hadoop and other Big Data frameworks: many companies offer their own distributions of them. In both cases, this is possible because the underlying code is open source.
So how do Big Data companies make money? Most of them give away a limited-feature version of their platform for free and sell a version with additional features under a commercial license. They also provide different levels of commercial support and training around Big Data. This is no different from how Linux companies have operated for years.
Do I need to learn Linux to get started with Big Data?
A question most Big Data aspirants have is whether knowledge of Linux is required to get started with Big Data. The answer leans more towards yes than no. Let's look at why.
Most of the frameworks are developed for Linux, with Windows support an afterthought. For example, as of this writing, Cloudera Impala runs only on Linux, not on Windows. So, to get started with some of the latest Big Data tools, knowledge of Linux is a must.
A while back, Microsoft abandoned its distributed computing platform Dryad in favor of Hadoop, using it to build HDInsight, which runs on the Windows platform. Microsoft partnered with Hortonworks for this effort. So one can expect more Big Data deployments on Windows, but we will have to wait and see. eWeek has published an article on why Linux is the best OS for Big Data.
One need not be an expert in Linux; basic knowledge of Linux is enough to use the Big Data frameworks in a developer role. However, for those who want to pursue a career as a Big Data administrator, in-depth knowledge of Linux is a must. Check out WizIQ's online course for more.