Research # 2

Name: Hammad Qureshi

Course: Big Data Analytics

Teacher: Agha Saadat

College: Governors State University

Business Problem:
Data has been growing steadily ever since computer science revolutionized the information
technology industry. The abundance of data that corporate companies use in a single business
day creates many impediments. The first problem is where to store the data, and the second is
how to process it efficiently. To address both, Hadoop was introduced to the market as a big
data platform. It has two core components, the Hadoop Distributed File System (HDFS) and the
MapReduce framework. HDFS addresses the storage problem by distributing data across a cluster,
and MapReduce helps to process and retrieve that data rapidly. MapReduce is a major component
of Hadoop and consists of two main functions, Map and Reduce, which operate on key and value
pairs; this helps to optimize the performance of retrieving data from the system.
The volume of data stored by information technology companies is increasing rapidly, which
creates serious storage problems. At the same time, it is very expensive for a business to keep
buying additional storage devices and servers. Companies therefore turned to Hadoop and its
MapReduce framework, which run on clusters of commodity hardware, in order to store and process
data in a cost-efficient way. This saves a considerable amount of the money that companies
would otherwise spend each year.
Google ran into this data storage problem in the early 2000s, and Facebook faced it later in the
decade, when their data consumption reportedly reached 21 to 30 petabytes per year; it was the
biggest business problem for them. They therefore decided to use a MapReduce-style framework to
resolve it. What they did was develop an abstraction layer that splits the data flow into two
main phases, the Map phase and the Reduce phase, and the same technique applies in the MapReduce
framework.

Technical Solution:
Google introduced the MapReduce framework in 2004, and Yahoo started a project that developed
the open-source Hadoop implementation in 2007. Since then, many information technology
companies have benefited from the invention of the MapReduce framework and are saving millions
of dollars by using it. In addition, it makes it faster and more efficient for a web interface
to send requests to the server and receive responses. The MapReduce framework can resolve
business problems as well as technical ones. The Map function performs grouping, sorting, and
filtering operations, while the Reduce function summarizes and aggregates the results produced
by the Map function. The result of these two functions is a set of key and value pairs, where
the keys are mapped to the values to reduce the processing. In Hadoop, the MapReduce framework
runs on the YARN architecture, which supports parallel processing of large data sets. The basic
concept behind MapReduce is that Map sends a query to various data nodes for processing, and
Reduce collects the results of these queries and outputs a single value.
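The Map and Reduce functions described above can be illustrated with a small word count. This is only a minimal single-machine sketch in plain Python, not Hadoop code; the function names map_phase, shuffle_phase, reduce_phase, and word_count are hypothetical and chosen for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) key/value pair for every word in one record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: aggregate the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over all documents."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data is stored in HDFS"]
    print(word_count(docs))
```

In a real cluster the map calls would run on different data nodes and the shuffle would move data over the network; here everything runs in one process to keep the idea visible.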
The architectural diagram is below:

MapReduce is a major part of Hadoop and consists of two main functions, which are responsible
for delegating work to the different nodes in the cluster and for collecting all the results
of the query into one cohesive answer. Large files are split into blocks of equal size, which
are distributed across the cluster for storage. Because the failure of a computer must always
be considered in a larger cluster, each block is stored multiple times, usually three times, on
different computers. In the implementation of MapReduce, the user applies an alternating
succession of map and reduce functions to the data. Parallel execution of these functions, and
the difficulties that occur in the process, are handled automatically by the framework. Each
iteration comprises three phases: map, shuffle, and reduce. Furthermore, the main components of
MapReduce are the JobTracker, known as the master node; the TaskTrackers, the agents that run
on each node in the cluster with functions of their own; and, last but not least, the
JobHistoryServer, a component that tracks jobs and is deployed as a separate function.
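The block splitting and threefold storage described above can be sketched with a little arithmetic. The example below is an illustration only: the 128 MB block size is an assumed default, the round-robin placement is a simplification and not the placement policy HDFS actually uses, and the names split_into_blocks and place_replicas are hypothetical.

```python
import math


def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return how many blocks a file needs (the last block may be partly filled)."""
    return math.ceil(file_size_mb / block_size_mb)


def place_replicas(num_blocks, nodes, replication=3):
    """Assign every block to `replication` distinct nodes, round-robin.

    This is a simplified stand-in for a real placement policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(1000)  # a 1000 MB file
    for block, holders in place_replicas(blocks, ["n1", "n2", "n3", "n4"]).items():
        print(block, holders)
```

With three copies of each block on different computers, the loss of any single computer still leaves two copies of every block it held.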
Technical diagram of the MapReduce framework:

The diagram below shows how MapReduce splits a task:

Similarly, many corporate companies are saving millions in hardware costs, including huge
multinational information technology companies such as Yahoo, Amazon, and Google. Saving
millions of dollars in hardware costs is both a necessity and a challenge for their upcoming
targets. It is said that more than 150 terabytes of machine data pass through their data
warehouses every day, and using the MapReduce framework has saved these companies millions of
dollars. In addition, it provides a technical solution for the large volume of data, which is
increasing rapidly due to market demand, through its key and value pairs. Key and value pairs
work very effectively in a system: a task is split into small pieces that are spread across
nodes, and each value can be found by looking up its key. This approach makes an application
faster and better when it has to handle a large volume of data.
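One common way to spread key and value pairs across nodes, and to find a value again from its key, is hash partitioning. The sketch below assumes a CRC32 hash and in-memory dictionaries standing in for nodes; the names partition, distribute, and lookup are hypothetical and are not part of any Hadoop API.

```python
import zlib


def partition(key, num_nodes):
    """Pick a node index for a key with a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(pairs, num_nodes):
    """Store each (key, value) pair on the node chosen by its key."""
    nodes = [dict() for _ in range(num_nodes)]
    for key, value in pairs:
        nodes[partition(key, num_nodes)][key] = value
    return nodes


def lookup(nodes, key):
    """Find a value by asking only the node responsible for its key."""
    return nodes[partition(key, len(nodes))].get(key)


if __name__ == "__main__":
    cluster = distribute([("alice", 3), ("bob", 5), ("carol", 8)], num_nodes=4)
    print(lookup(cluster, "bob"))
```

Because the same hash decides both where a pair is stored and where it is looked up, a query touches one node instead of scanning the whole data set.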
Statistics for the last 16 years of Google revenue are below:

Furthermore, after Google started using the MapReduce framework with its key and value pairs in
its applications, the load on the system when retrieving data from the servers was reduced,
because each task is separated across nodes. As a result, a process does not take much memory,
and it works faster and more efficiently compared to manual approaches that do not use the
MapReduce framework.
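The idea of separating a task so that no single process holds all the data can be sketched as follows. This is an assumption-laden illustration: worker threads stand in for cluster nodes (in CPython they do not give real CPU parallelism), and the names count_chunk, chunked, and parallel_word_count are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_chunk(lines):
    """Work done by one 'node': count the words in its own chunk only."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def chunked(lines, chunk_size):
    """Split the input into chunks of at most chunk_size lines."""
    for start in range(0, len(lines), chunk_size):
        yield lines[start:start + chunk_size]


def parallel_word_count(lines, chunk_size=2, workers=4):
    """Count each chunk separately, then merge the partial results."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_chunk, chunked(lines, chunk_size)):
            total.update(partial)  # adds the partial counts to the total
    return total


if __name__ == "__main__":
    print(parallel_word_count(["a b", "b c", "c c", "a"]))
```

Each worker only ever sees its own chunk, so the memory needed per worker depends on the chunk size rather than on the size of the whole input.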
Below is the Google MapReduce diagram:

However, Google switched to cloud computing and moved the MapReduce framework into a
cloud-based cluster environment in order to get rid of physical storage systems. It still keeps
backups on physical storage in case of an emergency, so it continues to rely on the MapReduce
framework on the side for such situations. In addition, Amazon Elastic MapReduce provides
clusters that adapt dynamically to customer requirements. Companies like Google and Yahoo are
therefore reportedly using the cloud-based Amazon Elastic MapReduce approach rather than the
Hadoop MapReduce framework, because the cloud-based approach is the one to follow in the future
and removes the need to maintain physical storage systems all the time.
Conclusion:
Google resolved its business problem by inventing the MapReduce framework as data storage needs
kept growing day by day; moreover, it is applying the most advanced approach, the cloud-based
Amazon Elastic MapReduce framework, in response to market demand. Similarly, other companies
are adopting the same techniques and using the MapReduce framework in order to be cost-efficient
and faster in their application services. To remain in the information technology business, it
is very important to use a smart approach and not always follow outdated approaches or
technology; otherwise, competitors in the industry could take over. Google is a leader in the
information technology industry because of its smart decision to keep its systems updated
according to market demand, and the MapReduce framework is the effective technical solution it
came up with.

Resource Links Below:

http://www.admin-magazine.com/HPC/Articles/MapReduce-and-Hadoop
http://map-reduce.wikispaces.asu.edu/
https://hackernoon.com