Efficient MapReduce Framework Using Summation

Data is now a vital part of many activities, and vast amounts of it are generated every second in industry, academia, health care, social networking, and elsewhere. As a result, intelligent data analysis tools are needed. MapReduce (M



S. Suryawanshi (B) · P. Kaushik
Department of Computer Science and Engineering, Maulana Azad National Institute of Technology (MANIT), Bhopal, India
e-mail: [email protected]
P. Kaushik
e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2019
R. K. Shukla et al. (eds.), Data, Engineering and Applications, https://doi.org/10.1007/978-981-13-6351-1_1

1 Introduction

BigData can be defined as a quantity of data so large that it is beyond the ability of conventional database software tools to capture, analyze, and manage. Such data is characterized along three dimensions: data volume, data variety, and data velocity [1]. Primary analysis data includes surveys, observations, and experiences; secondary analysis data includes client information, business information reports, competitive and marketplace information, and location information, including mobile device information. Image and geospatial information includes video and satellite images, and supply chain information includes ratings and vendor catalogs; storing and processing all of this information is the task of BigData technology. To process such a variety of data, velocity is essential. The major challenge is not storing large datasets but retrieving and analyzing them within organizations, especially when the data is stored on different machines at entirely different locations [1].

This is where Hadoop comes into the picture. Hadoop has been adopted by many leading companies, for example, Yahoo!, Google, and Facebook, and is used in various BigData applications, for example, machine learning, bioinformatics, and cybersecurity. Hadoop can analyze data quickly and effectively, and it works best on semistructured and unstructured data. Hadoop consists of MapReduce (MR) and the Hadoop Distributed File System (HDFS) [2].

HDFS provides storage for the cluster: once data is stored in HDFS, it is broken into a number of small pieces that are distributed across the servers in the cluster, where each server stores
these small pieces of the whole data set. A copy of each piece is also stored on more than one server, and this copy is retrieved when one or more Mappers or Reducers fail during MR processing [3].

MR has emerged as the preferred computing framework for large-scale data processing because of its simple programming model and its automatic parallel execution. MR has two computational phases, namely mapping and reducing, which are carried out in turn by many map and reduce tasks, respectively. The map task reads the input data and creates key-value pairs from it. These pairs are intermediate outputs stored on the local machine. In the map phase, the tasks start in parallel and generate intermediate key-value pairs from the input splits. The pairs are kept on the local machine and well ord