Catalogue Display

Big data analytics: A Handy reference guide for data analysts and data scientists to help to obtain value from big data analytics using Spark on Hadoop clusters
Item Information
Barcode | Shelf Location | Collection | Branch | Status
10028581 | HD38.7.A55 2016 | Applied Information Technology | GUtech Library | Available
10028582 | HD38.7.A55 2016 | Applied Information Technology | GUtech Library | Available
Catalogue Information
Field name Details
ISBN 9781785884696
Language Eng
Call Number HD38.7.A55 2016
Author Ankam, Venkat
Title Big data analytics : A Handy reference guide for data analysts and data scientists to help to obtain value from big data analytics using Spark on Hadoop clusters
Publisher Birmingham : Packt Pub. , 2016
Description xv, 300 p. : ill. ; 23 cm.
Specific Type of Material Book
Contents Chapter 1: Big Data Analytics at a 10,000-Foot View; Big Data analytics and the role of Hadoop and Spark; A typical Big Data analytics project life cycle; Identifying the problem and outcomes; Identifying the necessary data; Data collection; Preprocessing data and ETL; Performing analytics; Visualizing data; The role of Hadoop and Spark; Big Data science and the role of Hadoop and Spark; A fundamental shift from data analytics to data science; Data scientists versus software engineers; Data scientists versus data analysts; Data scientists versus business analysts; A typical data science project life cycle; Hypothesis and modeling; Measuring the effectiveness; Making improvements; Communicating the results; The role of Hadoop and Spark; Tools and techniques; Real-life use cases; Summary
Chapter 2: Getting Started with Apache Hadoop and Apache Spark; Introducing Apache Hadoop; Hadoop Distributed File System; Features of HDFS; MapReduce; MapReduce features; MapReduce v1 versus MapReduce v2; MapReduce v1 challenges; YARN; Storage options on Hadoop; File formats; Compression formats; Introducing Apache Spark; Spark history; What is Apache Spark?; What Apache Spark is not; MapReduce issues; Spark's stack; Why Hadoop plus Spark?; Hadoop features; Spark features; Frequently asked questions about Spark; Installing Hadoop plus Spark clusters; Summary
Chapter 3: Deep Dive into Apache Spark; Starting Spark daemons; Working with CDH; Working with HDP, MapR, and Spark pre-built packages; Learning Spark core concepts; Ways to work with Spark; Spark Shell; Spark applications; Resilient Distributed Dataset; Method 1 -- parallelizing a collection; Method 2 -- reading from a file; Spark context; Transformations and actions; Parallelism in RDDs; Lazy evaluation; Lineage Graph; Serialization; Leveraging Hadoop file formats in Spark; Data locality; Shared variables; Pair RDDs; Lifecycle of Spark program; Pipelining; Spark execution summary; Spark applications; Spark Shell versus Spark applications; Creating a Spark context; SparkConf; SparkSubmit; SparkConf precedence order; Important application configurations; Persistence and caching; Storage levels; What level to choose?; Spark resource managers -- Standalone, YARN, and Mesos; Local versus cluster mode; Cluster resource managers; Standalone; YARN; Mesos; Which resource manager to use?; Summary
Chapter 4: Big Data Analytics with Spark SQL, DataFrames, and Datasets; History of Spark SQL; Architecture of Spark SQL; Introducing SQL, Datasources, DataFrame, and Dataset APIs; Evolution of DataFrames and Datasets; What's wrong with RDDs?; RDD Transformations versus Dataset and DataFrames Transformations; Why Datasets and DataFrames?; Optimization; Speed; Automatic Schema Discovery; Multiple sources, multiple languages; Interoperability between RDDs and others.
Subject Big data -- Security measures