Catalogue Tag Display
MARC 21
Big data analytics: A Handy reference guide for data analysts and data scientists to help to obtain value from big data analytics using Spark on Hadoop clusters
Tag
Description
020
$a9781785884696
041
$aEng
084
$aHD38.7.A55 2016
100
$aAnkam, Venkat
245
$aBig data analytics$bA Handy reference guide for data analysts and data scientists to help to obtain value from big data analytics using Spark on Hadoop clusters$ht
260
$aBirmingham$bPackt Pub.$c2016
300
$ xv$a300 p. : ill. ; 23 cm.
307
$bBook
505
$aBig Data Analytics at a 10,000-Foot View; Big Data analytics and the role of Hadoop and Spark; A typical Big Data analytics project life cycle; Identifying the problem and outcomes; Identifying the necessary data; Data collection; Preprocessing data and ETL; Performing analytics; Visualizing data; The role of Hadoop and Spark; Big Data science and the role of Hadoop and Spark; A fundamental shift from data analytics to data science; Data scientists versus software engineers; Data scientists versus data analysts; Data scientists versus business analysts; A typical data science project life cycle; Hypothesis and modeling; Measuring the effectiveness; Making improvements; Communicating the results; The role of Hadoop and Spark; Tools and techniques; Real-life use cases; Summary; Chapter 2: Getting Started with Apache Hadoop and Apache Spark; Introducing Apache Hadoop; Hadoop Distributed File System; Features of HDFS; MapReduce; MapReduce features; MapReduce v1 versus MapReduce v2; MapReduce v1 challenges; YARN; Storage options on Hadoop; File formats; Compression formats; Introducing Apache Spark; Spark history; What is Apache Spark?; What Apache Spark is not; MapReduce issues; Spark's stack; Why Hadoop plus Spark?; Hadoop features; Spark features; Frequently asked questions about Spark; Installing Hadoop plus Spark clusters; Summary; Chapter 3: Deep Dive into Apache Spark; Starting Spark daemons; Working with CDH; Working with HDP, MapR, and Spark pre-built packages; Learning Spark core concepts; Ways to work with Spark; Spark Shell; Spark applications; Resilient Distributed Dataset; Method 1 -- parallelizing a collection; Method 2 -- reading from a file; Spark context; Transformations and actions; Parallelism in RDDs; Lazy evaluation; Lineage Graph; Serialization; Leveraging Hadoop file formats in Spark; Data locality; Shared variables; Pair RDDs; Lifecycle of Spark program; Pipelining; Spark execution summary; Spark applications; Spark Shell versus Spark applications; Creating a Spark context; SparkConf; SparkSubmit; Spark Conf precedence order; Important application configurations; Persistence and caching; Storage levels; What level to choose?; Spark resource managers -- Standalone, YARN, and Mesos; Local versus cluster mode; Cluster resource managers; Standalone; YARN; Mesos; Which resource manager to use?; Summary; Chapter 4: Big Data Analytics with Spark SQL, DataFrames, and Datasets; History of Spark SQL; Architecture of Spark SQL; Introducing SQL, Datasources, DataFrame, and Dataset APIs; Evolution of DataFrames and Datasets; What's wrong with RDDs?; RDD Transformations versus Dataset and DataFrames Transformations; Why Datasets and DataFrames?; Optimization; Speed; Automatic Schema Discovery; Multiple sources, multiple languages; Interoperability between RDDs and others.
650
$aBig data -- Security measures