Professional Hadoop Solutions

Fr. 65.90

incl. statutory VAT, free shipping


Product details

Binding: Paperback
Publication date: 23.09.2013
Publisher: John Wiley & Sons
Number of pages: 504
Dimensions (L/W/H): 23.4/18.9/2.8 cm
Weight: 857 g
Edition: 1st edition
Language: English
ISBN: 978-1-118-61193-7

Manufacturer address

Libri GmbH
Europaallee 1
36244 Bad Hersfeld
DE

Email: gpsr@libri.de

    Introduction xvii

    Chapter 1: Big Data and the Hadoop Ecosystem 1

    Big Data Meets Hadoop 2

    Hadoop: Meeting the Big Data Challenge 3

    Data Science in the Business World 5

    The Hadoop Ecosystem 7

    Hadoop Core Components 7

    Hadoop Distributions 10

    Developing Enterprise Applications with Hadoop 12

    Summary 16

    Chapter 2: Storing Data in Hadoop 19

    HDFS 19

    HDFS Architecture 20

    Using HDFS Files 24

    Hadoop-Specific File Types 26

    HDFS Federation and High Availability 32

    HBase 34

    HBase Architecture 34

    HBase Schema Design 40

    Programming for HBase 42

    New HBase Features 50

    Combining HDFS and HBase for Effective Data Storage 53

    Using Apache Avro 53

    Managing Metadata with HCatalog 58

    Choosing an Appropriate Hadoop Data Organization for Your Applications 60

    Summary 62

    Chapter 3: Processing Your Data with MapReduce 63

    Getting to Know MapReduce 63

    MapReduce Execution Pipeline 65

    Runtime Coordination and Task Management in MapReduce 68

    Your First MapReduce Application 70

    Building and Executing MapReduce Programs 74

    Designing MapReduce Implementations 78

    Using MapReduce as a Framework for Parallel Processing 79

    Simple Data Processing with MapReduce 81

    Building Joins with MapReduce 82

    Building Iterative MapReduce Applications 88

    To MapReduce or Not to MapReduce? 94

    Common MapReduce Design Gotchas 95

    Summary 96

    Chapter 4: Customizing MapReduce Execution 97

    Controlling MapReduce Execution with InputFormat 98

    Implementing InputFormat for Compute-Intensive Applications 100

    Implementing InputFormat to Control the Number of Maps 106

    Implementing InputFormat for Multiple HBase Tables 112

    Reading Data Your Way with Custom RecordReaders 116

    Implementing a Queue-Based RecordReader 116

    Implementing RecordReader for XML Data 119

    Organizing Output Data with Custom Output Formats 123

    Implementing OutputFormat for Splitting MapReduce Job's Output into Multiple Directories 124

    Writing Data Your Way with Custom RecordWriters 133

    Implementing a RecordWriter to Produce Output tar Files 133

    Optimizing Your MapReduce Execution with a Combiner 135

    Controlling Reducer Execution with Partitioners 139

    Implementing a Custom Partitioner for One-to-Many Joins 140

    Using Non-Java Code with Hadoop 143

    Pipes 143

    Hadoop Streaming 143

    Using JNI 144

    Summary 146

    Chapter 5: Building Reliable MapReduce Apps 147

    Unit Testing MapReduce Applications 147

    Testing Mappers 150

    Testing Reducers 151

    Integration Testing 152

    Local Application Testing with Eclipse 154

    Using Logging for Hadoop Testing 156

    Processing Applications Logs 160

    Reporting Metrics with Job Counters 162

    Defensive Programming in MapReduce 165

    Summary 166

    Chapter 6: Automating Data Processing with Oozie 167

    Getting to Know Oozie 168

    Oozie Workflow 170

    Executing Asynchronous Activities in Oozie Workflow 173

    Oozie Recovery Capabilities 179

    Oozie Workflow Job Life Cycle 180

    Oozie Coordinator 181

    Oozie Bundle 187

    Oozie Parameterization with Expression Language 191

    Workflow Functions 192

    Coordinator Functions 192

    Bundle Functions 193

    Other EL Functions 193

    Oozie Job Execution Model 193

    Accessing Oozie 197

    Oozie SLA 199

    Summary 203

    Chapter 7: Using Oozie 205

    Validating Information about Places Using Probes 206

    Designing Place Validation Based on Probes 207

    Designing Oozie Workflows 208

    Implementing Oozie Workflow Applications 211

    Implementing the Data Preparation Workflow 212

    Implementing Attendance Index and Cluster Strands Workflows 220

    Implementing Workflow Activities 222

    Populating the Execution Context from a java Action 223

    Using MapReduce Jobs in Oozie Workflows 223

    Implementing Oozie Coordinator Applications 226

    Implementing Oozie Bundle Applications 231

    Deploying, Testing, and Executing Oozie Applications 232

    Deploying Oozie Applications 232

    Using the Oozie CLI for Execution of an Oozie Application 234

    Passing Arguments to Oozie Jobs 237

    Using the Oozie Console to Get Information about Oozie Applications 240

    Getting to Know the Oozie Console Screens 240

    Getting Information about a Coordinator Job 245

    Summary 247

    Chapter 8: Advanced Oozie Features 249

    Building Custom Oozie Workflow Actions 250

    Implementing a Custom Oozie Workflow Action 251

    Deploying Oozie Custom Workflow Actions 255

    Adding Dynamic Execution to Oozie Workflows 257

    Overall Implementation Approach 257

    A Machine Learning Model, Parameters, and Algorithm 261

    Defining a Workflow for an Iterative Process 262

    Dynamic Workflow Generation 265

    Using the Oozie Java API 268

    Using Uber Jars with Oozie Applications 272

    Data Ingestion Conveyer 276

    Summary 283

    Chapter 9: Real-Time Hadoop 285

    Real-Time Applications in the Real World 286

    Using HBase for Implementing Real-Time Applications 287

    Using HBase as a Picture Management System 289

    Using HBase as a Lucene Back End 296

    Using Specialized Real-Time Hadoop Query Systems 317

    Apache Drill 319

    Impala 320

    Comparing Real-Time Queries to MapReduce 323

    Using Hadoop-Based Event-Processing Systems 323

    HFlame 324

    Storm 326

    Comparing Event Processing to MapReduce 329

    Summary 330

    Chapter 10: Hadoop Security 331

    A Brief History: Understanding Hadoop Security Challenges 333

    Authentication 334

    Kerberos Authentication 334

    Delegated Security Credentials 344

    Authorization 350

    HDFS File Permissions 350

    Service-Level Authorization 354

    Job Authorization 356

    Oozie Authentication and Authorization 356

    Network Encryption 358

    Security Enhancements with Project Rhino 360

    HDFS Disk-Level Encryption 361

    Token-Based Authentication and Unified Authorization Framework 361

    HBase Cell-Level Security 362

    Putting it All Together -- Best Practices for Securing Hadoop 362

    Authentication 363

    Authorization 364

    Network Encryption 364

    Stay Tuned for Hadoop Enhancements 365

    Summary 365

    Chapter 11: Running Hadoop Applications on AWS 367

    Getting to Know AWS 368

    Options for Running Hadoop on AWS 369

    Custom Installation using EC2 Instances 369

    Elastic MapReduce 370

    Additional Considerations before Making Your Choice 370

    Understanding the EMR-Hadoop Relationship 370

    EMR Architecture 372

    Using S3 Storage 373

    Maximizing Your Use of EMR 374

    Utilizing CloudWatch and Other AWS Components 376

    Accessing and Using EMR 377

    Using AWS S3 383

    Understanding the Use of Buckets 383

    Content Browsing with the Console 386

    Programmatically Accessing Files in S3 387

    Using MapReduce to Upload Multiple Files to S3 397

    Automating EMR Job Flow Creation and Job Execution 399

    Orchestrating Job Execution in EMR 404

    Using Oozie on an EMR Cluster 404

    AWS Simple Workflow 407

    AWS Data Pipeline 408

    Summary 409

    Chapter 12: Building Enterprise Security Solutions for Hadoop Implementations 411

    Security Concerns for Enterprise Applications 412

    Authentication 414

    Authorization 414

    Confidentiality 415

    Integrity 415

    Auditing 416

    What Hadoop Security Doesn't Natively Provide for Enterprise Applications 416

    Data-Oriented Access Control 416

    Differential Privacy 417

    Encrypted Data at Rest 419

    Enterprise Security Integration 419

    Approaches for Securing Enterprise Applications Using Hadoop 419

    Access Control Protection with Accumulo 420

    Encryption at Rest 430

    Network Isolation and Separation Approaches 430

    Summary 434

    Chapter 13: Hadoop's Future 435

    Simplifying MapReduce Programming with DSLs 436

    What Are DSLs? 436

    DSLs for Hadoop 437

    Faster, More Scalable Processing 449

    Apache YARN 449

    Tez 452

    Security Enhancements 452

    Emerging Trends 453

    Summary 454

    Appendix: Useful Reading 455

    Index 463