Troubleshooting and Reference Guide
Troubleshooting and Tuning
Debugging an Application
Debugging Techniques
Debug Logging
Injecting Write Operators
Validating Key Properties
Using a Debugger
Debugging Distributed Applications
Combining Logs
Executing Locally
Analyzing Application Performance
System Performance
Microsoft Windows
UNIX
JVM Performance
Garbage Collection and Memory Usage
Thread and Wait Monitoring
Application Performance
Monitoring API
Tuning Applications with JVM Settings
Command Line Reference
Using clustermgr
Command Line Usage: clustermgr
Options
Commands
Using dr
Command Line Usage: dr
DataFlow-specific Options
JVM-specific Options
Using publishlibs
Command Line Usage: publishlibs
Options
Setting Up Clusters
Before You Begin
Installing in a Hadoop Cluster
Setting Up Kerberos Authentication
Enabling Fair Sharing
Starting and Stopping the Cluster
Starting and Stopping Cluster Manager
Starting and Stopping Node Managers Using the Command Line Interface
Setting Up Automatic Startup
Starting and Stopping Node Managers Using the Admin GUI
Configuring the Cluster
Defining Temporary Storage
Controlling Executor Settings
Scheduling a Job
Logging the Cluster Processes
Daemon Logs
Executor Working Directory or Logs
Executor Log Retention
Bulk Configuration
Headless Configuration
Controlling the Job Master Location
Configuring Kerberos Authentication
Configuring Resource Fairness
Setting Class Cache
Controlling Cache Behavior
Monitoring the Cluster
Cluster Monitoring Using the Administrative GUI
Cluster Administration Examples
Cluster Summary
Nodes
Pending Jobs
Current Jobs
Recent Jobs
Cluster Monitoring Using the clustermgr CLI
Configuration File Reference
dr_env.sh
Environmental Settings
Third-party Modules
Available Modules
Available Functions
Using Aggregation Functions
Using Analytics Functions
Using Core Functions
Arithmetic Functions
Arithmetic.add(augend, addend)
Conditional Functions
Constant Reference Functions
Conversion Functions
Date and Time Functions
Field Reference Functions
Formatting Functions
List Functions
Map Functions
Math Functions
Predicate Functions
Statistics Functions
String Functions
Using Matching Functions
Similarity Functions
String Encoding Functions
Cluster Settings
Daemon Logging
Cluster Manager Settings
Executor Settings for Each Node
Job Settings
Using DRException Class
DataFlow Exceptions
Engine Configuration Settings
Engine Settings
Engine Settings and Types
Port Settings
Port Settings and Types
Sort Settings
Sort Settings
Remote Monitoring Settings
Remote Monitoring Settings
Overriding Job Settings
Setting Engine Configurations Using RushScript
Hadoop Module Configurations
Package Summary
DataFlow Modules
Dependency Hierarchy
Package Hierarchy
Glossary