VectorH Data Loading Guidelines
Each node loads one or more input files. Each node stores data for one or more partitions. The maximum degree of parallelism for reading and parsing the data is determined by the number of input files; the maximum degree of parallelism for generating compressed data blocks and writing them out is determined by the number of partitions.
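To make the two parallelism limits concrete, the following Python sketch models a load as a number of input files and a number of partitions. `LoadPlan` and `assign_round_robin` are hypothetical names invented for this illustration; they are not part of vwload or any Actian API. The round-robin placement of files on nodes is also an assumption, used only to show files being spread across nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadPlan:
    """Hypothetical model of a VectorH load (not an Actian API)."""
    input_files: int  # number of input files passed to the load
    partitions: int   # number of partitions in the target table

    @property
    def max_read_parallelism(self) -> int:
        # Reading and parsing: at most one parallel stream per input file.
        return self.input_files

    @property
    def max_write_parallelism(self) -> int:
        # Compressing and writing blocks: at most one stream per partition.
        return self.partitions


def assign_round_robin(items: list[str], nodes: list[str]) -> dict[str, list[str]]:
    """Spread input files over nodes round-robin.

    Assumption for illustration only: VectorH is not documented here
    to place files this way.
    """
    placement: dict[str, list[str]] = {node: [] for node in nodes}
    for i, item in enumerate(items):
        placement[nodes[i % len(nodes)]].append(item)
    return placement
```

For example, `LoadPlan(input_files=8, partitions=16)` can read and parse at most 8 files in parallel, but can compress and write up to 16 partitions in parallel. Adding input files raises only the first limit; adding partitions raises only the second.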
Use these guidelines when loading data:
• Input files can be in HDFS, but do not have to be.
• Use the --cluster option on the vwload command if the data is stored in HDFS.
• Large tables should be partitioned; a partitioned table is stored across all nodes.
• The partition key and the number of partitions are specified when the table is created, and should be chosen according to the guidelines listed previously.
• The COPY FROM statement can also be used to load data, but it is slower than vwload.
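To illustrate how a partitioned table ends up spread across all nodes, the Python sketch below assumes hash partitioning on the partition key. Each key value is hashed to one of N partitions, and partitions are then placed on nodes. The CRC32 hash, the `partition % num_nodes` placement, and the function names are assumptions made for this example; the sketch does not reproduce VectorH's internal hashing or partition placement.

```python
import zlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition-key value to a partition number.

    CRC32 is a stand-in chosen for illustration, not VectorH's hash.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def node_for(partition: int, num_nodes: int) -> int:
    # Hypothetical placement: partition p is stored on node p mod num_nodes.
    return partition % num_nodes


def rows_per_node(keys: list[str], num_partitions: int, num_nodes: int) -> Counter:
    """Count how many rows each node would store under this model."""
    counts: Counter = Counter()
    for key in keys:
        partition = partition_for(key, num_partitions)
        counts[node_for(partition, num_nodes)] += 1
    return counts
```

In this model, 8 partitions on a 4-node cluster give every node at least one partition, so every node stores part of the table. With only 3 partitions on the same cluster, one node would hold no data for that table.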