
Datameer-related questions

Q. How can I monitor the performance of Datameer?

A. See: Monitoring Hadoop and Datameer using Nagios

Q. Where are my Amazon EC2 and SCP credentials stored?

A. Security credentials specified through the Datameer UI are stored in encrypted form in the Datameer metadata store (normally a MySQL database), which holds all information related to data stores, workbooks, etc. See: Monitoring Hadoop and Datameer using Nagios

Q. How can I open a workbook that appears to be corrupted?

A. When a formula in a workbook becomes corrupted (e.g. through manual formula editing at the database level or a failed database migration), the workbook becomes inaccessible and you see the message "Oops, an error occurred". In this situation, set strict-parsing to false in the file conf/default.properties, restart the Datameer application, and then try to reopen your workbook.

Setting strict-parsing to false causes the parser to ignore the remainder of a formula once it detects an invalid character or token. This affects the validity of the result! Re-enable strict-parsing as soon as you have fixed the incorrect formula.

Hadoop-related questions

Q. Where can I go to learn more about Hadoop?

A. See: Hadoop Tutorials and Extra Features

Q. How can I optimize my Hadoop installation for use with Datameer?

A. See: Hadoop Cluster Configuration Tips

Q. How can I choose the job queue/pool to which Datameer submits jobs?

A:

  1. First determine which job configuration property selects the queue or pool on your Hadoop cluster. This depends on your Hadoop scheduler. If you are using the Fair Scheduler, it is the property named by the value of mapred.fairscheduler.poolnameproperty, which is configured in conf/mapred-site.xml of your Hadoop installation.
  2. Add this property in the Datameer UI under Administration -> Hadoop Cluster -> Custom Property and set its value to the name of the pool that Datameer should use. For example, if mapred.fairscheduler.poolnameproperty is set to mapred.job.queue.name, add mapred.job.queue.name=<pool name>.

Q. How do I configure Datameer/Hadoop to use native compression?

A: When working with large data volumes, native compression can drastically improve the performance of a Hadoop cluster. Several compression algorithms are available, each with its own benefits: for example, GZIP is better in terms of disk space, LZO in terms of speed.

  1. First determine the best compression algorithm for your environment (see the "compression" topic under Hadoop Cluster Configuration Tips).
  2. Install the native compression libraries (platform-dependent) on BOTH the Hadoop cluster (<HADOOP_HOME>/lib/native/Linux-[i386-32 | amd64-64]) and the Datameer machine (<das install dir>/lib/native/Linux-[i386-32 | amd64-64]).
  3. Configure which codec to use as custom properties in the Hadoop Cluster configuration section in Datameer:

    mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
    mapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
    

Q. How do I configure Datameer/Hadoop to use LZO native compression?

A: LZO offers a good trade-off between CPU cost and compression ratio, and is the algorithm of choice for applications with high data throughput. However, it requires an additional download and extra configuration steps, as described below.

  1. Complete the steps described in "How do I configure Datameer/Hadoop to use native compression?" above.
  2. Copy the LZO Java library (see Using LZO compression) into <DAS_install_folder>/etc/custom-jars. This library provides access to the native libraries, which both Datameer and your Hadoop cluster use at various times. Datameer includes this library in the Hadoop job-jar, but it is the administrator's responsibility to ensure that all native libraries exist on the Hadoop cluster. Otherwise, jobs submitted by Datameer will fail.
  3. Register the available codecs as custom properties in the Hadoop Cluster configuration section in Datameer:

    io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec
    io.compression.codec.lzo.class=com.hadoop.compression.lzo.LzoCodec
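
Because a missing codec class typically shows up only when a job fails, it can help to confirm that the jar copied in step 2 actually contains the classes named in these properties. The following is a minimal Python sketch, not part of Datameer; the function name is made up for illustration, and the jar file name in the usage note is a hypothetical placeholder.

```python
import zipfile

# Codec classes referenced by the custom properties above.
LZO_CODEC_CLASSES = (
    "com.hadoop.compression.lzo.LzoCodec",
    "com.hadoop.compression.lzo.LzopCodec",
)


def missing_classes(jar_path, class_names=LZO_CODEC_CLASSES):
    """Return the class names from class_names that are not in the jar."""
    # A jar is a zip archive; a class a.b.C is stored as a/b/C.class.
    with zipfile.ZipFile(jar_path) as jar:
        entries = set(jar.namelist())
    return [name for name in class_names
            if name.replace(".", "/") + ".class" not in entries]
```

Calling missing_classes("<DAS_install_folder>/etc/custom-jars/hadoop-lzo.jar") (hypothetical file name) returns an empty list when both classes are present. This checks only the Java library; the native libraries on the cluster still have to be verified separately.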
    

Q. How do I configure Datameer/Hadoop to use Snappy native compression?

A: The Snappy compression codec provides high-speed compression with a reasonable compression ratio. See the original documentation at http://code.google.com/p/snappy/ for more details.

  1. Currently, CDH3u1 and newer versions already contain the Snappy compression codec. See https://ccp.cloudera.com/display/CDHDOC/Snappy+Installation for the configuration instructions. In addition, Snappy will be integrated into Apache Hadoop versions 1.0.2 and 0.23 (https://issues.apache.org/jira/browse/HADOOP-7206).
  2. When using Cloudera's distribution of Hadoop, you must enable the codec inside the Datameer application, either in the Hadoop Cluster settings or on a per-job basis. To do so, add the following settings:
     

    io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec
    mapred.output.compress=true
    mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
    mapred.output.compression.type=BLOCK

    Make sure that matching versions of Snappy are installed in Datameer and on your Hadoop cluster (e.g. by comparing checksums of the library files). A version mismatch can cause errors during codec loading or job execution.
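
One way to compare the Snappy libraries on the two machines is to compute a checksum of each library file and compare the digests. The following is a minimal Python sketch, not a Datameer tool; the function names are made up for illustration, and the choice of SHA-256 is an assumption (any checksum works as long as it is the same on both sides).

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at path, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_library(path_a, path_b, algorithm="sha256"):
    """True if both files have identical content according to the checksum."""
    return file_checksum(path_a, algorithm) == file_checksum(path_b, algorithm)
```

In practice you would either run file_checksum on each machine and compare the printed digests, or copy the library from a cluster node to the Datameer machine and call same_library on the two files.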

Q. How can I use a custom Hive SerDe?

A: The classes for your Hive SerDe must be in the classpath of the Hive plugin used to connect Datameer to Hive. Datameer provides Hive plugins for each major version of Hive (e.g. 0.5 and 0.7).

To add your custom SerDe to Datameer:

  1. Determine the version of Hive you're using (e.g. 0.7).
  2. Shut down Datameer.
  3. Unzip <Datameer Install folder>/plugins/plugin-hive-<Hive Version>-<Datameer Version>.zip.
  4. Add the JAR file of your SerDe to /lib/compile inside the unzipped plugin, then rezip it.
  5. Remove the corresponding .md5 file, if it exists (e.g. plugin-hive-0_7-1.3.7.zip.md5).
  6. Restart Datameer.
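
Steps 3 to 5 can also be scripted. The following is a minimal Python sketch under stated assumptions, not a Datameer utility: it appends the JAR to the existing archive instead of unzipping and rezipping, the function name is made up for illustration, and the default target_dir assumes that lib/compile sits at the root of the plugin archive (adjust it if your zip has a top-level folder). Back up the plugin zip before modifying it.

```python
import os
import zipfile


def add_serde_to_plugin(plugin_zip, serde_jar, target_dir="lib/compile"):
    """Append serde_jar to plugin_zip under target_dir and drop the stale .md5.

    Assumption: target_dir is the path of lib/compile inside the archive.
    """
    arcname = target_dir.strip("/") + "/" + os.path.basename(serde_jar)
    with zipfile.ZipFile(plugin_zip, "a") as archive:
        if arcname in archive.namelist():
            raise ValueError(arcname + " already exists in " + plugin_zip)
        archive.write(serde_jar, arcname)
    # Step 5: the old checksum no longer matches the modified zip.
    md5_file = plugin_zip + ".md5"
    if os.path.exists(md5_file):
        os.remove(md5_file)
    return arcname
```

The script does not stop or start Datameer, so steps 2 and 6 still have to be carried out manually around it.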

Q. Why does the plugin registry fail to resolve the dependency plugin-das-extension-points?

A: If you observe a message similar to

WARN [2011-07-13 17:58:11] (PluginRegistryImpl.java:374) - Missing dependency plugin-das-extension-points for plugin <XYZ>

please note that the plugin extension point must be changed in all Datameer plugins when moving to version 1.3 (or higher).

As a temporary workaround, copy the old (1.2) plugin file <Datameer old version>/plugins/plugin-das-extension-points-1.2.x.zip to the new installation folder and restart the Datameer application. This allows the custom extensions to be loaded.

However, you should consider updating your plugin code. For versions 1.3 onwards, change your plugin.xml as follows (this removes the requirement for plugin-das-extension-points):
