Introduction
Mongodb (from 'humongous') is a scalable, high performance, document-oriented database.
A MongoDB instance holds a set of collections. A collection holds a set of documents. A document is a set of key-value pairs. Documents have a dynamic schema.
Dynamic schema means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection’s documents may hold different types of data.
Since the schema is dynamic (as opposed to other datasources we can connect to in the Elasticube manager which are static, this presents a unique challenge when bringing in data from MongoDB, which can result in build failures.
This post will explain how to approach such errors.
Symptom 1
“Index is out of range” error while importing data from mongo:
Or
“Memory allocation has failed…” error while importing data from mongo:
Cause
The MongoDB ODBC Driver renormalizes MongoDB data into a relational format, letting data analysts view native MongoDB data using a familiar SQL relational model. With the ODBC Driver, data analysts can see nested MongoDB data in virtual tables, now presented as modeled relational SQL data.
The current version of the driver preserves nested MongoDB document structure by translating it into hierarchical SQL schema of virtual tables. The driver was designed to support virtual tables with 'name' keys for subdocuments and numeric indices for arrays, but not for nested subdocuments for various technical reasons.
This mechanism did not exist in previous versions of the driver, however the “old” table definition, where everything was flattened into one table still remains.
As a result if you do not select a virtual table or if the document has large nested documents the ODBC driver will create a table with a large amount of columns that will cause the error above.
Solution
A general approach to this issue would be to decrease/minimize the amount of columns generated for that table. There are a few ways to achieve this:
1) If you did not select the virtual tables (containing _VT_ in their name), try selecting the virtual table instead:
In the screenshot above select the dashboards_vt_ tables instead of the dashboards table
2) If the error persists even while selecting a virtual table there are 2 options:
a) In the ODBC driver configuration, hide columns that are not necessary for the reporting.
i) Open up the ODBC Data Source Administrator from the windows start menu and configure the MongoDB driver:
ii) Select Schema definition and then Edit Schema file:
iii) Check the columns you want to hide, click save, then close the dialog
iv) Click on “Upload Metadata” to update the schema definition.
v) In the Elasticube manager, delete the previously generated table and bring in the newly generated table, with the less columns.
b) Alternatively, in the Elasticube manager, when you select the tables to bring in, instead of bringing in the table as is, edit the SQL query so that you bring in only those columns needed – for example:
Instructions on how to do so can be found here.
Rebuild Elasticube.
Symptom 2
“ERROR [HY000] [Simba][Support](50090) Conversion from … failed” while importing data from mongoDB.
Or you see a field in the MongoDB but that field does not exist in the table generated in the Elasticube Manager.
Cause
The MongoDB ODBC driver automatically generates the schema definition based on data sampling. Since the MongoDB schema is dynamic it is possible that documents that were not sampled had a different datatype or structure.
Solution
Better sampling will resolve this issue.
- Open up the ODBC Data Source Administrator through the windows start menu
- Select the Mongo driver and click on “Advanced Options”
- In the documents to sample, increase the number of documents to sample. You can also set it to 0 and sample all documents to get the most accurate schema, however this may take some time, especially if there are a lot or documents:
- Click OK
- Click Schema Definition
- Click Generate All (this may take some time if you sample all documents!)
- Click Upload Metadata
- Bring in the new table generated into the Elasticube
- Rebuild the Elasticube