MongoDB, the NoSQL software program, has become enormously popular for managing databases. But with its popularity have come problems, especially in the unsupported versions of the software. A new Datavail white paper shares case stories involving some of the largest organizations using MongoDB. This blog post highlights some of the best practices that Datavail’s MongoDB Database Engineers and Administrators think are most important.
Different Flavors of MongoDB
MongoDB is open-source software supported by a community of users. It can be used unsupported, in the community version, or supported with the enterprise edition. It also comes in a hosted version called Atlas, which provides both storage and software in the cloud.
“MongoDB is popular for content management systems (CMS), which frequently contain a great variety of content often assembled on the fly, such as images, text, tables, and advertisements. It is popular for data hubs and other big-data operations. It’s an excellent system for mobile and social data management.” — “Why You Should Upgrade MongoDB Now,” a Datavail white paper
You can use the community version of the software for free, but then you’re responsible for patching and upgrading it yourself. If you use the community version, you should consider purchasing a support contract.
The enterprise version of MongoDB requires a license for each database. With the license comes support.
MongoDB Atlas is a cloud version offering database-as-a-service for those who wish to host their databases. With Atlas, users are dependent upon the software running in the cloud. It’s a lot less flexible than being able to control your own configuration. However, it is appealing for smaller enterprises or for those looking for a backup system.
Best Practices for MongoDB Operations
Let’s take a look at what some of Datavail’s clients and DBAs think are best practices for operating MongoDB.
1. Assigning Roles and Responsibilities
Roles are assigned based on the expectations for the new configuration. End users of reports are often closely involved in configuration discussions, along with system administrators, data architects, and, of course, DBAs.
2. Preparing for a MongoDB Deployment
Storage engine solutions
- MongoDB can be used with a variety of storage solutions — on premises, in the cloud, or hybrid.
- A variety of pluggable storage engines can be configured to work with MongoDB. The default package includes WiredTiger and MMAPv1.
- MongoDB Enterprise Advanced edition includes support for an encrypted storage engine and an in-memory storage engine.
NoSQL databases do not impose a schema, which is one of their great strengths. They can easily merge data in a wide variety of different formats. But the format they get merged into becomes a schema and its design should be carefully considered.
Database architects and schema designers need to consider the kind and number of documents to be stored, how they are grouped into collections, and how they will be indexed and validated. With MongoDB Compass, these rules can be turned into a visual schema that makes it easier and faster to run database queries.
Data lifecycle management
Using MongoDB Zones, DBAs can build tiered storage solutions that support the data lifecycle, with frequently-used data stored in memory, less-used data stored on the server, and archived data taken offline at the proper time.
Indexes are an important part of optimizing a database system. They can also take up a lot of space and need to be removed when no longer needed. Below are some best practices for modeling data as documents. The right approach depends on the objectives of your application.
- Store data as a single document. The entire record can then be retrieved with a single query. This is efficient, but there are size limitations.
- Avoid creating large documents. The maximum document size for MongoDB is 16MB. For files that are larger than 16MB, such as photos or videos, you need tools such as GridFS to break larger files into a series of smaller files. If the space allocated for a document is more than half the space available, it can’t be replicated in memory while an update is made. You need to monitor document growth and relocate documents when they get too large.
- Avoid long field names. Long field names increase the minimum amount of space a database requires. The limit for field names is 125 characters. For a similar reason, avoid queries involving low-cardinality fields, because they can return result sets of enormous size.
- Use MongoDB’s new tools to manage indexes. The WiredTiger storage engine automatically compresses indexes. MongoDB Compass helps visualize a database so that opportunities to eliminate infrequently used indexes are easy to spot.
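The size guidance in the list above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MongoDB's implementation: it approximates document size with JSON encoding (documents are actually stored as BSON, so real sizes differ), and the helper names `approx_size`, `fits_in_document`, and `split_into_chunks` are hypothetical. The 255 KiB chunk size mirrors the GridFS default.

```python
import json

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024   # the 16MB document limit
CHUNK_BYTES = 255 * 1024                # GridFS default chunk size


def approx_size(document: dict) -> int:
    """Approximate a document's size from its UTF-8 JSON encoding (not true BSON size)."""
    return len(json.dumps(document).encode("utf-8"))


def fits_in_document(payload: bytes) -> bool:
    """Return True if the raw payload is under the 16MB document limit."""
    return len(payload) < MAX_DOCUMENT_BYTES


def split_into_chunks(payload: bytes, chunk_bytes: int = CHUNK_BYTES) -> list:
    """Split a large payload into fixed-size pieces, similar to how GridFS stores files."""
    return [payload[i:i + chunk_bytes] for i in range(0, len(payload), chunk_bytes)]


# Field names are stored in every document, so long names add up across a collection.
long_names = {"customer_shipping_address_postal_code": "80021"}
short_names = {"zip": "80021"}
print(approx_size(long_names), approx_size(short_names))

# A 20MB stand-in for a video file: too large for one document, so it is chunked.
video = bytes(20 * 1024 * 1024)
chunks = split_into_chunks(video)
print(fits_in_document(video), len(chunks))
```

In a real deployment the driver's GridFS support would do the chunking; the point here is only to show why large files have to be split and why field-name length matters.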
3. MongoDB Setup and Configuration
- Configuring the database. Use tools such as Chef and Puppet to provision MongoDB instances. The provisioning of replica sets and sharded clusters can be automated using MongoDB’s Ops Manager and Cloud Manager.
- Upgrading the software. With proper provisioning of the storage engine, upgrades to MongoDB can be performed without any downtime.
- Data migration. MongoDB uses mongoimport and mongoexport for moving data into and out of a database. For migration from one MongoDB to another, use the mongodump and mongorestore tools.
- Storage. MongoDB can make use of attached solid-state drives (SSDs) for lightning-fast performance as well as conventional hard disk storage. RAID-5 and RAID-6 are not recommended for a MongoDB installation. RAID-10 is preferred for better performance and fault tolerance.
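As a rough sketch of the data migration step above, the following Python builds the command lines for the four tools without running them. The host names, database name, collection name, and paths are hypothetical placeholders, and the helper function names are illustrative only.

```python
import shlex


def dump_command(source_uri: str, out_dir: str) -> list:
    """Build a mongodump invocation that writes a binary dump to out_dir."""
    return ["mongodump", f"--uri={source_uri}", f"--out={out_dir}"]


def restore_command(target_uri: str, dump_dir: str) -> list:
    """Build a mongorestore invocation that loads a dump directory into the target."""
    return ["mongorestore", f"--uri={target_uri}", dump_dir]


def export_command(uri: str, collection: str, out_file: str) -> list:
    """Build a mongoexport invocation that writes one collection to a file."""
    return ["mongoexport", f"--uri={uri}", f"--collection={collection}", f"--out={out_file}"]


def import_command(uri: str, collection: str, in_file: str) -> list:
    """Build a mongoimport invocation that loads a file into one collection."""
    return ["mongoimport", f"--uri={uri}", f"--collection={collection}", f"--file={in_file}"]


# Hypothetical source and target deployments.
OLD = "mongodb://old-host:27017/inventory"
NEW = "mongodb://new-host:27017/inventory"

for cmd in (
    dump_command(OLD, "/tmp/dump"),
    restore_command(NEW, "/tmp/dump"),
    export_command(OLD, "products", "/tmp/products.json"),
    import_command(NEW, "products", "/tmp/products.json"),
):
    print(shlex.join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` once the placeholder values are replaced with real ones.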
4. Continuous Availability with MongoDB
Just as your power supply has a battery backup or series of backups, so your MongoDB software can be configured with data redundancy and system redundancies that guarantee continuous availability.
- Journaling. Journal entries are recovered when data is recovered, making for a durable database that can be restored quickly. With WiredTiger and MMAPv1, journals are compressed to keep storage space down.
- Data redundancies. MongoDB maintains multiple replicas of the database for instant failover in the event of a crash. While the failed dataset is repaired or replaced, MongoDB keeps operating. Replica data sets make it possible to upgrade without any downtime, and to conduct maintenance and tuning while the database is operating.
- Replication. MongoDB replica data sets can be employed across data centers to protect against human and natural disasters. DBAs can specify the level of persistence they want associated with data writes. The guarantee can be set at written, journaled, replicated, or even custom configurations for replication of specific data to specific replicas.
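The persistence levels described under Replication correspond to write concern settings. The sketch below maps the four levels named above to write concern documents. The function name `write_concern` and the tag name `multiDataCenter` are hypothetical, and treating "replicated" as a majority acknowledgment is an assumption; a specific member count could be used instead.

```python
def write_concern(level: str, custom_tag: str = "") -> dict:
    """Map a persistence level named in the text to a write concern document."""
    if level == "written":
        return {"w": 1}                # acknowledged by the primary
    if level == "journaled":
        return {"w": 1, "j": True}     # also committed to the on-disk journal
    if level == "replicated":
        return {"w": "majority"}       # acknowledged by a majority of replica set members
    if level == "custom":
        if not custom_tag:
            raise ValueError("a custom write concern needs a tag name")
        # The tag must already be defined in the replica set configuration.
        return {"w": custom_tag}
    raise ValueError(f"unknown persistence level: {level}")


for level in ("written", "journaled", "replicated"):
    print(level, write_concern(level))
print("custom", write_concern("custom", "multiDataCenter"))
```

Stronger guarantees cost latency, so a common design choice is to reserve the stricter levels for writes that cannot be lost.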
5. Scaling a MongoDB System
One of the benefits of using MongoDB is the ease of scaling your databases as your needs grow. The technique used is called sharding, by which MongoDB automatically balances data across shards, or replica data sets, so that there is always capacity and rarely a bottleneck.
There are many ways to shard a database that correspond to the way the data is stored and retrieved. Sharding by range organizes the database along a key value that is contained in all the documents in the database. Hash sharding uses an MD5 of the shard key value to determine the shard distribution. MongoDB Zones is a sharding program that makes it easier to see and manipulate sharding policies. MongoDB 3.4 has new helpers that work with Zones for better control over sharding.
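The difference between range and hash sharding can be illustrated with a small simulation. This is a simplified sketch, not MongoDB's actual routing or hashing code: the shard names, split points, and the way the MD5 digest is reduced to a shard index are all assumptions made for illustration.

```python
import bisect
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]   # hypothetical shard names
SPLIT_POINTS = [1000, 2000]               # ranges: below 1000, 1000 to 1999, 2000 and up


def range_shard(key: int) -> str:
    """Pick a shard by finding which contiguous key range contains the key."""
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, key)]


def hash_shard(key: int) -> str:
    """Pick a shard from an MD5 digest of the key, so neighboring keys scatter."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# Monotonically increasing keys (for example, new order numbers) all land on
# the last shard under range sharding, while hashing spreads them out.
recent_keys = range(5000, 5010)
print([range_shard(k) for k in recent_keys])
print([hash_shard(k) for k in recent_keys])
```

Range sharding keeps nearby keys together, which helps range queries, while hashing trades that locality for a more even write distribution.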
Some things to watch for when scaling your system include file sizes running up against the limits mentioned earlier, data exceeding the RAM available on the system, writes to shards hitting disk I/O limits, and the need to be specific about the location of the shards being written to.
6. Managing MongoDB
The easiest way to manage your MongoDB system is through Ops Manager when using the on-premises version, Cloud Manager when using the cloud version, or MongoDB Enterprise Advanced, which uses both Cloud Manager and Ops Manager. You can manually maintain the database or combine manual tasks into automated procedures including:
- Scheduled backups
- Alert monitors
- Index rollouts
- Zone management
Monitoring. It’s important to establish performance baselines so your monitoring can let you know how actual performance is departing from what is expected or required. Ops Manager tracks over 100 metrics in easy-to-customize dashboards that tell you at a glance how your database is performing. Dashboards can be mapped to permissions, restricting visibility into data sets. Custom alerts can be designed to trigger warnings whenever performance is outside of a desired parameter.
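The idea of a performance baseline with custom alerts can be sketched independently of any monitoring product. The example below is a minimal illustration, not how Ops Manager computes alerts: the latency samples are made-up values, and the three-standard-deviation tolerance is an arbitrary assumption.

```python
from statistics import mean, stdev


def build_baseline(samples: list) -> tuple:
    """Summarize historical samples as (average, standard deviation)."""
    return mean(samples), stdev(samples)


def is_outside_baseline(value: float, baseline: tuple, tolerance: float = 3.0) -> bool:
    """Flag a reading that is more than `tolerance` standard deviations from average."""
    center, spread = baseline
    return abs(value - center) > tolerance * spread


# Hypothetical query latencies in milliseconds collected during normal operation.
history = [12.0, 11.5, 12.5, 12.2, 11.8]
baseline = build_baseline(history)

for reading in (12.4, 25.0):
    status = "ALERT" if is_outside_baseline(reading, baseline) else "ok"
    print(reading, status)
```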
Backup and recovery. You need a backup-and-recovery plan for your MongoDB systems to guarantee system uptime, prevent data loss, and satisfy regulatory requirements for data backup and storage. With Ops Manager and Cloud Manager, your data is continuously backed up just seconds behind the master database. If there is a problem, you can restore to an exact point in time. The mongodump tool bundled with MongoDB can perform live backups, scheduled backups, or point-in-time backups.
7. Security for MongoDB
- Authentication. A challenge/response module is included with MongoDB, and the Enterprise Advanced edition is compatible with many popular security packages including LDAP, Kerberos, and Windows Active Directory.
- Authorization. Permissions are assigned to users based on a highly granular list of data that each user may access. It’s possible no two users will see a document the same way, depending on the fields they are allowed to access. The Enterprise Advanced edition of MongoDB 3.4 contains even more features for generating alerts about possible compromises to the system and greater control over user access and permissions.
- Auditing. Audit trails are easy to create and customize. Functionality includes the ability to:
  - Run a report of all the people accessing a specific document.
  - Track changes to the database by user.
  - Customize and store auditing tools so that frequently run reports can be generated with just a few keystrokes.
- Encryption. Encrypt your data on the disk and on the network and use encrypted channels for data transfer. According to “MongoDB Operations Best Practices,” “MongoDB supports a variety of encryption algorithms – the default is AES-256 (256 bit encryption) in CBC mode. AES-256 in GCM mode is also supported. Encryption can be configured to meet FIPS 140-2 requirements.”
There are many good reasons for using MongoDB to manage your database and for upgrading to the latest version to keep your database secure and enable best practices. If you would like to find out how you might benefit from a MongoDB upgrade or managed services contract, we encourage you to contact Datavail today or download the white paper. With more than 800 database administrators worldwide, Datavail is the largest database services provider in North America. As a reliable provider of 24×7 managed services for applications, BI/Analytics, and databases, Datavail can support your organization, regardless of the build you’ve selected.
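To make the granular permissions described under Authorization more concrete, here is a small sketch that builds a `createRole` command document granting read-only access to a single collection. The helper name `read_only_role` and the role, database, and collection names are hypothetical.

```python
def read_only_role(role_name: str, db: str, collection: str) -> dict:
    """Build a createRole command document that allows only find on one collection."""
    return {
        "createRole": role_name,
        "privileges": [
            {
                "resource": {"db": db, "collection": collection},
                "actions": ["find"],
            }
        ],
        "roles": [],  # no inherited roles
    }


# Hypothetical role for users who only read sales reports.
role = read_only_role("salesReportReader", "reports", "sales")
print(role)
```

With a driver, a document like this would typically be sent as a database command by a user who has permission to manage roles.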