
Maintaining the Footprint

Mongo databases can grow significantly in size. As with any storage system, you need a plan for discarding data once it's no longer necessary, and the right plan depends on what you're keeping in storage. Because Mongo preallocates its data files, simply deleting documents from a collection will not free up disk space: the deleted documents leave free space inside the collection that new documents can reuse, but the database files on disk stay the same size.
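
To get a rough sense of how much space is tied up this way, the sizes reported by db.stats() can be compared with one another. The sketch below is a hypothetical helper, not something Mongo provides; it assumes a stats object that carries the fileSize, storageSize, dataSize, and indexSize fields in bytes, and the differences it computes should be read as estimates only.

```javascript
// Hypothetical helper: estimates where a database's disk space is going,
// given an object shaped like the result of db.stats().
// Assumption: fileSize, storageSize, dataSize and indexSize are all present
// and expressed in bytes.
function estimateFootprint(stats) {
    return {
        // Space allocated to collections but not occupied by documents.
        // Space left behind by deleted documents is counted here.
        freeInsideCollections: Math.max(0, stats.storageSize - stats.dataSize),
        // Space in the data files not assigned to any collection or index.
        // Preallocated space is counted here.
        unassignedInFiles: Math.max(0, stats.fileSize - stats.storageSize - stats.indexSize)
    }
}

// Example with made-up numbers (bytes).
var sample = { fileSize: 2000, storageSize: 1200, dataSize: 500, indexSize: 300 }
var footprint = estimateFootprint(sample)
```

In the mongo shell, the same helper could be called as estimateFootprint(db.stats()). A large freeInsideCollections value after a round of deletes is the situation described above: the space is reusable by the collection, but it has not been returned to the disk.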

Perhaps an example will help illustrate. Let's say I'm storing documents in a Mongo collection. These documents are temporary in nature and can be discarded at any time after they've been processed, but I'd like to keep them for a little while to generate statistics against them and potentially use them in reports. So what I've got is a big collection of temporary documents, and what I need is a way to flush old documents after a period of time.

Suppose my documents all have an attribute "timeWhenStored" whose date value records when the document was added to the collection. I can use this attribute to find and remove outdated documents. Now review the following script; it's a mongo shell script that performs the maintenance, and it includes comments to explain what's happening.


// Set makeChanges to false to run the script without executing changes to the database.
var makeChanges = true
// Delete documents more than 90 days old.
var days = 90
// Set either one (or both) to false to skip authentication.
var username = "someUser"
var password = "aSecret"

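var scale = function(bytes){
    // Converts a byte count into a human-readable value with a unit suffix.
    var units = ["B", "KB", "MB", "GB", "TB"]
    // Declared with var so it does not leak into the global scope.
    var unit = 0
    // Stop at the largest unit so the index never runs past the array.
    while(bytes >= 1024 && unit < units.length - 1){
        bytes /= 1024
        unit++
    }
    bytes = Math.round(bytes*100) / 100
    return spaceSuffix(""+bytes, 7) + " " + units[unit]
}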
var spaceSuffix = function(str, len){
    // Pads str with trailing spaces until it is at least len characters long.
    while(str.length < len){
        str = str + " "
    }
    return str
}
var printStats = function(){
    // Prints the database-wide size statistics reported by db.stats().
    // fileSize is reported by the MMAPv1 storage engine; other engines may omit it.
    var stats = db.stats()
    print(" - File Size : " + scale(stats.fileSize))
    print(" - Data Size : " + scale(stats.dataSize))
    print(" - Storage Size : " + scale(stats.storageSize))
    print(" - Index Size : " + scale(stats.indexSize))
    print(" ")
    print(" - Average Object Size: " + scale(stats.avgObjSize))
    print(" - Objects : " + stats.objects)
}

if(username && password) db.auth(username, password)

print("Before maintenance, the stats are:")
printStats()

var now = new Date()
var then = new Date(now - (1000*60*60*24*days))
print("")
print("Performing maintenance.....")
print(" - It is now : " + now)
print(" - Removing objects older than: " + then)

//The following line deletes documents from our collection.
if(makeChanges) db.someCollection.remove({timeWhenStored: {'$lt' : then }})
now = new Date()
print(" ")
print(" - It is now : " + now)
print(" - old objects have been removed.")

if(makeChanges) {
    // Compacting defragments the collection's storage.
    db.runCommand({compact: 'someCollection'})
    // repairDatabase rewrites the data files, which is what can actually free up
    // disk space. It needs enough free disk space to hold a copy of the data while it runs.
    db.repairDatabase()
}
now = new Date()
print(" ")
print(" - It is now : " + now)
print(" - The collection is compacted, and maintenance is complete.")

print("")
print("After maintenance, the stats are:")
printStats()

You can save the code above into a file, say maintenance.js, and then run it periodically by calling: mongo dbName maintenance.js
Schedule it as a cron job, and you should have much less to worry about when it comes to your storage consuming too much space.
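
Before scheduling the script, it may help to check how many documents a run would remove. The sketch below is a hypothetical dry-run helper, not part of the script above. It assumes the collection object exposes a count(query) method, as collections in the mongo shell do, and it reuses the timeWhenStored attribute and the same cutoff arithmetic; the names cutoffDate, expiredQuery, and countExpired are made up for illustration.

```javascript
// Milliseconds in one day, used to turn a day count into a time offset.
var MS_PER_DAY = 1000 * 60 * 60 * 24

// Returns the Date that lies `days` days before `now`.
function cutoffDate(now, days) {
    return new Date(now.getTime() - days * MS_PER_DAY)
}

// Builds the query that matches documents stored before the cutoff.
function expiredQuery(cutoff) {
    return { timeWhenStored: { $lt: cutoff } }
}

// Hypothetical dry-run helper: counts the documents a maintenance run
// would remove, without deleting anything.
// Assumption: `collection` has a count(query) method.
function countExpired(collection, now, days) {
    return collection.count(expiredQuery(cutoffDate(now, days)))
}
```

In the mongo shell this could be used as print(countExpired(db.someCollection, new Date(), 90)). Keeping the cutoff and the query in separate functions means the same filter can be passed to remove once the count looks right.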
