By James Phillips.
Online gaming has steadily grown over the past decade, now generating billions of dollars in annual revenue and representing one of the fastest-growing sectors of the economy. In the last couple of years, social games have taken center stage, producing the vast majority of growth in the online gaming market.
If you are planning to build and launch a social game, growth is what you should be concerned with and prepared for. In large part due to their tie to Facebook, these games can accelerate from zero to millions of users literally overnight — Zynga’s CityVille game reached 100 million monthly active users within 40 days of its launch. Cost-effectively supporting that kind of growth, while sustaining a snappy and compelling gaming experience, presents an enormous challenge at every layer of the game’s technology stack.
On the flipside, many games tend to peak and then wane over time. As important as it is to be able to absorb new users during the growth phase of a game, it is equally important to be able to dial back resources (and therefore cost) as the game’s popularity declines.
MANAGING SOCIAL GAME DATA
The database layer presents a particular challenge for these games, as traditional approaches to data management tend to fall short in these environments. This is a new and vibrant area of technology innovation. Three key attributes characterize the data layer of a social game that is prepared for success:
- Elasticity: Matching infrastructure costs to demand optimizes a game’s profitability. The ability to easily dial up (and dial down) database resources is a critical part of that equation. One should be able to make these capacity changes to a live game so there is never a need to take a game offline, maintaining continuous revenue generation.
- Low latency: Interactive games must be responsive. Making a player wait for feedback leads to abandonment. If the experience is not quick and predictable, users leave … and take their entertainment budget with them. Database technologies must be able to consistently deliver sub-millisecond random reads and writes of data, across the entire scaling spectrum.
- Data format flexibility: The best social games adapt, delaying or preventing boredom and the resulting decline in active user count. The data tier must be flexible enough (even at very large scale, and without downtime) to support the changing data management requirements of a game in transition.
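One way to picture elasticity on a live game is a partition map: keys hash to a fixed set of partitions, and only the partition-to-server assignment changes when capacity is added or removed. The sketch below is illustrative only. The partition count, the server names and the function names (`partition_for`, `build_map`, `rebalance`) are assumptions made up for this example, not the API of any particular database.

```python
import hashlib

NUM_PARTITIONS = 64  # hypothetical fixed partition count


def partition_for(key: str) -> int:
    """Map a record key to a stable partition number."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def build_map(servers):
    """Initial assignment: spread partitions over servers round-robin."""
    return {p: servers[p % len(servers)] for p in range(NUM_PARTITIONS)}


def rebalance(old_map, servers):
    """Even out load for a new server list, leaving partitions in place where possible."""
    base, extra = divmod(NUM_PARTITIONS, len(servers))
    quota = {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}
    counts = {s: 0 for s in servers}
    new_map, homeless = {}, []
    for p in range(NUM_PARTITIONS):
        owner = old_map.get(p)
        if owner in counts and counts[owner] < quota[owner]:
            new_map[p] = owner          # partition stays where it is
            counts[owner] += 1
        else:
            homeless.append(p)          # owner removed or over quota
    for p in homeless:
        target = next(s for s in servers if counts[s] < quota[s])
        new_map[p] = target
        counts[target] += 1
    return new_map


def moved_partitions(old_map, new_map):
    """Partitions whose owner changed, i.e. the data that has to be copied."""
    return [p for p in old_map if old_map[p] != new_map[p]]


four = build_map(["s1", "s2", "s3", "s4"])
five = rebalance(four, ["s1", "s2", "s3", "s4", "s5"])     # dial up
back = rebalance(five, ["s1", "s2", "s3", "s4"])           # dial down
print(len(moved_partitions(four, five)), len(moved_partitions(five, back)))
```

In this sketch only the partitions that change owner need to be copied. Every other key keeps resolving to the same server throughout, which is what lets a game stay online while capacity changes.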
These are hard problems to solve at social gaming scale. To meet these needs, a new class of database, the NoSQL database, has garnered a lot of attention in the last couple of years. New open source NoSQL databases provide the kind of performance and flexibility a social game database requires. If you are preparing for social gaming success, they are worthy of consideration.
CHOOSING THE RIGHT NOSQL DATABASE
Selecting the right NoSQL database can be difficult. It seems like a new NoSQL database project appears every week. Sorting through the options can be daunting. There are various classes of NoSQL database: key-value, document, graph, columnar. Each data model has pros and cons.
Which is right for a social game? There is a lot of talk about “Big Data” in addition to NoSQL. Are these the same thing?
Let’s sort through these questions, in reverse order:
Big Data vs. Big Audience
There are two fundamental problems being addressed at the data layer today.
- Big Data. Data is being generated at an unprecedented rate. How can you efficiently analyze these extremely large datasets and identify patterns, trends and opportunities? This is the “Big Data” problem. Technologies like Hadoop, MapReduce and Cassandra are solutions built for analyzing very large datasets. They are generally batch-oriented and focused on analysis.
- Big Audience. Social games have user counts measured in the millions. Millions of users put tremendous pressure on a database — regardless of the size of the dataset. Even with only a few bytes per user (and thus a fairly small aggregate dataset size), keeping up with a non-stop stream of random reads and writes from a large number of concurrent users is incredibly hard. This is the Big Audience problem and what NoSQL databases are designed to address.
Of course, if you have a Big Audience, you are probably going to generate Big Data. Most social games deploy both a transactional NoSQL database for real-time data serving to the application and a Big Data solution for data analysis.
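A quick back-of-the-envelope calculation shows why audience size, not dataset size, drives the Big Audience problem. All of the numbers below (two million concurrent users, 200 bytes of state per user, one operation every two seconds) are hypothetical inputs chosen for illustration, not measurements from any real game.

```python
def workload_profile(concurrent_users, bytes_per_user, ops_per_user_per_sec):
    """Return (dataset size in MB, aggregate operations per second)."""
    dataset_mb = concurrent_users * bytes_per_user / 1_000_000
    ops_per_sec = concurrent_users * ops_per_user_per_sec
    return dataset_mb, ops_per_sec


# Hypothetical game: small per-user state, modest per-user activity.
dataset_mb, ops_per_sec = workload_profile(
    concurrent_users=2_000_000,
    bytes_per_user=200,
    ops_per_user_per_sec=0.5,
)
print(f"dataset: {dataset_mb:.0f} MB, load: {ops_per_sec:,.0f} ops/sec")
# prints: dataset: 400 MB, load: 1,000,000 ops/sec
```

Under these assumptions the whole dataset would fit in the memory of a single server, yet the database still has to sustain a million random reads and writes every second.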
Classes of NoSQL Database
The term “NoSQL” is an unfortunate choice. “Non-relational, transactional database” would be more accurate, since that is the one characteristic these databases consistently share (some of them, confusingly, do support at least a subset of SQL). So if these solutions are not relational, what are they?
There are a number of data models: key-value, document, column-oriented and graph, to name the most common. Each model has pros and cons that make it more or less appropriate for a given application. Document-oriented databases power the majority of NoSQL deployments behind social games, largely due to their balance of four key criteria:
- Performance. The document data model keeps related data in a single physical location in memory and on disk (a document). This allows consistently low-latency access to the data — reads and writes happen with very little delay. Database latency can result in perceived “lag” by the player of a game and avoiding it is a key success criterion.
- Dynamic elasticity. Because the document approach keeps records “in one place” (a single document in a contiguous physical location), it is much easier to move the data from one server to another while maintaining consistency — and without requiring any game downtime. Moving data between servers is required to add and remove cluster capacity to cost-effectively match the aggregate performance needs of the application to the performance capability of the database. Doing this at any time without stopping the revenue flow of the game can make a material difference in game profitability.
- Schema flexibility. While all NoSQL databases provide schema flexibility, key-value and document-oriented databases enjoy the most flexibility. Column-oriented databases still require maintenance to add new columns and to group them. A key-value or document-oriented database requires no database maintenance to change the database schema (to add and remove “fields” or data elements from a given record).
- Query flexibility. Balancing schema flexibility with query expressiveness (the ability to ask the database questions, for example, “return me a list of all the farms in which a player purchased a black sheep last month”) is important. While a key-value database is completely flexible, allowing a user to put any desired value in the “value” part of the key-value pair, it doesn’t provide the ability to ask questions. It only permits accessing the data record associated with a given key. I can ask for the farm data for user A, B and C to see if they have a black sheep, but I can’t ask the database to do that work on my behalf. Document databases provide the best balance: they keep schema flexibility without giving up the ability to do sophisticated queries.
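The schema and query points above can be made concrete with two toy in-memory stores. This is a minimal sketch, not a model of any real product: the class names, the `find` method, the farm records and the month values are all invented for illustration, and a real document database would typically answer such queries through indexes or views rather than by scanning every record.

```python
import json


class KeyValueStore:
    """Values are opaque bytes: the store can only put and get by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value: bytes):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class DocumentStore:
    """Values are structured documents the store itself can inspect."""

    def __init__(self):
        self._docs = {}

    def put(self, key, doc: dict):
        self._docs[key] = doc

    def find(self, predicate):
        """Return the keys of all documents matching the predicate."""
        return sorted(k for k, d in self._docs.items() if predicate(d))


def bought_black_sheep(doc, month):
    return any(p["item"] == "black sheep" and p["month"] == month
               for p in doc.get("purchases", []))


# Hypothetical farm records. farm:C has an extra "decorations" field,
# added without any schema change or maintenance step.
farms = {
    "farm:A": {"purchases": [{"item": "black sheep", "month": "2011-05"}]},
    "farm:B": {"purchases": [{"item": "tractor", "month": "2011-05"}]},
    "farm:C": {"purchases": [{"item": "black sheep", "month": "2011-04"}],
               "decorations": ["windmill"]},
}

kv, docs = KeyValueStore(), DocumentStore()
for key, farm in farms.items():
    kv.put(key, json.dumps(farm).encode("utf-8"))
    docs.put(key, farm)

# Key-value: the application must know the keys, fetch each value and decode it.
kv_hits = [k for k in ["farm:A", "farm:B", "farm:C"]
           if bought_black_sheep(json.loads(kv.get(k)), "2011-05")]

# Document: the database does the filtering on the application's behalf.
doc_hits = docs.find(lambda d: bought_black_sheep(d, "2011-05"))
print(kv_hits, doc_hits)
```

Both approaches return the same answer here. The difference is where the work happens: with the key-value store the application pulls every candidate record across the network, while the document store can evaluate the question next to the data.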
Which Option Is Right for Your Game?
If you agree that a document-oriented approach is correct, then you’ve already substantially reduced the number of contenders. If you were previously considering Big Data and NoSQL as synonymous, you’ve further reduced the set. From there, you should consider the important attributes we previously identified: elasticity, concurrent random read latency and throughput, and data format flexibility.
Additionally, one must consider the ease with which developers can build applications that interact with the database. Are there well-maintained and documented SDKs/client libraries? Is there a community of users to provide support and guidance? Is the technology being actively developed, enhanced and improved? Can you get commercial support if desired?
If you are considering building a social game, you must consider the infrastructure requirements to support growth. Your choice of database technology is arguably the most important infrastructure component decision you will make.