The Proving Ground
In the first piece I wrote for Unwinnable, I mentioned that “during creation basically all games are really shitty for a really long time. It’s just that the good ones get better.” This time, I want to go into arguably the best way games get from shitty to unshitty: playtesting.
To lead, though, I need to make an important distinction. I’m specifically talking about “playtesting” here. Sometimes people freely interchange that term with “focus testing,” even when they mean playtesting. The distinction is essential. Focus testing is piling a bunch of people in a conference room, asking them what they want in a game, showing concept art and asking if they like it, etc. It’s design by committee writ large, and it’s toxic. Focus testing has been used to justify decisions from changing the gender of a game’s protagonist (I’m guessing you don’t need me to tell you in which direction) to cancelling entire games before a playable prototype even existed. It’s anti-creativity. Truth is, people can’t tell you they want something that doesn’t exist. People understand things only in the context of what they’ve already experienced. So if you ask them what they want, they’re just going to say they want more of something they already like.
So if playtesting is not focus testing, what is it? Playtesting, and the data that comes from it, is a tool for understanding how your game is working. It doesn’t tell you what to do, nor is creating a “better” playtest result a goal in and of itself (more on that below). Playtesting is the scientific method applied to game design.
You have some hypothesis, and you run experiments to evaluate it. And those experiments include having someone sit down and play a build of the game. The key thing here is having a hypothesis. “Is this good?” is an unevaluable question. It’s meaningless. What’s “good” to one person may be dull to another. But “Is it clear how the smoke bomb works?” or “Can you get through this area without killing anybody?” pose testable hypotheses.
The actual means of playtesting depend mostly on your resources and how important your team broadly believes the effort is. If you’re a wee shop like Klei, it’s nothing more remarkable than me and my notebook sitting behind someone and scribbling furiously as he plays the game. If you have an infinite well of money and time like Valve and Bungie, you can invest in far more rigorous methodologies that include things like eye tracking, observation behind one-way mirrors, heat maps of player deaths being built into the game itself, etc.
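For the curious, a death heat map is less exotic than it sounds. Here is a minimal sketch in Python, assuming the game logs each player death as an (x, y) world position. The function name `death_heatmap`, the `cell_size` parameter and the sample coordinates are all made up for illustration and are not any studio's actual tooling.

```python
from collections import Counter

def death_heatmap(deaths, cell_size):
    """Count deaths per grid cell; each death is an (x, y) world position."""
    counts = Counter()
    for x, y in deaths:
        # Bucket the position into a square grid cell of side cell_size.
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts

# Hypothetical log of death positions from a handful of play sessions.
deaths = [(12.0, 3.5), (14.2, 4.1), (13.9, 2.0), (55.0, 8.0)]
heat = death_heatmap(deaths, cell_size=10)

# The cell with the most deaths is the first place to go looking for trouble.
hottest_cell, hottest_count = heat.most_common(1)[0]
```

A cluster of deaths in one cell doesn't say what is wrong there, only where to point your notebook next.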
With Mark of the Ninja we playtested the game extensively. Or at least extensively compared to both previous games I’d worked on and previous games Klei had developed. Project development began in March 2011, and while we’d done a little playtesting with developer friends, we started playtesting seriously in January 2012. Every week from January until our final bug-fixing dash before submitting the game for certification, we had two people come in.
The subjects themselves were pulled from Craigslist. I put up a simple post with a link to a Google Form that asked about availability and some favorite games. And on balance, most of the folks were helpful and not too weird. A touch more likely to be pierced or tattooed, or have a Mario mushroom patch on their backpack, perhaps. A few people came in quite well dressed, but I’m imagining (hoping?) they came straight from work.
One poor fellow, however… wow. We’re maybe 10 minutes into the playtest, and the room is quite dark (it’s a stealth game, after all) and I see something glistening in the light of the TV out of the corner of my eye. I turn my gaze as nonchalantly as possible and see this poor guy has some massive discharge creeping out of his nose down his face. And I have no idea what the protocol is now.
Several minutes pass and he hasn’t reacted. I’m not sure if he’s noticed. Should I call attention to it? Is this normal? He surely has to notice… right? But not once has he reacted in any way. Finally, I subtly text someone outside of the meeting room to please bring in a box of tissues. I just discreetly slide them onto the table and he grabs one and quietly mumbles something about “nasal polyps.” I felt so, so bad for the poor guy. I’m sure he was just being extremely polite and trying not to contaminate the controller he was using. Fortunately, that was the height of unusual interactions with playtesters.
The sessions took 90 minutes, with about 75 minutes of gameplay and a quick survey afterward. After that, the playtesters would be given a small honorarium and sent on their way. The survey consisted of some quantitative questions on a Likert scale and some qualitative short answers. After the survey, I’d talk with them about some things I’d seen during the playtest, some general questions I asked everyone (“Was there anything you understood later you wish had been clearer earlier?”) and any other thoughts they had.
The key here was that this was the only time I spoke with the participant. During the actual play sessions, I was totally mute. Sometimes it’s extremely painful to see a tester struggling on something and not say or do something to help them. But you won’t be in every player’s living room once the game ships. You allow them to suffer so the people who are paying you, rather than being paid, won’t have to. The only exceptions I make are 1) if something is behaving strangely simply due to a bug, I’ll explain that, and 2) if the participant is really, really stuck and it’s reached a point where it’s no longer a good use of either of our time. But in the entirety of Ninja’s playtesting, across dozens of participants, I didn’t step in more than two or three times.
It’s often brutally, gut-wrenchingly painful to see people struggle with something you thought was simple. Due to some poor camera placement and a bad entrance, one playtester spent about five minutes just trying to find the way from the opening area into what was only the second level. Especially as a designer, I find nearly every playtest to be an exercise in great shame. Every Tuesday and Thursday was another outing to the pillory. It’s humbling and invaluable. It keeps you honest.
Once you’ve designed something, try as you might to see it with fresh eyes, it’s not possible to forget what you know so intimately. And of course your game makes sense to you – you built the bloody thing! So to truly see the game with fresh eyes, you need to abuse someone else’s.
Understanding when to start playtesting is a challenge. Too early, and playtesters often have difficulty seeing past how rough the game is. Too late, and the game has congealed too much to make meaningful changes to it. At this point, my approach is to playtest with other developers or game-literate friends early on, people who can more easily see past the cardboard and baling twine your game currently consists of, and then switch to strangers as soon as things look and feel slightly better than garbage.
I hear other developers say that they got “a few sessions in” just before the game shipped and my mind is boggled. I can say without question, Mark of the Ninja would have been much poorer had we not playtested as much as we did.
There is a very real danger here, though. Playtesting is not a panacea and it is not a substitute for good design. Some fiefdoms of the industry have a certain metrics fetishization, and the mobile/social/Facebook crowd prays especially hard at this altar. Problem is, if you only care about what you can measure, you build games that are only measurable. Can you evaluate whether or not a game is interesting or has something substantive to say in a playtest? Of course not. Playtests can evaluate hypotheses, but they cannot make hypotheses for you.
And if your design decisions are evaluated solely by what makes the metrics number increase, you can (and very often do) end up finding a local maximum rather than something actually interesting. This fetishism is anti-innovation. If some studio is known for exhaustive A/B testing, evaluating its designs and then rolling out some new design feature, many other studios will copy that design by rote. Rather than attempting to innovate or do something different, they assume some large studio tested a bunch of different options and this was the best of all of them, and thus it should be adopted wholesale. This ranges from how quickly certain mechanics – daily energy, having two currencies (one earned, one bought), gifting for items – become completely ubiquitous to blatant copying of games under the presumption this doppelganger can then be tweaked and measured until it outperforms the original.
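If “local maximum” sounds abstract, here is a toy sketch of the trap in Python. Everything in it is an assumption made up for illustration: `engagement` is an invented metric with a small peak at design 2 and a much taller one at design 8, and `hill_climb` stands in for the “only keep a tweak if the number goes up” style of iteration.

```python
def engagement(x):
    # Invented metric landscape: a small peak at x=2 and a taller one at x=8.
    return max(3 - (x - 2) ** 2, 10 - (x - 8) ** 2)

def hill_climb(score, start, step=1, lo=0, hi=10):
    """Greedy search: move to the better neighbor until neither one improves."""
    x = start
    while True:
        neighbors = [n for n in (x - step, x + step) if lo <= n <= hi]
        best = max(neighbors, key=score)
        if score(best) <= score(x):
            return x  # No small tweak helps, so we stop here.
        x = best

stuck_at = hill_climb(engagement, start=0)    # Climbs the nearby small peak.
found_peak = hill_climb(engagement, start=5)  # A different start reaches the tall one.
```

Starting from design 0, the climber settles on the small peak and every further tweak makes the number go down, so the metrics insist it is finished. Only a different starting point, which is to say a different idea, gets it to the taller peak.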
Playtesting as a method of evaluating game design decisions is a powerful tool, but it’s essential to remember it’s only that – a tool. It isn’t a goal in itself. It’s a humbling, brutal practice that’s the best, and often the only, way to understand how to make your game better. But we all know how the road to hell is paved. If playtesting and analysis aren’t in service of something greater, they lead to an idolatry that produces shallow games with little purpose and even less to say. When successful and with the right purpose, though, it can be a thing of beauty. Kinda funny to think that the single best thing you can do to make your game better is to give it to someone, shut up and listen. I guarantee they have something important to tell you.