This is a little guide about applying techniques from games user research to larps (and roleplaying games more generally), based on my experience running larps, doing games user research, and the things I wish I’d done.
While a lot of the examples will focus on the kind of UK larps I attend, as well as remote games run during covid, it should hopefully be useful to other larp styles (American, Nordic, etc) and to people running other forms of roleplaying games.
In short: because larps have a huge opportunity cost, with a ton of writing, design, and kit-making going into potentially a single session. If that session doesn’t go as the game runners intend, that’s a lot of effort for a potentially pretty bad experience (for both players and game runners).
Larps also have a lot of moving parts: players, guides, rules, as well as less obvious things like the physical spaces they take place in. Issues with any one of these can have knock-on effects across the whole experience.
First I’m going to go over some different goals that testing might be trying to achieve, then techniques to meet those goals, then analysis of feedback, and then ending with a small section talking about gathering feedback while a game is running. Each section will give examples from the larps I’ve run (so enjoy my mistakes!)
A key thing to keep in mind is that you can’t test everything, so you have to prioritise what you test. A useful model to borrow here is AbleGamers’ APX pyramid, which we can use to fit testing goals into three categories:
Access: How do players access the game? What barriers exist for specific player groups? How can those barriers be removed?
Challenge: Where does friction exist in the game? Is that friction intentional or not? Is that friction proportional or not?
Experiences: What experience are you aiming for your players to have? Does everyone get that experience? If not, what experience are they having?
In general testing should start by targeting Access, then Challenge, and then look at the Experiences produced. However these are by no means strict walls, and many testing techniques cover multiple categories, for example Critical Facet tests focus on experiences but will also reveal Access issues.
Aimed at ensuring Access by identifying usability issues. These tests involve getting people from outside the game-running team to play or read through important parts of your documents, particularly rules and setting information, to make sure that other people interpret the text the way you intend. There are broadly two ways to structure this kind of test:
Give a group of around 7 people the document of interest and ask them to send in any questions or comments they have after reading it.
7 is a commonly cited number, as this will surface 80-90% of issues, with diminishing returns from adding more people.
Case Study: Mech 2099 Character creation
Mech 2099 was a very silly two-session, turn-based larp in which the players were mech pilots who had to move on a grid. The game got to its first mission and the players (and I) all realised that there were two different ways of measuring range and diagonals: one measuring from the square the player was in, the other from the square they were facing.
Clarifying this seems like it should be relatively simple, but because people had spent a week with whichever interpretation they’d understood, it was a recurring issue in play. A usability test would have caught this relatively easily and allowed it to be clarified before it had settled in players’ minds.
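To make the ambiguity concrete, here’s a hypothetical sketch in Python of the two readings. The coordinates, Chebyshev-style diagonals, and range values are invented for illustration, since the actual Mech 2099 rules aren’t reproduced here.

```python
# Hypothetical reconstruction of the two readings of Mech 2099's range rule.
# Coordinates are (column, row); diagonal steps count as 1 (Chebyshev distance).

def chebyshev(a, b):
    """Grid distance where a diagonal step counts as 1."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def in_range_from_self(pilot, target, weapon_range):
    """Reading 1: measure from the square the pilot occupies."""
    return chebyshev(pilot, target) <= weapon_range

def in_range_from_facing(pilot, facing, target, weapon_range):
    """Reading 2: measure from the square the pilot is facing."""
    faced_square = (pilot[0] + facing[0], pilot[1] + facing[1])
    return chebyshev(faced_square, target) <= weapon_range

# The two readings disagree exactly at maximum range:
pilot, facing, target = (0, 0), (1, 0), (3, 0)
print(in_range_from_self(pilot, target, 2))            # False
print(in_range_from_facing(pilot, facing, target, 2))  # True
```

Both readings are internally consistent, which is exactly why each player’s interpretation felt obviously correct to them, and why only an outside reader is likely to flag the ambiguity.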
Give 1 person whatever document you want tested, gather their feedback and opinions on it, address their feedback, then give it to a different person. Repeat.
This approach can be particularly useful early in working on a game, as each new tester identifies different issues from the last (or shows that you haven’t fixed a previously identified issue). However, R.I.T.E can be less effective for long documents, or documents where there might be a wide variety of misinterpretations (for example setting guides), as it can lead you to address a symptom (i.e. avoiding a specific misinterpretation) rather than the root cause.
Parliament of Kingdoms was a Discord game which ran for 10 weeks. Each player managed a nation in a newly formed union, with four resources (Happiness, Iron, Food, Gold), a mix of three religions in each nation, a system of upgrades for each player to manage, and a Parliament system for player-proposed laws and projects affecting all the kingdoms.
On releasing the rules there was an immediate problem: every player gave feedback that the order of operations in a turn was unclear. Had I run a R.I.T.E test beforehand, this would probably have come up with the very first tester.
Similar to usability read-throughs, but instead asking people with expertise in topics which may come up in game. For some games this is embedded in the process, for example a game about environmentalism with an expert involved in the core design; for others it is an unspoken dimension, with designers building a setting based on their own culture (typically Western European).
In addition to avoiding cultural appropriation and misrepresentation, sensitivity read-throughs can also be used to help with moderation of themes which are present and ensure that there aren’t unintentionally triggering/unpleasant situations being implied by the game world.
Additional note: Don’t use fascist symbols by accident
The far-right has co-opted a large number of symbols, particularly from Nordic mythology. You should avoid using these symbols: in the lightest case you might look like a Nazi, and in the worst case fascists will turn up to your game. There are a few good resources for doing this check, for example the Global Extremist Symbols Database.
In this style of test you are aiming to understand the friction and experience of a critical part of your design in as close to real circumstances as possible. This is a later-stage test: run it as early as possible, but only after usability testing has made sure the system is parsable. These tests are the best way of identifying the experiences players have with a system.
One thing to bear in mind is that the first time playing through any rules or roleplay brief involves a lot of cognitive load, so while it can be tempting to jump straight to critical-facet-testing complicated situations, it’s important to build up to them over the course of a testing session and allow players to get used to whatever it is you’re testing.
Seventh Sun was a twice-weekly game which ran for just under two years. It involved character progression, and had weekend sessions composed of adventures in which characters would fight and talk their way through sequences of multiple encounters. A key resource in Seventh Sun was called “Stress”: each character gained Stress when they used an ability, and having too much Stress had negative consequences.
Seventh Sun ended up having 5 playtests, starting with 10 people in the first test and scaling up to 25 by the final test (25-30 being the expected player plus monster-crew numbers for an adventure). We made a mistake in running these tests: we didn’t perform usability read-throughs first, which turned the first hour of each playtest into an impromptu usability test as we had to clarify interactions. The early tests looked at the experience of playing a character; the later tests allowed us to look at how to structure encounters. The table below gives examples of the kinds of issues found in each playtest.
| Test | Issues | Changes |
| --- | --- | --- |
| 1 | There’s no reason ever to stop using abilities; combat is noisy, complicated, and lethal. Support characters don’t have anything to do, as everyone else is always doing things. | Skills which reduce Stress removed, so Stress now acts as a limit on how much a character can do. |
| 2 | Healing system is too slow, and doesn’t work in the dark. | Changed to no longer involve drawing beads from a bag. |
| 3 | Large power disparity between certain archetypes. Some archetypes allow players to choose non-functional sets of skills. | Slowed down how quickly problem archetypes can cycle through their skills. Reduced options in character creation. |
| 4 | Various infinite loops which involve standing still for 20 seconds. Healing being tied to a single archetype pushes players away from it, rather than incentivising it. | A rule change preventing support spells being cast on yourself addressed the loops. Healing given to more archetypes. |
| 5 | Guild archetype had become boring. Monsters need some ranged or crowd-control capability in order to pose a threat to the player party. Healing system leads to a positive feedback loop of the adventuring party being either fine or all dead. | Changes made to make the Guilds feel more thematic. “Quick healing” added as a mechanic. |
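To illustrate the shape of the first playtest’s change, here’s a minimal sketch in Python of Stress acting as a hard budget on ability use once the Stress-reducing skills are removed. The limit of 10 and the cost of 1 per ability are invented for the example; they aren’t Seventh Sun’s actual numbers.

```python
# Minimal sketch of the post-playtest-1 Stress rule: with no skills that
# reduce Stress, it becomes a hard budget on how much a character can do.
# The limit (10) and per-ability cost (1) are invented for illustration.

STRESS_LIMIT = 10

class Character:
    def __init__(self, name: str):
        self.name = name
        self.stress = 0

    def use_ability(self, cost: int = 1) -> str:
        """Using an ability adds Stress; past the limit, no more abilities."""
        if self.stress + cost > STRESS_LIMIT:
            return f"{self.name} is too Stressed to use another ability!"
        self.stress += cost
        return f"{self.name} acts ({self.stress}/{STRESS_LIMIT} Stress)"

pc = Character("Aster")
for _ in range(11):
    print(pc.use_ability())  # the 11th use is refused
```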
Larps usually have some form of guidance on what participants should wear to portray their characters. These guides often break down kit by faction, and range from the minimum requirements to aspirational kit goals.
In Seventh Sun’s case, our Kit guide test was focused on checking that the different faction briefs were sufficiently distinct and achievable with the kit our players actually had.
It also identified an additional issue: one of the faction kit designs was overly restrictive in the silhouettes non-binary players felt like they could create within it (being forced into stereotypically masculine or feminine silhouettes).
As anyone familiar with cosplay knows, it's also worth encouraging your players to make similar checks on their own costumes. Can they actually sit down? How long does it take for them to put it on? What if it rains? etc.
This technique aims to gain an understanding of the physical space you’re using for your game. Broadly it involves walking around the rooms or field you’re using.
For all kinds of spaces you’re looking at:
For indoor spaces you’re looking at:
For outdoor spaces you’re looking at:
Okay, you’ve run your playtest and had a bunch of feedback emails sent in, now what?
This section is going to go over some approaches to analysis of feedback which I’ve found helpful. The goal of analysis is to turn a complicated set of data into ideas you can understand, communicate, and act on.
Start by looking at what happened during the session and comparing it to what you expected to happen. In combat games this is relatively simple:
For social games this can be a bit more difficult to track in real time, so you may need to lean on post-game feedback instead, but you can examine:
Then combine this with the post-game feedback you’ve gathered from players. How do players feel about what happened? This can be as simple as “Did people have fun?” but ideally should be tied to your goals for the game: “Did players with a given role have the experience I intended?”
To answer these questions it's often handy to break down each player's feedback into small bites of information, each a pair of event and emotion. You can then group these across players. By emotion to understand how people are feeling about the game, or by event to understand how players feel about particular systems.
If you’ve got lots of feedback then a spreadsheet isn’t a bad idea at this point, and there are lots of formal methods for analysing feedback (content analysis, thematic analysis, etc.), so if you’re already familiar with one of those you can always use it in your analysis.
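If you’d rather do this grouping programmatically than in a spreadsheet, here’s a minimal sketch in Python. The events and emotions below are invented examples; the point is just that each bite of feedback is an (event, emotion) pair which you can pivot either way.

```python
from collections import defaultdict

# Each bite of feedback is an (event, emotion) pair, broken out per player.
# The events and emotions here are invented examples.
feedback = [
    ("final battle", "excited"),
    ("final battle", "confused"),
    ("healing rules", "frustrated"),
    ("healing rules", "confused"),
    ("tavern scene", "delighted"),
]

by_event = defaultdict(list)
by_emotion = defaultdict(list)
for event, emotion in feedback:
    by_event[event].append(emotion)    # how players feel about each system
    by_emotion[emotion].append(event)  # what is driving each feeling

print(dict(by_event))    # {'final battle': ['excited', 'confused'], ...}
print(dict(by_emotion))  # {'confused': ['final battle', 'healing rules'], ...}
```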
If you are running a multi-session game, gathering feedback doesn’t have to end at session 1. Things will always play out in unexpected ways (“No game survives contact with the players”), and in-game power and roleplay can calcify around social groups.
Feedback forms, as well as requesting feedback by email, are typical ways to gather feedback during a game. One thing to consider with live feedback gathering is the way social discussion can shape player feedback: it can lead players to connect disparate issues, sometimes making it more difficult to identify root causes.
(See the introduction to Parliament of Kingdoms in the usability read-throughs case study above.) 3 weeks into Parliament of Kingdoms I had some design hunches about things which needed adjusting to improve the game (Food being underutilised, the Religion system being too opaque), so I sent out a short feedback survey to see if the players felt the same.
The results showed an important way that feedback gathering augments design thinking: as a game runner you will understand a lot about your game and how players feel, but you don’t have a complete picture. In this case I’d identified 1 issue which was a high priority for players, but had missed 3 highly impactful issues. As a nice bonus, the missed issues helped solve each other: warfare turned out to be a good way to make Food more interesting as a system.
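As a sketch of how you might check hunches like these against a survey (with invented responses, not the actual Parliament of Kingdoms data): tally how many players raise each issue and compare the ranking against your own list.

```python
from collections import Counter

# Invented example responses: each player's list of the issues bothering them.
responses = [
    ["lack of warfare", "Food underused"],
    ["lack of warfare", "Religion too opaque"],
    ["lack of warfare"],
    ["Food underused"],
]

designer_hunches = {"Food underused", "Religion too opaque"}

tally = Counter(issue for player in responses for issue in player)
for issue, count in tally.most_common():
    flag = "" if issue in designer_hunches else "  <- missed by the designer"
    print(f"{issue}: {count} player(s){flag}")
```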
Seventh Sun: Simon Barnes, Callum Deery, Tegwen Hammersley, Sam Jenkins, Lizzie Vialls, A. Williams.
Mech 2099 & Parliament of Kingdoms: Callum Deery.
AbleGamers’ APX pyramid:
https://accessible.games/accessible-player-experiences/
Designing with Dissonance, about running online larps:
https://jarps.net/journal/article/view/27
Player Limitations and Accessibility in Larp by Beatrix Livesy-Stephens and Bjørn-Morten Vang Gundersen:
https://nordiclarp.org/2024/03/18/player-limitations-and-accessibility-in-larp/
Playtesting Venba, an interview by Steve Bromley:
https://gamesuserresearch.com/venba-playtesting-a-hit-narrative-cooking-game/