The book contains an initial set of exercises for Parts II, III and IV. Below we provide further details about these exercises and propose additional exercises for the interested reader. The list of exercises is regularly updated and expanded to follow current trends in game artificial intelligence. If you have suggestions for exercises we should include in the list below, please contact us via email at gameaibook [ at ] gmail [ dot ] com.

Exercises for the Second Edition 

The book contains an initial set of exercises for Chapters 6, 9, and 12. Below we provide further details about these exercises and propose additional exercises for the interested reader.

Chapter 6: Playing Games

Ms Pac-Man Agent

The Ms Pac-Man vs Ghost Team competition is a contest held at several artificial intelligence and machine learning conferences around the world, in which AI controllers for Ms Pac-Man and the ghost team compete for the highest ranking. For this exercise, you will develop a number of Ms Pac-Man AI players to compete against the ghost-team controllers included in the software package. The simulator is written entirely in Java and comes with a well-documented interface. While implementing two to three different agents within a semester has proven to be good educational practice, we leave the final number of Ms Pac-Man agents to you (or your class instructor) to decide.

Our proposed exercise is as follows:

  1. Download the Ms Pac-Man Java-based framework. (NB. Use a Java IDE like Eclipse/Netbeans to open the framework and run the main class “Executor.java”)
  2. Use two of the AI methods covered in Chapter 2 of the book to implement two different Ms Pac-Man agents.
    • Hint: You may wish to start from methods with available Java templates such as Monte Carlo Tree Search, Multi-Layer Perceptron, Genetic Algorithm, and Q-learning, and then proceed to try out hybrid algorithms such as neuroevolution. Remember to visit the discussion about representation and utility at the beginning of Chapter 2 and the Ms Pac-Man examples for each algorithm covered in that chapter.
  3. Design a performance measure and compare the performance of the two algorithms.
    • Hint: What is good performance in Ms Pac-Man? Maximising the score, completing levels, a mix of the two? Try out different measures of performance and see how your algorithms compare.
  4. Optional Python workflow (this project): ms_pacman_agent.ipynb
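
As a sketch of step 3, the function below combines normalised mean score with level-completion rate into a single weighted measure. The weights, normalisation constants, and the trial results for the two agents are all illustrative assumptions, not values from the framework:

```python
from statistics import mean

def performance(scores, levels_cleared, w_score=0.7, w_levels=0.3,
                max_score=10000, max_levels=4):
    """Weighted mix of normalised mean score and mean levels cleared.
    Weights and normalisation constants are illustrative, not canonical."""
    return (w_score * mean(scores) / max_score
            + w_levels * mean(levels_cleared) / max_levels)

# Hypothetical results from two agents over five trials each.
mcts = performance([4200, 5100, 3900, 4800, 4500], [2, 3, 2, 2, 3])
qlearn = performance([3100, 2800, 3500, 3000, 2900], [1, 1, 2, 1, 1])
print(f"MCTS: {mcts:.3f}  Q-learning: {qlearn:.3f}")
```

Try shifting the weights towards level completion and observe whether the ranking of your agents changes; if it does, your two candidate measures genuinely disagree about what good play means.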

Chapter 9: Procedural Content Generation

Maze Generation

Develop and compare constructive and search-based methods for maze generation. This exercise merges the legacy Chapter 4 exercise of the first edition with the current Chapter 9 exercise.

  1. Start from the Unity Catlike Coding maze tutorial (legacy/original workflow).
  2. Implement a constructive PCG method for maze generation.
  3. Implement (or alternatively compare with) a search-based PCG method and discuss representation + fitness choices.
    • Hint: When designing your fitness function, consider what makes a maze engaging to play. Solvability is a necessary condition, but not sufficient — a maze where the optimal path is very short, or where dead-ends dominate, may feel unsatisfying. You may wish to combine multiple criteria such as solution path length, branching factor, and dead-end density into a single weighted fitness score, and experiment with different weightings to see how they affect the character of the generated mazes.
  4. Evaluate generators (e.g., expressivity analysis and/or player testing).
    • Hint: What is good expressivity in maze generation? A generator that always produces the same style of maze has low expressive range. To measure this, sample a large number of mazes from each generator, compute two summary statistics per maze (e.g. solution length and dead-end ratio), and plot each maze as a point in this 2D space. The resulting expressive range chart immediately reveals whether your constructive and search-based methods cover different regions of the design space, and which generator produces more diverse outputs. If human playtesting is not feasible, consider using an A* agent as a proxy player and recording path length, number of decision points encountered, and backtracking rate as proxies for player experience.
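
To get started on step 2, here is a minimal constructive generator, a depth-first backtracker, together with the dead-end ratio mentioned in the expressivity hint above. The grid representation (each cell maps to its set of open passages) and function names are our own choices, not part of the Unity tutorial:

```python
import random

def generate_maze(w, h, seed=0):
    """Depth-first backtracker: a classic constructive maze method.
    Produces a perfect maze (fully connected, no loops)."""
    rng = random.Random(seed)
    passages = {(x, y): set() for x in range(w) for y in range(h)}
    dirs = {'N': (0, -1), 'S': (0, 1), 'E': (1, 0), 'W': (-1, 0)}
    opposite = {'N': 'S', 'S': 'N', 'E': 'W', 'W': 'E'}
    stack, visited = [(0, 0)], {(0, 0)}
    while stack:
        x, y = stack[-1]
        options = [(d, (x + dx, y + dy)) for d, (dx, dy) in dirs.items()
                   if (x + dx, y + dy) in passages
                   and (x + dx, y + dy) not in visited]
        if not options:
            stack.pop()                      # dead end reached: backtrack
            continue
        d, nxt = rng.choice(options)
        passages[(x, y)].add(d)              # carve the passage both ways
        passages[nxt].add(opposite[d])
        visited.add(nxt)
        stack.append(nxt)
    return passages

def dead_end_ratio(maze):
    """Fraction of cells with exactly one passage: one expressivity statistic."""
    return sum(len(p) == 1 for p in maze.values()) / len(maze)

maze = generate_maze(10, 10)
print(f"dead-end ratio: {dead_end_ratio(maze):.2f}")
```

Sampling this statistic over many seeds gives you one axis of an expressive range chart; a search-based generator can then be compared against it directly.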

Platformer Level Generation

The platformer level generation framework is based on the Infinite Mario Bros (Persson, 2008) framework, which has been used as the main framework of the Mario AI (and later Platformer AI) Competition since 2010. The competition featured several different tracks including gameplay, learning, Turing test and level generation.

As a proposed exercise on procedural content generation, you are asked to download the level generation framework and:

  1. Apply a constructive PCG method (from the ones covered in the book) for the generation of platformer levels (Section 4.3).
    • Hint: A tile-based direct representation is the most straightforward starting point, but consider whether a higher-level indirect representation (such as a sequence of chunks or segments) might yield more structured and playable levels. Revisit the discussion on representation in Section 4.3 before committing to a design.
  2. Apply a generate-and-test PCG method (from the ones covered in the book) for the same purpose (Section 4.3).
    • Hint: What should your fitness function reward? Solvability (can an A* agent reach the goal?) is a necessary baseline, but consider also rewarding challenge, measured as the number of obstacles or enemies encountered along the critical path, and variety, measured as the diversity of tile types used. Experiment with different fitness function designs and observe how the character of your generated levels changes in response.
  3. Evaluate the generators using one or more of the methods covered in Section 4.6.
    • Hint: Try expressivity analysis as a first step: sample a large number of levels from each generator, compute two summary statistics per level (e.g. linearity and leniency as defined in Section 4.6), and plot each level as a point in this 2D feature space. The resulting expressive range chart reveals whether your two generators explore the same or different regions of the level design space, and which produces more diverse outputs.
  4. Design a performance measure and compare the algorithms’ performance.
    • Hint: If human playtesting is not feasible, consider using a lightweight A* agent as a proxy player and recording metrics such as completion rate, path length, and the number of jumps required. These proxy measures connect your evaluation directly to the player experience dimension discussed in Chapter 5.
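
The expressivity statistics in the hints can be approximated with very little code. The sketch below uses crude proxies for linearity and leniency over a tile-grid level ('X' solid, 'E' enemy, '.' empty); the actual definitions in Section 4.6 differ (linearity, for instance, is usually measured by fitting a regression line to platform heights), so treat these as illustrative stand-ins:

```python
from statistics import pvariance

def column_heights(level):
    """Height of the topmost solid tile ('X') in each column; 0 if none."""
    rows = len(level)
    heights = []
    for x in range(len(level[0])):
        solid = [rows - y for y in range(rows) if level[y][x] == 'X']
        heights.append(max(solid) if solid else 0)
    return heights

def linearity(level):
    """Crude proxy: 1 minus normalised ground-height variance (flat -> 1.0)."""
    return 1.0 - min(1.0, pvariance(column_heights(level)) / len(level) ** 2)

def leniency(level):
    """Fraction of columns that are 'safe': solid ground and no enemy."""
    heights = column_heights(level)
    safe = sum(1 for x, h in enumerate(heights)
               if h > 0 and not any(row[x] == 'E' for row in level))
    return safe / len(heights)

# Two tiny hand-made levels for illustration.
flat = ["....", "....", "XXXX"]
spiky = ["E..X", "X..X", "X.XX"]
print(linearity(flat), leniency(flat), linearity(spiky), leniency(spiky))
```

Computing these two numbers per generated level and plotting each level as a point gives exactly the expressive range chart described in the hint for step 3.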

Level Generation via RL

Explore reinforcement learning for level generation in PCGRL settings. A recent framework students may use to explore the idea that levels can be designed via RL processes is the OpenAI Gym environment for PCG via reinforcement learning, available at gym-pcgrl. Students are tasked to pick any of the possible game genres and test the level generation capacity of RL across various reward functions, content representations and RL hyperparameters.

Our proposed exercise is as follows:

  1. Pick one game genre supported by gym-pcgrl and familiarise yourself with the available representation options (narrow, turtle, or wide) provided by the framework.
    • Hint: The choice of representation is not merely a technical detail — it determines what actions the RL agent can take at each step, and therefore shapes the kinds of levels it tends to produce. Try at least two representations and reflect on how the generated content differs between them.
  2. Design a reward function that captures your notion of a good level for the chosen genre and train a policy using a standard RL algorithm such as PPO.
    • Hint: A reward function that combines solvability, path length, and content diversity will generally produce more interesting results than one that optimises for a single criterion. Be explicit about the weighting you choose and whether you normalise each component.
  3. Evaluate the levels generated by your trained policy using one or more of the methods covered in Section 4.6, and compare the effect of at least two different reward functions or RL hyperparameter settings on the quality and diversity of the output.
    • Hint: Expressivity analysis is a natural first step here. Sample a large number of levels from each trained policy, compute two summary statistics per level, and compare the expressive range charts side by side. A policy that converges on a narrow region of the design space may be scoring well on reward but producing repetitive content — an important tension worth discussing.
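
A reward function for step 2 can be sketched independently of the framework. Everything here (the hard gate on solvability, the path-length cap, the weights) is an assumption for you to tune, not part of gym-pcgrl's own reward definitions:

```python
def shaped_reward(solvable, path_len, diversity,
                  target_path=40, w=(0.5, 0.3, 0.2)):
    """Illustrative PCGRL-style reward. The solvability gate, path-length
    cap (target_path) and weights w are assumptions, not framework values."""
    if not solvable:
        return -1.0                                # hard penalty for unsolvable maps
    path_term = min(path_len / target_path, 1.0)   # reward longer paths, capped
    return w[0] + w[1] * path_term + w[2] * diversity

# Hypothetical level summaries: (solvable, solution path length, tile diversity).
print(shaped_reward(True, 30, 0.6), shaped_reward(False, 90, 0.9))
```

Note the design choice of gating rather than weighting solvability: a weighted sum can let an unsolvable but diverse level outscore a solvable dull one, which is rarely what you want the policy to learn.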

Chapter 12: Player Modeling

SteamSpy Dataset

Use SteamSpy data for game analytics and for studying player and game behavior. This exercise keeps the intent of the legacy exercise while aligning links and workflow with the current project.

  1. Use the SteamSpy API to collect attributes for a set of games.
    • Hint: Not all attributes are equally informative for clustering. Playtime metrics, ownership counts, and user review scores tend to yield more meaningful structure than categorical fields such as developer or publisher. As a starting point, select a subset of numerical attributes and reflect on whether normalisation is necessary before applying your chosen algorithm.
  2. Apply two unsupervised learning methods to identify clusters in the game space.
    • Hint: Consider what a meaningful cluster looks like in this context. A cluster of games with high median playtime, low price, and strong review scores tells a different story than one with high ownership but low engagement. Interpreting your clusters substantively, rather than just reporting cluster assignments, is what makes this exercise valuable.
  3. Choose a performance measure and compare algorithm outcomes.
    • Hint: Internal validation metrics such as silhouette score or Davies-Bouldin index allow you to compare clustering quality without ground truth labels. Report these alongside a qualitative description of the clusters each algorithm finds, and discuss whether the two algorithms agree on the structure of the game space or reveal different groupings.
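
The normalisation point in the hints can be made concrete with a small sketch. The games and attribute values below are invented for illustration, and the hand-rolled k-means is a stand-in for sklearn.cluster.KMeans (which, together with silhouette_score, is the practical choice in coursework):

```python
import random

def normalise(rows):
    """Min-max scale each numeric column to [0, 1] so no attribute dominates."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign to nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((p - q) ** 2
                                        for p, q in zip(pt, centroids[c])))
                  for pt in points]
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Invented games: (median playtime in hours, owners in millions, review score).
games = [[120, 5.0, 92], [110, 4.5, 90], [3, 20.0, 55], [2, 18.0, 60]]
print(kmeans(normalise(games), k=2))
```

Without the normalisation step, the owners column (in millions) would dominate the distance computation and the playtime and review structure would be invisible to the algorithm.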

StarCraft: Brood War Repository

Mine replay data and build predictive models of player strategy.

StarCraft (Blizzard Entertainment, 1998) screenshot
  1. Download one StarCraft replay dataset from the repository.
    • Hint: The available datasets vary in size and attribute richness. Choose one that includes early-game build order information, as these attributes have been shown in prior work to be among the most predictive of a player’s overall strategy.
  2. Implement two supervised learning models to predict player strategy from selected features.
    • Hint: Feature selection is important here, as the datasets contain many attributes of varying relevance. Consider beginning with a small, interpretable feature set and expanding it systematically. Reflecting on which features your models rely on most heavily is as informative as the prediction accuracy itself.
  3. Compare models and discuss feature relevance for strategy prediction.
    • Hint: Beyond accuracy, consider reporting precision and recall per strategy class, as class imbalance is common in replay datasets where some strategies are played far more frequently than others. A model that achieves high overall accuracy by ignoring rare strategies is less useful than one that captures the full range of player behavior.
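
The class-imbalance point in the hint is easy to demonstrate. The sketch below computes per-class precision and recall (a stdlib stand-in for sklearn.metrics.classification_report); the strategy labels and predictions are invented:

```python
def per_class_metrics(y_true, y_pred):
    """Precision and recall per strategy label, as (precision, recall) pairs."""
    out = {}
    for cls in set(y_true):
        tp = sum(t == p == cls for t, p in zip(y_true, y_pred))
        fp = sum(p == cls != t for t, p in zip(y_true, y_pred))
        fn = sum(t == cls != p for t, p in zip(y_true, y_pred))
        out[cls] = (tp / (tp + fp) if tp + fp else 0.0,
                    tp / (tp + fn) if tp + fn else 0.0)
    return out

# Hypothetical predictions: 'rush' dominates, 'expand' is the rare class.
y_true = ['rush'] * 8 + ['expand'] * 2
y_pred = ['rush'] * 9 + ['expand']
print(per_class_metrics(y_true, y_pred))
```

Here overall accuracy is 90%, yet recall on the rare 'expand' strategy is only 50%, which is precisely the failure mode the hint warns about.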

AGAIN Dataset

The Arousal video Game AnnotatIoN (AGAIN) dataset has been collected for the purposes of analyzing and modeling player experience in a general fashion: within different games of the same genres and across games of different genres. The dataset is the largest and most diverse publicly available affective dataset based on games, featuring over 1100 in-game videos with corresponding gameplay data from 9 different games, annotated for arousal by 124 participants in a first-person continuous fashion. Students can test any of the AI methods covered in this chapter to analyze player arousal across each of the 9 games available, within the 3 games of each genre (shooters, racing and platformers) and even across the 3 game genres. Player arousal models can be trained on the manually extracted features of each game that are available in the dataset, on the in-game video footage available, or both.

Our proposed exercise is as follows:

  1. Download the AGAIN dataset and select one or more games to focus on, choosing whether to model arousal within a single game, within a genre, or across genres.
    • Hint: Starting within a single game is the most tractable entry point, as the feature distributions are more consistent. Extending your model across genres is a more ambitious task that raises interesting questions about what arousal signals generalise and what remains game-specific.
  2. Train two supervised learning models to predict player arousal from the available features, using either the manually extracted gameplay features, the video features, or a combination of both.
    • Hint: Arousal is a continuous, time-varying signal, so the choice of how to aggregate or segment it into prediction targets matters considerably. Reflect on whether you are predicting mean arousal per session, arousal at fixed time windows, or moment-to-moment changes, and discuss how this choice affects what your model actually learns.
  3. Compare your models and discuss which input modality (gameplay features or visual features) contributes most to predictive accuracy.
    • Hint: An ablation study that removes one modality at a time is a clean way to quantify each modality’s contribution. You may find that gameplay features alone are surprisingly competitive with video-based features, which has implications for the cost of data collection in applied settings.
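
The ablation study suggested in the hint can be framed as a small generic helper. Here train_eval is any callable mapping a dictionary of modality features to a validation score (you would plug in your own training pipeline); the toy scorer and its "informativeness" values are placeholders:

```python
def ablation(train_eval, features):
    """Leave-one-modality-out ablation: drop each modality in turn and
    record the drop in score relative to using all modalities."""
    base = train_eval(features)
    return {name: base - train_eval({k: v for k, v in features.items()
                                     if k != name})
            for name in features}

# Toy stand-in scorer: score is the summed "informativeness" of modalities.
toy = lambda feats: sum(feats.values())
print(ablation(toy, {'gameplay': 0.4, 'video': 0.3}))
```

A large score drop when a modality is removed indicates that modality carries information the others do not; a near-zero drop suggests redundancy between modalities.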

Platformer Experience Dataset

The Platformer Experience Dataset is the first available game experience corpus that contains multiple modalities of data from players of Infinite Mario Bros, a variant of Super Mario Bros (Nintendo, 1985). The database can be used to capture aspects of player experience based on behavioral and visual recordings of platform game players. In addition, the database contains aspects of the game context such as level attributes, demographic data of the players and self-reported annotations of experience in two forms: ratings and ranks.

Reactions of Super Mario Bros players

Our proposed exercise is as follows:

  1. Study and download the dataset and select one or more target affective or cognitive states to model as your output.
    • Hint: Note that both ratings and ranks are ordinal data by nature. Consider whether you will treat them as regression targets or discretise them into classes, and reflect on the implications of that choice for your evaluation metric.
  2. Train two supervised learning models to predict both the ratings and the rank labels of experience.
    • Hint: As an initial step, consider using only the behavioral data as input to your model. Then investigate whether adding visual features or level attributes improves performance, and by how much. This incremental approach makes the contribution of each modality explicit and interpretable.
  3. Compare your methods under chosen metrics and discuss modality and feature contributions.
    • Hint: Correlation-based metrics such as Spearman’s rank correlation are well suited to evaluating ordinal predictions and are widely used in the player experience modeling literature. Reporting these alongside standard accuracy or RMSE gives a more complete picture of model quality.
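
Spearman's rank correlation from the hint is simple enough to compute by hand, which helps build intuition before switching to scipy.stats.spearmanr (which, unlike this sketch, corrects for tied ranks). The ratings and predictions below are hypothetical:

```python
def spearman(x, y):
    """Spearman's rank correlation via the classic 1 - 6*sum(d^2)/(n(n^2-1))
    formula; valid when there are no ties within x or within y."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical self-reported fun ranks vs model predictions for five players.
print(spearman([1, 2, 3, 4, 5], [10, 20, 35, 31, 50]))
```

Because only the ordering of values matters, this metric is invariant to any monotonic rescaling of your model's outputs, which is exactly why it suits ordinal experience labels.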

GameVibe Corpus

The GameVibe corpus is a multimodal affect dataset of viewer engagement for FPS game videos. GameVibe consists of 2 hours of high-quality audio and visual data from 30 different FPS games, extracted and curated from publicly available “Let’s Play” videos on YouTube. Engagement labels are provided by 20 annotators in the form of unbounded, time-continuous signals. The corpus includes the raw videos used and the latents extracted using pre-trained foundation models such as VideoMAE for visuals and BEATS for audio. The rich variety of FPS game stimuli encompasses multiple game modes, winning conditions, and art styles, offering a unique opportunity for studying engagement models across a wide variety of contexts within a single genre.

Our proposed exercise is as follows:

  1. Download the GameVibe corpus and familiarise yourself with the available modalities: raw video, audio, and pre-extracted latent features from VideoMAE and BEATS.
    • Hint: Working with the pre-extracted latents is a practical starting point if computational resources are limited, as training or fine-tuning foundation models from scratch is expensive. Reflect on what information may be lost by using pre-extracted features rather than end-to-end representations.
  2. Train two models to predict viewer engagement from the available features, treating the continuous engagement signal as either a regression target or a discretised classification problem.
    • Hint: Engagement is a time-continuous signal, so temporal modeling approaches such as recurrent networks or temporal convolutional networks are natural candidates. Consider how the granularity of your prediction window affects both model complexity and the interpretability of your results.
  3. Compare the contribution of visual and audio modalities to engagement prediction, and discuss what your findings suggest about the nature of viewer engagement in FPS game videos.
    • Hint: You may find that audio features are more predictive of moment-to-moment engagement shifts than visual features, particularly during high-action sequences. Consider whether this result generalises across the 30 games in the corpus or is specific to certain game modes or art styles.
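
The window-granularity question raised in the hints for step 2 can be explored with a tiny helper that converts a time-continuous engagement trace into per-window prediction targets. The six-step trace and the window size are invented for illustration:

```python
def window_targets(signal, window, mode='mean'):
    """Turn a continuous engagement trace into per-window targets.
    'mean' yields a regression target per window; 'delta' yields the
    within-window change, suited to modeling moment-to-moment shifts."""
    chunks = [signal[i:i + window]
              for i in range(0, len(signal) - window + 1, window)]
    if mode == 'mean':
        return [sum(c) / len(c) for c in chunks]
    if mode == 'delta':
        return [c[-1] - c[0] for c in chunks]
    raise ValueError(mode)

trace = [0.1, 0.2, 0.4, 0.4, 0.3, 0.1]   # hypothetical engagement trace
print(window_targets(trace, 3, 'mean'))
print(window_targets(trace, 3, 'delta'))
```

Note that the two modes answer different questions about the same trace: mean targets reward a model for tracking overall level, while delta targets reward it for anticipating shifts.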

Exercises for the First Edition 

Dungeon Generation
The random digger agent contained in the repository.

This repository, made by Antonios Liapis, contains algorithms for digger agents and cellular automata which generate dungeons and caves. The algorithms included are presented in Chapter 4 of this book and are detailed in the Constructive Generation Methods for Dungeons and Levels chapter of the PCG book.

A proposed exercise for dungeon generation is as follows:

  1. Go through the tutorial and download the different constructive methods contained in the repository.
  2. Implement a solver-based method (see Section 4.3.2) that generates dungeons.
    • Hint: The key questions you should consider and experiment with are as follows: Which constraints are appropriate for dungeon generation? Which parameters best describe (represent) the dungeon?
  3. Design a performance measure and compare the performance of the solver-based generator against the digger agents and the cellular automata.
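
A genuine solver-based method (Section 4.3.2) would express its constraints declaratively, for instance in answer set programming, and let a solver such as clingo search for dungeons satisfying them. The sketch below tests the same kind of constraints (minimum walkable area, full connectivity) via rejection sampling instead, which is a useful stepping stone for debugging your constraint definitions before moving to a real solver. The constraint choices and thresholds are our own assumptions:

```python
import random
from collections import deque

def satisfies_constraints(grid):
    """Example constraints: at least 30% floor tiles, all floor connected."""
    floors = {(x, y) for y, row in enumerate(grid)
              for x, c in enumerate(row) if c == '.'}
    if len(floors) < 0.3 * len(grid) * len(grid[0]):
        return False                       # constraint 1: enough walkable area
    start = next(iter(floors))
    seen, frontier = {start}, deque([start])
    while frontier:                        # flood fill over floor tiles
        x, y = frontier.popleft()
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in floors and nb not in seen:
                seen.add(nb)
                frontier.append(nb)
    return len(seen) == len(floors)        # constraint 2: fully connected

def generate(w, h, p_floor=0.55, seed=0):
    """Rejection-sample random grids until the constraints hold."""
    rng = random.Random(seed)
    while True:
        grid = [['.' if rng.random() < p_floor else '#' for _ in range(w)]
                for _ in range(h)]
        if satisfies_constraints(grid):
            return grid
```

Rejection sampling scales poorly as constraints tighten, which is precisely the motivation for handing the same declarative constraints to a dedicated solver.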

MiniDungeons Agent

MiniDungeons is a simple turn-based roguelike puzzle game, implemented by Holmgard, Liapis, Togelius and Yannakakis as a benchmark problem for modeling decision-making styles of human players. In every MiniDungeon level, the hero (controlled by the player) starts at the level’s entrance and must navigate to the level exit while collecting treasures, killing monsters and drinking healing potions. For this exercise, you will have to develop a number of AI players able to complete all the dungeons in the MiniDungeons simulator. The repository provided by Antonios Liapis is entirely written in Java and provides a barebones, ASCII version of the full MiniDungeons game which is straightforward to use and fast to simulate.

Our proposed exercise is as follows:

  1. Download the MiniDungeons Java-based framework. Use a Java IDE like Eclipse/Netbeans to load the sources and add the libraries. The easiest way to test the project is to use the three main classes in the experiment package:
    • SimulationMode performs a number of simulations of a specific agent on one or more dungeons and reports the outcomes as metrics and heatmaps of each playthrough; these reports are also saved in a folder specified in the outputFolder variable.
    • DebugMode tests one simulation on one map, step by step, allowing users to see what the agent does at each action, showing the ASCII map and the hero’s current HP. Additional debug information can also be included in this view as needed.
    • CompetitionMode tests how each agent specified in the controllerNames array fares against every other on a number of metrics such as treasures collected, monsters killed, etc. This mode is intended for conference or classroom competitions where agents created by different users compete on one or more dimensions monitored by the system.
  2. Use two AI methods covered in Chapter 2 of the book to implement two different MiniDungeons agents.
    • Hint: You may wish to start from methods with available Java templates such as Best First Search, Monte Carlo Tree Search and Q-learning, and then proceed to try out hybrid algorithms such as evolving neural networks. Implementation examples of Q-learning and neuroevolution for MiniDungeons are described here and here respectively. Remember to visit the discussion about representation and utility at the beginning of Chapter 2.
  3. Compare the performance of your agents in SimulationMode, or perform a competition with all agents in the class through CompetitionMode.
    • Hint: Both SimulationMode and CompetitionMode output a broad range of metrics. Reflect on what good performance means for your MiniDungeons agents when comparing between them: killing more monsters, collecting all the treasure, or surviving to reach the exit? These are different decision-making priorities, which have also been explored in the MiniDungeons research on procedural personas.

Other Datasets

Beyond the specific datasets mentioned previously, a growing community of AI and games researchers and practitioners has put effort into soliciting datasets and resources for player behavior modeling. One notable effort is the Awesome Game Datasets repository, which aims to serve as a guide for anyone who wishes to study AI and data mining methods as applied to games. The repository contains a series of datasets (from over 70 game titles at the time of writing), but also tools and materials for researchers to build their own datasets.

A number of additional publicly available datasets are worth exploring for player modeling and game analytics work. The PowerWash Simulator longitudinal dataset pairs detailed in-game behavioral telemetry with psychological survey responses from over 11,000 players across 222 days of play, making it one of the most comprehensive open datasets for studying the relationship between player behavior and wellbeing.

For students interested in competitive multiplayer games, the GOSU.AI Dota 2 chat and replay dataset and the CS:GO Competitive Matchmaking dataset both provide rich behavioral logs suitable for strategy prediction and player profiling tasks. These are accessible through the Awesome Game Datasets repository above.

Finally, students wishing to build their own synthetic game dataset for analytics and machine learning experiments may find the Players Behaviors Dataset Generator useful. The tool simulates session events, player churn, retention metrics, and spending behavior, and is particularly well suited as a starting point for supervised learning tasks where real data is unavailable.

Players Behaviors Dataset Generator — https://github.com/awslabs/players-behaviors-dataset-generator