Using Data Visualization to Persuade
Data and Feature Selection
League of Legends
We tested our framework on a dataset from a multiplayer online battle arena (MOBA) game: League of Legends (LoL). League of Legends is a match-based game developed by Riot Games, in which each player, i.e., user, controls a champion characterized by specific abilities and fights, together with other players, against an opposing team. The final goal is to defeat the opposing team in an arena. Each champion starts the match with a low strength level, which increases by killing adversaries, helping teammates with kills, i.e., assists, and performing other actions. During the match, each champion can be killed many times, i.e., number of deaths. The player can earn gold (i.e., the LoL currency) by performing certain actions, such as killing or assisting in kills, and can use the earned gold to improve the abilities of the champion.
We collected the LoL data through the Riot Games API (https://developer.riotgames.com/), which provides metadata for each match, including players' performance statistics such as the number of assists, kills, and deaths. Each player is identified by a unique label, and each match is marked by temporal information, such as match datetime and duration.
The League of Legends Dataset
The dataset analyzed in the present work consists of 961 players and the complete game history of their first 100 matches, all played in one specific battle arena, i.e., the characteristic map in which LoL teams fight. We decided to focus on a specific battle arena to minimize the variability in players' behaviors (and their evolution) induced by different game scenarios. To this end, we selected the most popular LoL battle arena, namely Summoner's Rift (map_id = 11): the largest map in LoL, composed of three lanes (paths) connecting the opponents' bases, jungles at the edges, and a central river. This is (by a large margin) the most played battle arena in the game; choosing it provided us with a large number of players who played this scenario at least 100 times. We set this threshold to guarantee that each individual played enough matches to capture a pattern of temporal behavior evolution. One hundred matches proved a good trade-off between the number of users (nearly 1000) and the total number of matches (nearly 100K) yielded by the selected threshold.
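The selection just described can be illustrated with a minimal sketch. It assumes that match records are available as dictionaries with the hypothetical keys `player_id`, `map_id`, and `match_datetime` (these names are ours, not the API's), and it reads "first 100 matches" as the first 100 matches a player played on the selected map; it is not the collection code used for the dataset.

```python
from collections import defaultdict

SUMMONERS_RIFT = 11  # map_id quoted in the text
MIN_MATCHES = 100    # threshold quoted in the text


def first_matches_on_map(records, map_id=SUMMONERS_RIFT, n=MIN_MATCHES):
    """Return {player_id: first n matches on map_id}.

    Players with fewer than n matches on that map are dropped.
    `records` is an iterable of dicts with the hypothetical keys
    'player_id', 'map_id', and 'match_datetime'.
    """
    by_player = defaultdict(list)
    for rec in records:
        if rec["map_id"] == map_id:
            by_player[rec["player_id"]].append(rec)

    histories = {}
    for player, matches in by_player.items():
        if len(matches) >= n:
            # Order chronologically, then keep only the first n matches.
            ordered = sorted(matches, key=lambda r: r["match_datetime"])
            histories[player] = ordered[:n]
    return histories
```

With `n=100` and `map_id=11`, each surviving player contributes exactly 100 chronologically ordered matches, which is what gives every player the same number of time steps later on.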
Feature Selection
For each match, the Riot Games API returns 51 features (https://developer.riotgames.com/api-methods) associated with different aspects of the game (the full list and explanations are provided in Appendix A). In summary, the API provides: IDs (e.g., map ID, player ID, match ID, etc.); temporal information related to the match; features related to minions (i.e., the AI-controlled characters that spawn in the game map); damage dealt and taken by players; gold (i.e., the currency of LoL, used to purchase items and champions' upgrades); all the different types of kills (such as killing champions, minions, towers, or other entities in the game); other types of actions, such as assisting, dying, healing, and ward-related actions; the binary feature "winner", which provides the final outcome of the game (win/lose); and some additional detailed in-game scores (cf. Appendix A). Not all of these features are predictive of players' game performance, so we first apply our feature selection step. To identify informative features in LoL, we use Decision Trees to predict whether a user won a specific match, given the vector of all the features describing her/his performance in that match (as shown in Figure 1). Here, the target values are provided by the feature "winner", a binary feature which is 1 if the player won the game and 0 otherwise.
We then rank features, as described in Section 2, based on their Gini importance. The best model, which obtains a prediction accuracy above 80%, selects the following four features as cumulatively responsible for over 99% of the Gini importance: (1) number of assists; (2) number of kills; (3) number of deaths; and (4) gold earned. We retain these four features, normalize them as in Equation (2), and discard all the others in the rest of the analysis. The final dataset is available online in the Supplementary Materials: data are provided in a table format in which each line contains the ID of a user playing in a certain match and her/his number of actions.
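To make the ranking step concrete, the following sketch fits a single decision tree with scikit-learn and keeps the top-ranked features whose Gini importances together reach 99%. The data are synthetic stand-ins, not the LoL dataset; the column names, the tree depth, and the helper `select_by_cumulative_importance` are illustrative assumptions, and the model search that produced the "best model" above is not reproduced.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def select_by_cumulative_importance(names, importances, threshold=0.99):
    """Return the top-ranked features whose importances together reach `threshold`."""
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, importance in ranked:
        selected.append(name)
        cumulative += importance
        if cumulative >= threshold:
            break
    return selected


# Synthetic stand-in data (NOT the LoL dataset): the toy outcome depends on
# the first four columns only; the last two columns are pure noise.
names = ["assists", "kills", "deaths", "gold_earned", "wards_placed", "total_heal"]
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, len(names)))
y = (X[:, 0] + X[:, 1] - X[:, 2] + X[:, 3] > 0).astype(int)  # toy "winner" label

# Fit one tree with the Gini criterion; max_depth=5 is an arbitrary choice.
tree = DecisionTreeClassifier(criterion="gini", max_depth=5, random_state=0)
tree.fit(X, y)

# feature_importances_ holds the normalized Gini importance of each column.
selected = select_by_cumulative_importance(names, tree.feature_importances_)
print(selected)
```

On real data, the same helper would be applied to the importances of the best-performing tree, with the 51 API features as columns and "winner" as the target.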
We can finally build the tensor \(\mathscr{X} \in \mathbb{R}^{I \times J \times K}\) used for the analysis. This is a three-dimensional array where \(I=961\) users, \(J=4\) features, and \(K=100\) time steps, i.e., number of successive matches.
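As an illustration of this layout, the sketch below stacks per-player match histories into a NumPy array of shape \((I, J, K)\). It assumes a hypothetical input `histories` that maps each player ID to a chronologically ordered list of match dictionaries, with hypothetical feature keys; the normalization of Equation (2) is not reproduced here.

```python
import numpy as np

# Hypothetical column names for the four retained features.
FEATURES = ("assists", "kills", "deaths", "gold_earned")


def build_tensor(histories, features=FEATURES):
    """Stack match histories into an array of shape (I users, J features, K steps).

    `histories` maps a player ID to a chronologically ordered list of dicts,
    one per match, each containing every name in `features`.
    """
    players = sorted(histories)
    n_steps = len(histories[players[0]])
    if any(len(histories[p]) != n_steps for p in players):
        raise ValueError("every player must have the same number of matches")

    tensor = np.zeros((len(players), len(features), n_steps))
    for i, player in enumerate(players):
        for k, match in enumerate(histories[player]):
            for j, name in enumerate(features):
                tensor[i, j, k] = match[name]
    return players, tensor
```

For the dataset described above, the resulting array would have shape `(961, 4, 100)`; the returned `players` list records which row of the first axis belongs to which user.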