The Impact of Gamified Instruction on Students’ Learning Outcomes: Systematic Review of Experimental Studies

This systematic review critically explores the intervention design and findings of experimental studies that were published between January 2012 and December 2020 in a number of digital libraries and databases and that focused on the effect of gamified instruction on students’ learning outcomes, with the aim of identifying what constitutes success or the lack thereof in the given context. The reported effects of gamified instruction on students’ learning engagement and achievement are discussed in relation to a) the intervention design, its flaws and their potential impact on reported outcomes, and b) prevalent practice in gamification research. The discussion is structured around data collection sources, sample size and intervention duration, but also the characteristics of learning technology, learning approach, course content, type of games and game elements. This study proposes a list of categories to be included in the description of a study context so that it is possible to a) systematically organise research findings, b) filter the variety of findings by means of replication studies, c) recognise the variant effect on different sub-populations, and d) suggest the way forward when designing and implementing gamified instruction within specific conditions. Furthermore, the study highlights the necessity of approaching the topic through a mixed-method approach involving a more intensive tracking schedule with new assessment instruments and a larger number of participants, in studies that are longitudinal or at least of a longer duration, in order to obtain more comprehensive findings.


Introduction
Gamification is a relatively new phenomenon that builds on established game-design principles and an understanding of human nature (Chou, 2016). Werbach and Hunter (2012) define game design elements as pyramidal hierarchies composed of three layers: components, mechanics and dynamics. Being grounded in behavioural economics and psychology, gamification allows designers to achieve intervention objectives (Chou, 2016; Wood & Reiners, 2015). The four main application fields are product gamification, workplace gamification, marketing gamification and lifestyle gamification (Chou, 2016). Correspondingly, the theoretical rationale behind a specific gamification design relies on field-specific theories and general psychology theories that explain human nature and behaviour in a given context.
Lifestyle gamification involves applying game-design principles to people's daily activities and habits, and one of the sectors into which it branches is education (Chou, 2016). Unlike serious games, which are designed not for pure entertainment but for provoking thought and/or delivering a message, developing skills, teaching a lesson, providing experience and emotions, or changing behaviour and attitudes (Nguyen, Ishmatova, Tapanainen, Liukkonen, Katajapuu, Makila, Luimula, 2017; Duin, Hauge, Hunecker, Thoben, 2011), gamification is not designed to influence learning directly, but to alter learners' attitude or behaviour and improve already existing instruction as a consequence of this change (Landers, 2014; Koivisto & Hamari, 2019; Dichev & Dicheva, 2017).
According to the framework developed by Werbach and Hunter (2012), when gamifying learning instruction, the components that must be included in the 'game' (typically points, badges and leaderboards (PBL), and levels (Hamari, Koivisto & Sarsa, 2014; Dichev & Dicheva, 2017; Zainuddin, Chu, Shujahat & Perera, 2020)) need to be related to the intention and purpose of instruction and the needs of the target user group, but also to the dynamics of a specific learning approach and the capabilities of the software tools involved. In gamification parlance, mechanics are concepts that define the potential actions that users can take while playing, especially guidelines that define how the game evolves, the possible reactions to an occurring event and what influences the users' reactions (feedback, collaboration, competition, etc.). Finally, specific game dynamics result from behaviours and interactions among users that are stimulated by the components and mechanics described. The dynamics depend on each user's nature, personality and experience (Manzano-León, Camacho-Lazarraga, Guerrero, Guerrero-Puerta, Aguilar-Parra, Trigueros & Alias, 2021; Werbach & Hunter, 2012).
When gamification goes beyond PBL, it follows a video-game design process (Manzano-León et al., 2021). The Mechanics, Dynamics, Aesthetics (MDA) model, developed by Hunicke, LeBlanc & Zubek (2004), builds game aesthetics on top of the previously described principles of game mechanics and dynamics. As such, aesthetics involves the specific emotional experience resulting from gameplay, such as fantasy, narrative, challenge, discovery, etc. (Hunicke et al., 2004). The addition of a single informal rule by a group of users can transform gamification into a complete game.
Gamified systems that help users find their own reasons for engaging in targeted behaviour are known as meaningful gamification (Nicholson, 2014). This type of gamification is grounded in the self-determination theory of Ryan and Deci (2000), which implies that intrinsic motivation is driven by autonomy, competence or mastery, and relatedness. Although simply rewarding a new behaviour may reinforce it and help develop an expected habit (Skinner, 1953), performing tasks for intrinsic reasons puts someone in a healthier mental state than performing tasks for extrinsic rewards (Nicholson, 2014). Furthermore, in order to achieve a desired effect on learning when incorporating games and game elements in learning instruction, a comprehensive instructional design is required (Huang, Ritzhaupt, Sommer, Zhu, Stephen, Valle, Hampton & Li, 2020; Landers, 2014; Fan, Xiao & Su, 2015).
When not adjusted to users' attributes, gamification in education can create needless cognitive load; namely, elements that are not necessary in instructional design tend to burden working memory capacity by increasing extraneous cognitive load (Sweller, 1988; Sweller, van Merriënboer, Jeroen & Paas, 2019). For example, PBL have the potential to spark intrinsic motivation, but this may not always improve learning objectives (LOs), because these elements may trigger an unnecessary cognitive load. As a result, learners may be very cognitively engaged, but their attention may be drawn away from the learning task (Brom, Stárková, Bromová & Děchtěrenko, 2019).
Hamari, Koivisto, and Sarsa (2014) stress that there is a lack of clear understanding about what kind of outcome is expected when specific methods are applied in a specific context, for instance which specific factor triggered better performance: the learning approach, the gamified pedagogy or a specific type of interaction (Mese & Dursun, 2019; Dichev & Dicheva, 2017). There is an evident need for more empirical studies with well-designed methodologies and robust comparison groups in order to confirm the effectiveness of gamified learning (Hamari et al., 2014). This systematic review (SR) explores the effects of gamified instruction (Note 1) (GI) on students' LO with the aim of identifying what constitutes success or the lack thereof in the given context.
In a context that allows monitoring of students' engagement and learning progress during the course of an intervention, the discrepancy between pre- and post-intervention testing does not seem sufficient to understand the causality behind the phenomenon. While it is somewhat 'clearer' what the minimum standard is when learning progress is measured and what is missing in current practice, the multitude of theories that provide gamification design rationale calls for the creation of minimum standards when reporting on the impact of GI on students' learning (Reeves & Reeves, 2015). With this in mind, the main research question for this study is: 'How did the authors measure success of GI?'. The secondary research questions are: (i) 'What is the impact of a GI on students' engagement?', and (ii) 'What is the impact of a GI on students' learning achievement?'. Both secondary research questions are discussed from the angle of the context in which the studies included in this SR took place.
International Journal of Education ISSN 1948-5476 2021

Research Methodology
An SR of recent literature was carried out in order to answer the research questions, with the focus on research situated within an educational setting. The review incorporates works indexed in the following digital libraries and databases: Education Resources Information Centre (ERIC), AIS eLibrary, ACM Digital Library, IEEE Xplore, Science Direct (Elsevier), Wiley Online Library, SpringerLink, EBSCO, Emerald and the Directory of Open Access Journals (DOAJ). In order to minimise the risk of biasing the results with data from unreliable sources, non-peer-reviewed material was omitted and only peer-reviewed literature was included in the search.

Research Selection
The literature review was carried out between October 2019 and December 2020. The final dataset comprises journal articles focusing on games, gamification, learning or education in their title and/or abstract that were published between January 2012 and December 2020. Study selection followed the protocol set out by Briffa, Jaftha, Loreto, Morone Pinto & Chircop (2020), in which the selection process includes three levels of search criteria and a rigid set of inclusion and exclusion criteria. At first, the researchers worked independently, screening the search results according to a number of defined criteria. Afterwards, they collaborated, removing duplicates from the list of selected studies and refining the list. The methodology and results of each contested study were examined until the researchers reached a consensus on what should or should not go into the list.

Assessment Criteria and Extraction
To make sure that the studies on the list were chosen as objectively as possible, three levels of specificity using certain keywords were identified. These were 'gamification' (Level 1), 'gamification and education' (Level 2), and 'gamification, education and control' (Level 3). Each successive level was more specific and better targeted to meet the necessary criteria. The remaining articles were filtered through an extra set of inclusion and exclusion criteria that followed the same profile as the study carried out previously (Briffa et al., 2020), except that articles reporting on studies implemented in postgraduate contexts were disregarded.
The included articles had to comply with the study objective by employing games or game elements within an education setting; they had to be empirical, using specific study designs, such as triangulation or a quantitative analytic approach; they had to involve participants no younger than 11 years old; they had to systematically measure students' achievement and/or engagement and student perceptions of GI through tests and/or surveys and questionnaires; they had to report on studies that had involved an 'experiment' and a 'control' group; and, lastly, the articles chosen had to be available in full text, in English and peer reviewed to ensure that they are of a high quality. Meanwhile, the articles not to be included were review articles, meta-analyses and systematic reviews (though these were used for cross-validation purposes) and articles which did not use game-based learning tools.
Notably, the decision regarding the minimum age of participants was based on Piaget's four stages of cognitive development. Namely, at about age 11, the concrete operational stage ought to finish and youngsters enter the formal operational stage, when they start developing their abstract thinking capabilities (Inhelder & Piaget, 2013). Therefore, it is both challenging and developmentally inappropriate to ask children below this age to reflect on their cognition (Fredricks, Blumenfeld & Paris, 2004; Fredricks & McColskey, 2012; Rodríguez, Puig, Tellols & Samsó, 2020).

Data Collection
Application of the levels of search queries produced a total of 12,261 articles published in the period 2012-2020. Duplicates were removed during the screening process. Whenever a full article was not initially available, the authors were contacted via the academic social network 'ResearchGate' and a full version of the article was requested. Another 14 articles were identified through alternative sources. A total of 12,275 articles were then screened thoroughly against the inclusion and exclusion criteria, and 12,152 articles were discarded because they did not meet the pre-determined requirements. One hundred and twenty-three full-text articles were assessed for eligibility, and 51 further articles were dismissed for the following reasons: the experiment involved a population below 11 years of age (10); the experiment did not involve a control group, or the control group involved was not compatible with the researchers' expectations (14); the methodology was vague (2); the article did not fit the purpose of the review (11); the experiment involved students at postgraduate level (3); the experiment was also reported in another article (1); the article could not be accessed as full text (1); the article could not be accessed in English (2); or the article was not available (7). Note: three articles in this review each report on two studies, where the data-gathering mechanism was altered between the studies. Therefore, the total number of studies included in this review is 75. The articles included in this systematic review are marked with an asterisk in the References section.
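The selection flow described above can be cross-checked with simple arithmetic. The following is a minimal sketch that uses only the counts reported in this section; the variable names and the short labels for the exclusion reasons are hypothetical and were introduced for illustration.

```python
# Counts as reported in the Data Collection section.
identified_by_search = 12_261
identified_elsewhere = 14
discarded_at_screening = 12_152

# Full-text exclusions; the keys are hypothetical labels for the stated reasons.
full_text_exclusions = {
    "participants_under_11": 10,
    "no_or_incompatible_control_group": 14,
    "vague_methodology": 2,
    "outside_review_purpose": 11,
    "postgraduate_sample": 3,
    "also_reported_elsewhere": 1,
    "full_text_inaccessible": 1,
    "not_in_english": 2,
    "not_available": 7,
}

screened = identified_by_search + identified_elsewhere
assessed_full_text = screened - discarded_at_screening
excluded_full_text = sum(full_text_exclusions.values())
included_articles = assessed_full_text - excluded_full_text

# Three included articles each report two studies, which adds three studies.
articles_with_two_studies = 3
included_studies = included_articles + articles_with_two_studies

print(screened, assessed_full_text, excluded_full_text,
      included_articles, included_studies)
# prints: 12275 123 51 72 75
```

Under these figures, the 72 included articles and the 75 included studies are consistent with the counts given for each stage.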

Data Analysis
The data were organised and tabulated following the guidelines of Webster & Watson (2002). According to these authors, to analyse studies for a review, a two-step process must be followed. The first step is an author-centric analysis in which selected studies are listed in a table and selected details from these papers are extracted and outlined in different columns. For this review, the details included: title of the article, journal and year of publication, educational level, subject, learning approach, game delivery platform, type of GI, game elements, sample size, gender representation within the sample, study duration and data collection tools. The second step is a concept-centric method in which the author-centric analysis was pivoted and coded into concept-centric frequency tables (Webster & Watson, 2002). Further analysis was guided by the following categories:

1. Intervention design
2. Engagement
3. Learning achievement
For category one, the following subcategories were considered: (i) data collection instruments, (ii) study duration, (iii) sample size, and (iv) gender representation within the sample. When it comes to categories two and three, the subcategories were: (i) subject, (ii) educational level, (iii) learning approach, (iv) type of GI, (v) game elements, and (vi) game delivery platform. The results were presented in a quantitative way (frequency) and outcomes were analysed based on their experiential and instrumental dimension (Liu et al., 2017).
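The pivot from an author-centric table to concept-centric frequency tables can be illustrated with a short sketch. The records below are invented placeholders, not data from the reviewed studies, and the function name `concept_frequencies` is hypothetical; the sketch only shows the general idea of counting how many studies mention each concept.

```python
from collections import Counter

# Hypothetical author-centric rows: one dict per study (illustrative values only).
author_centric = [
    {"study": "Study A", "subject": "IT", "game_elements": ["points", "badges"]},
    {"study": "Study B", "subject": "Science", "game_elements": ["points", "leaderboard"]},
    {"study": "Study C", "subject": "IT", "game_elements": ["badges"]},
]


def concept_frequencies(rows, field):
    """Return {concept: (number of studies, percentage of studies)} for one field."""
    counts = Counter()
    for row in rows:
        values = row[field]
        if isinstance(values, str):
            values = [values]
        counts.update(set(values))  # count each concept at most once per study
    total = len(rows)
    return {concept: (n, round(100 * n / total, 1)) for concept, n in counts.items()}


subject_table = concept_frequencies(author_centric, "subject")
element_table = concept_frequencies(author_centric, "game_elements")
print(subject_table)
print(element_table)
```

Counting each concept once per study keeps the percentages interpretable as shares of studies, which matches the "(count, percentage)" form in which frequencies are reported in this review.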
Afterwards, the results were interpreted from a qualitative aspect, which allowed integrating the outcomes found in the categories of analysis, according to the corresponding phenomenon. The aim was to understand the phenomenon based on the qualities of the context, in order to clarify possible contradictions and generate conclusions about the effects of GI on learning outcomes.

Results
The search process resulted in 72 relevant articles, which were included in the SR. The selected studies were journal articles (62, 84.9%) and conference papers (11, 15.1%) (Note 2). The majority of studies (43, 71.7%) explored the impact that GI has on both students' engagement and learning achievement. In addition, 16 (26.7%) studies focused only on engagement, while, in 12 (20%) studies, learning achievement was the sole focus.
The studies included mainly involved the following structure for data gathering: first, the sample was split into one or (less often) more experiment groups and, most often, one control group. Most of the time, participants were randomly distributed into a control and an experiment group, unless the scope of the study also involved a particular learner characteristic in relation to the gamified instruction, such as learning style or gaming experience. In that case, participants filled in a background questionnaire as part of the pre-test.
The experiment groups utilised gamified pedagogy and, most of the time, some kind of usage data was gathered; where there were two treatment groups, they usually differed regarding the learning strategy (Sun-Lin & Chiou, 2017), learning approach (Lam, Hew & Chiu, 2018) or game-delivery platform (De-Marcos, Dominguez, Saenz-De-Navarrete & Pages et al. 2014; Sun & Hsieh, 2018). Furthermore, the control group(s) were mainly taught in a traditional manner, except in those studies that explored the impact of different designs (Wu, 2018; Muñoz-Carpio et al., 2019) or forms of technology support (Naik & Kamat, 2016) on students' learning outcomes.

Intervention Design
Selected studies differed greatly regarding the extent of data gathering and analysis, sample size and length of intervention. Those studies that focused on the impact of GI on students' engagement predominantly utilised only one (23, 35.9%) or two (21, 32.8%) data-gathering tools from the following sources: game scoring (number of collected points/badges or completed/bonus tasks) and embedded data collection mechanisms (number of contributions or material views/downloads) (36, 56.2%); curricular means and attendance/retention rates (12, 23.1%); qualitative data-gathering techniques (27, 51.9%); and questionnaires created/adapted for the purpose of the study (57, 89.1%). The latter were self-report questionnaires aiming to explore the impact of GI on various constructs of motivation and engagement. In only eight studies (12.5%) did participants have the opportunity to fill in the questionnaires during the course of the intervention.
In order to measure the impact of GI on learning achievement, the studies included in this review predominantly utilised one (38, 63.3%) or two (14, 23.33%) data-gathering tools from the following sources: tests specifically designed/adapted for the purpose of the study (40, 67%) or curricular means (31, 52%). In addition, game scoring (5, 7%) and qualitative data-gathering techniques (3, 5.9%) were found in a small share of studies. Moreover, 16 (13.37%) studies included in this review did not utilise, at any time during the intervention, tests that were adapted/created for the purpose of the study. Also, only two (4%) studies explored knowledge retention rates, one of which did not involve a post-test.
The following characteristics of the study design were also included in the analysis: sample size, intervention duration and gender representation within the sample.
Studies that include a small number of participants tend to generate findings that are context-dependent (Slavin & Smith, 2009). Also, the presence of an experimental group requires a substantial sample size (Petri & von Wangenheim, 2016). For this reason, studies were categorised using the following sample size indicators by Slavin and Smith (2009): (N<100) 'small sample size', (100≤N≤250) 'medium sample size', and (N>250) 'large sample size' studies. In addition, in 29 (38.2%) studies, gender representation within the sample was not reported.
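The sample size indicators can be written as a small classification rule. This is a sketch only: the function name is hypothetical, and the medium band is assumed to be 100 ≤ N ≤ 250, which is the reading that leaves no gap between the 'small' (N<100) and 'large' (N>250) bands.

```python
def sample_size_category(n: int) -> str:
    """Label a study by its number of participants.

    Assumed bands: small (N < 100), medium (100 <= N <= 250), large (N > 250).
    """
    if n < 0:
        raise ValueError("sample size cannot be negative")
    if n < 100:
        return "small"
    if n <= 250:
        return "medium"
    return "large"


# Boundary values show where each band starts and ends.
for n in (99, 100, 250, 251):
    print(n, sample_size_category(n))
```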

Figure 2. Sample Size in the Studies Included in the Review
Interventions of a short duration tend to generate findings that are tempered by the gamification novelty effect. Namely, changes in users' behaviour may result from their curiosity about gamification and interest in trying it out. Later on, as the novelty wears off, these changed behaviour levels may decline (Chen, Shih & Law, 2020; De-Marcos et al., 2014; Fotaris, Mastoras, Leinfellner & Rosunally, 2016; Kermek, Novak, & Kaniski, 2018; Liu, Santhanam & Webster, 2017; Pechenkina et al., 2017; Tsay, Kofinas & Luo, 2018; Tsay, Kofinas, Smita & Yang, 2020). As only 22 (29.3%) studies included in this review report the length of intervention in hours (whether cumulative or distributed), studies that reported on interventions that lasted only one session were labelled as 'very short' and interventions that lasted for one week were labelled as 'short'. Meanwhile, interventions that spanned one semester were labelled as 'medium' and those that took longer than one semester as 'long'.

Figure 3. Intervention Duration in the Studies Included in the Review
A comprehensive description of the methodology and study design is a condition sine qua non for a critical evaluation of results obtained through experimental studies via means of replication studies (Reeves & Reeves, 2015).

Context of Instruction
In order to systematically describe the context in which these studies took place, the authors considered the participants' level of education, the subject being taught, the learning approach, the type of GI, the game elements and the game delivery platform. For a full list of articles including the type of GI, game elements used and the study outcome, see ANNEX B, Summary, Table 4. When it comes to the subject taught, the greatest share of included studies involved IT (22, 29.3%) and specific skills (9, 12.2%) such as library instruction or designing a questionnaire, followed by science, languages and mathematics (8, 10.7% each), and business and mixed subjects (4, 5.3% each). For more details, see Figure 6.

Figure 6. Subjects Gamified in the Studies Included in the Review
The majority of studies included in the review involved classroom-based instruction, in which the role of the teacher was not limited to that of the central figure in knowledge transmission. Unless the authors specified the approach or the description of the activities implied otherwise, these studies were labelled as using the traditional learning approach (22, 29.3%), followed by the flipped classroom (13, 17.3%) and the blended learning approach (10, 13.4%). For more details, see Figure 7. Among the studies included in the review, the most commonly employed game delivery platform was the virtual learning environment (VLE) (22, 29.3%), followed by web-based content (14, 18.7%). The latter includes online learning platforms like TRAKLA2, MinecraftEdu, etc., and cases where a game element was delivered in a web-based format (e.g. a website served as an LB). Furthermore, a number of studies (12, 16%) utilised a platform composed of two or more of the previously mentioned platforms, such as IRS/game mechanics, VLE/DGBL or DGBL/IRS/VLE, each of which was used in two (2.7%) studies. For more details, see Figure 8. In addition, in 37 (49.3%) studies, platforms were developed specifically for the purpose of that research, while seven (9.7%) studies utilised a system composed of components that were at least partly already available on the market.

Figure 8. Game Delivery Platform in the Studies Included in the Review
When it comes to the description of game elements, the level of detail varied greatly among the studies. For this reason, the data on game elements were first listed as specified in the articles. Afterwards, the researchers replaced synonyms so that uniform language was used and the frequency of usage could be calculated for each game element. Moreover, six (8%) studies did not include a description of game elements. For more details, see Figure 9. Based on their outcomes, studies were grouped into the categories listed below, which reflect varying degrees of positive effect of GI (I-III) and failure to identify a positive effect of GI (IV):
I. a significant positive effect on the majority of measured aspects of a) engagement (37, 57.8%) and/or b) academic achievement (31, 51.7%);
II. a significant positive effect on only one measured aspect of a) engagement (15, 23.4%) and/or b) academic achievement (9, 15%);
III. a significant positive effect on a) engagement (6, 9.4%) and/or b) academic achievement (5, 8.3%) of only a specific subpopulation (e.g. male/female participants, high/low achievers, etc.); and
IV. no significant positive effect of GI identified on any measured aspect of a) engagement (6, 9.4%) and/or b) academic achievement (15, 25%).
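The shares reported for the four outcome categories can be checked against their counts. The sketch below assumes that the categories are mutually exclusive within each outcome, so that the denominators (64 studies for engagement, 60 for academic achievement) are obtained by summing the category counts; the function name and the tolerance value are hypothetical choices for illustration.

```python
# (count, reported percentage) per outcome category, as stated in the text.
reported = {
    "engagement": {"I": (37, 57.8), "II": (15, 23.4), "III": (6, 9.4), "IV": (6, 9.4)},
    "achievement": {"I": (31, 51.7), "II": (9, 15.0), "III": (5, 8.3), "IV": (15, 25.0)},
}


def check_shares(categories, tolerance=0.06):
    """Return the summed count and the labels whose reported share is off."""
    total = sum(count for count, _ in categories.values())
    mismatches = []
    for label, (count, reported_pct) in categories.items():
        computed_pct = 100 * count / total
        if abs(computed_pct - reported_pct) > tolerance:
            mismatches.append(label)
    return total, mismatches


totals = {outcome: check_shares(cats) for outcome, cats in reported.items()}
print(totals)
```

With these inputs, no category is flagged, which suggests that the reported percentages are shares of the studies measuring each outcome rather than of all 75 studies.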

Intervention Design and Reported Learning Outcomes
If we take into consideration the diverse, and often unstated, rationale for different gamification designs (Seaborn & Fels, 2015; Dichev & Dicheva, 2017), it is understandable that the designs, methods and variables explored in the studies included in this review vary substantially (Manzano-León et al., 2021). This subjective and unsystematic practice leaves some theories, relevant factors and forms of engagement in gamified learning environments underexplored and without a theory-derived framework to scaffold gamification design (Rapp et al., 2019; Dichev & Dicheva, 2017; Nacke & Deterding, 2017; Seaborn & Fels, 2015). In 2011, Deterding, Khaled, et al. proposed a working definition for the group of phenomena that gamification represents, including similar concepts like serious games, serious gaming, etc., as "the use of game design elements in non-game contexts" (p. 1). In order to precisely integrate a variety of research endeavours, the authors suggest describing the following levels of the game-design process: interface design patterns (game components used to create interfaces for the specific context that users face); game design patterns or game mechanics (recurring blocks of game design that dictate the gameplay); design principles or heuristics (rules for approaching a design problem or evaluating a design solution); and conceptual models of game components or game experience, such as the MDA.
Aside from the sporadic and scarce description of game components, the information most often lacking among the studies included in the review was the gender representation within the sample (Torres-Toukoumidis et al., 2021), which 38.16% of reviewed studies did not report, and the length of intervention expressed in hours, which was reported in only 29.3% of the studies. Furthermore, in a number of studies, the authors did not specify the learning approach used to deliver GI. Where the description of learning activities allowed it, these studies were labelled as taking place in a traditional classroom but, in 10.7% of studies, there was no indication of the learning approach used and, in 4% of the studies, the delivery platform was not specified. This leaves the impression that the importance of a comprehensive instructional design is not being stressed enough and that the mentality of putting technology first is still present in academia. A theoretical model that links the learning approach used to deliver the GI with the outcomes of the intervention is a precondition for disentangling why these techniques affect outcomes the way they do. The present gap not only restricts the generalisability of research findings but also provides unreliable guidance to gamification practitioners (Landers, 2014). The motivational effect triggered by a specific mechanism is not guaranteed in a different educational context (Dichev & Dicheva, 2017).
More than half of the studies that explored the effect of GI on learning achievement (63.3%) based their conclusions on data collected via only one data-collection source, predominantly pre- and post-tests developed or adapted for the purpose of the study (26%). In addition, 5.2% of these studies did not gather any data on students' engagement. Although the difference between pre- and post-testing is sound proof of the progress made, grades and test scores alone do not say much about the quality of the effort exerted in order to obtain them. Likewise, data obtained via embedded data-collection mechanisms, game scoring, curricular means and students' attendance rates alone do not record changes in motivation (7.8% of included studies explore the effect on students' engagement only via these data collection sources).
The students' voice needs to be heard as, otherwise, the human side of the gamified experience will go unnoticed. Also, when aiming to aptly categorise and interpret the effects of GI, attitudes and behaviours that are the direct outcome of GI should be explicitly measured (Landers, 2014). Among included studies with learning achievement in focus, only 5% involved qualitative data-collection techniques. Specifically, it would be advisable to involve more interview data in addition to the data collected via quantitative means (Aşıksoy, 2018; Hew et al., 2016; Huang et al., 2019; Ortiz-Rojas, Chiluiza, & Valcke, 2019; Wichadee & Pattanapichet, 2018) and to triangulate the data collected via self-report questionnaires or qualitative techniques with observational data (Segura-Robles et al., 2020; Alt & Raichel, 2020).
Continuous theoretical and rigorous systematic empirical work in varying gamification settings and contexts is a prerequisite for developing a comprehensive practical and methodological understanding of the benefits of GI (Dichev & Dicheva, 2017). Still, Rapp et al. (2019) warn that, so far, very little empirical work has focused on exploring the influence of contextual factors and individual diversities on the effectiveness of gamified systems. Therefore, aside from a mixed-method approach and a detailed description of the game-design process, future studies should include a detailed description of the context in which they took place. In addition, studies that utilise a game already available on the market (instead of developing one) ought to list the game components.
A description of this kind makes it possible to systematically organise research findings, filter the variety of findings by means of replication studies, recognise the variant effect on different sub-populations, and suggest the way forward when designing and implementing GI within specific conditions. Therefore, the description must depict the subject/topic of instruction and the educational level; the learning approach and the platform used to deliver GI; and the characteristics of the sample, such as gender representation and participants' average age, and intervention duration expressed in hours instead of simply stating that the intervention spanned a semester. In addition, considering the small number of interventions that took longer than one semester (22.7%), included subsequent (5%) or delayed tests (5%), or involved a sample of more than 250 participants (14.7%), and in order to understand how students approach GI and mis/use it, future studies should be longitudinal (B. Huang et al., 2018; Van Roy, Deterding & Zaman, 2018) or at least involve a longer duration, a larger number of participants and a more intensive progress tracking schedule.

Learning Context and Learning Outcomes
In general, in gamification research, the majority of studies so far have involved college students (Dichev & Dicheva, 2017) and GI of subjects under the IT umbrella, followed by engineering and mathematics (Swacha, 2021). Accordingly, the majority of studies included in the SR took place at undergraduate level (53), involved IT (22) and used VLE (22) for game delivery. Therefore, it does not come as a surprise that, from the aspect of individual subjects, all studies that reported a significant effect of GI on students' achievement and involved IT took place at undergraduate level (7). The majority of these studies utilised VLE (4) for delivery of GI and were framed in a blended framework (3).
The other two subjects in which GI most often had a significant impact on students' learning achievement were Mathematics (6) and Science (6). The majority of these studies took place at the primary (4) or secondary (4) level and employed DGBL (6) in a traditional (2) classroom. Notably, gender differences seem to affect LOs when employing a DGBL approach (Dorji et al., 2015; Khan et al., 2017). Moreover, competition in DGBL was found to be effective when learning mathematics, science and languages, although the impact of competition on LOs varied by game genre, subject or grade level (Chen, Shih & Law, 2020). In general, men experience greater success with gamification when it involves competition, while, for women, the social aspect of gamification is crucial (Hamari et al., 2014). However, based on the available description of game components, regardless of the type of GI, it seems that studies that involved these three subjects and found a significant effect of GI on student achievement predominantly combined competition and collaboration mechanics (Barlow & Fleming, 2016; Gressick & Langston, 2017; Aşıksoy, 2018; Borsos, 2019).
In order to ensure the optimal effect of collaborative learning, a) the knowledge of group members should not be of the same level; instead, knowledge gaps should be present among members; b) the type of tasks should be of a problem-solving nature; and c) the learning strategy needs to be in line with the learners' level of expertise (Retnowati et al., 2018). Correspondingly, students reported that collaboration allowed them to build on each other's knowledge and develop new attitudes, cognitive and psychomotor skills (Dziob, 2020), but also to gain confidence in their existing capabilities, and that it stimulated them to apply the knowledge and skills learned in their place of work (Alt & Raichel, 2020). In order to avoid collaboration creating an extraneous cognitive load that can reduce learning, learners need to perceive collaboration as necessary to achieve their learning goal (Chandler & Sweller, 1991).
A number of studies did not find a detectable effect of GI on students' achievement (15, 25%) and/or engagement (6, 9.4%). Instead, some negative effects on participants' engagement relative to the control condition were identified, such as a decrease in motivation, satisfaction and empowerment over time (Hanus & Fox, 2015). Some participants felt higher pressure, tension (Hong & Masood, 2014) and anxiety (Aşıksoy, 2018). The types of interactions that students have among themselves have a long-lasting effect on their LOs (Huang et al., 2020); therefore, the authors link this negative experience with the collaborative and competitive nature of GI (Hong & Masood, 2014). Authors warn that gamification promotes individual work and competition over collaboration (De-Marcos et al., 2014) and, also, that competition might take priority over the urge to gain knowledge (Mese & Dursun, 2019; De-Marcos et al., 2014). For this reason, Armier et al. (2016) recommend considering students' tendencies towards collaboration and competition prior to the intervention.
Over half of the studies (53%) that found no significant effect of GI on students' learning achievement reported a significant positive effect on students' engagement. Though participants reported, predominantly, positive effects on their emotional engagement (Khan et al., 2017; Leaning, 2015; Rojas-López et al., 2019; Kaneko et al., 2016; Sancho-Vinuesa et al., 2018) and behavioural engagement (Khan et al., 2017; Laskowski, 2015), they also acknowledged the approach as relevant to grasping the content (Kaneko et al., 2016; Leaning, 2015; Stansbury & Earnest, 2017) and beneficial in advancing their reasoning abilities (Stansbury & Earnest, 2017). For this reason, Rojas-López et al. (2019) highlight that cognitive assessment should not rely only on traditional assessment instruments (Dichev & Dicheva, 2017). The majority of the above-mentioned studies, with conflicting findings, were framed in the traditional (Khan et al., 2017; Leaning, 2015; Stansbury & Earnest, 2017) or e-learning context (Kaneko et al., 2016; Sancho-Vinuesa et al., 2018) and employed competition and progression components. The conflicting findings could be a result of an ill-fitted progress-tracking mechanism or, maybe, of different game components, such as collaboration and/or narrative, which might contribute more to students' learning in the given context. Also, it seems that, in a traditional classroom, DGBL has the greatest potential to improve students' engagement and achievement (García & Cano, 2018; Hannig et al., 2013; Sun-Lin & Chiou, 2017). Dichev and Dicheva (2017) presume that the overemployment of PBLs originates from their similarity to traditional classroom assessment.
In a distance learning environment, interactivity is crucial for building and maintaining personal relationships and a sense of belonging in the community (Delahunty, Verenikina, & Jones, 2014). Game elements, such as points, sharing virtual goods/gifts, team LB, social graphs and an option to comment on the activity of one's peer learners, have the potential to promote social interaction and a sense of community in this context. De-Marcos et al. (2014) found that, in an e-learning context, social networking had a greater impact on students' participation rates and knowledge acquisition than the GI, whereas Huang et al. (2020) found positive effects of GI on students' online interaction. In order to make students more comfortable expressing their views, some authors propose allowing them to stay anonymous (Lam et al., 2018; Rojas-López et al., 2019); also, anonymity can prevent students from using other channels of communication than those offered as a part of the system (Berns et al., 2016). Gamification is particularly suitable for learning approaches in which students, through experiential learning and active interaction among themselves, construct an LO (Cheong et al., 2014; Caton & Greenhill, 2014; Tsay et al., 2018).
In a number of studies that utilised a competitive game-based approach in a hybrid setting, participants reported that their motivation and engagement had improved (Aşıksoy, 2018; Rojas-López et al., 2019; Wang, 2017; Zainuddin, 2018; Tsay et al., 2020), along with their reasoning skills (Zainuddin, 2018), IT competencies and learning capacity (Zainuddin, 2018; Huesca Juárez & Medina Herrera, 2019). Furthermore, participants reported that the approach promoted exchange of ideas, discussion, social learning (Aşıksoy, 2018; Wang, 2017) and completion of challenges (Rojas-López et al., 2019). The majority of these studies utilised VLE (Aşıksoy, 2018; Rojas-López et al., 2019; Zainuddin, 2018; Tsay et al., 2020). In a hybrid setting where GI is delivered via VLE, game narrative (supported by game elements) proved to be a useful means to connect learning activities that are offered on the system (Tsay et al., 2020). In order to create the narrative, Tsay et al. (2020) had regular communication with students, carefully selected useful tasks and set clear rules. Based on the communication with students, the authors included design choices that match a greater range of learners than in the first iteration and promoted only those activities that were the most popular, while the least popular activities were removed from the second iteration. This way, the narrative ensured the "meaningfulness" that students needed to overcome the novelty effect (Tsay et al., 2020).
Although teacher-learner communication via IRS remains unidirectional (Sun & Hsieh, 2018), continuous monitoring delivered by this tool enables students to self-regulate their progress, and promotes autonomy and control of their own learning (González, 2018; Zainuddin, Shujahat, Haruna & Chu, 2020). The main challenge of inquiry-based instruction is that cognitive gain is only as strong as the questions asked (Morillas Barrio et al., 2016). When it comes to specific sub-populations, the use of IRS seemed to have the greatest effect on the learning achievement of medium (Wang, 2017) and low achievers (Wu, 2018; González, 2018), young (Wang, 2017; González, 2018) and novice learners (Wang, 2017), whereas male students favoured the use of IRS more than female students (Khan et al., 2017).

Conclusion
Before summarising the findings of this review, it is important to stress that the research included only the listed libraries and databases. Therefore, it is likely that relevant articles published in journals not included in these sources were omitted. For this reason, future reviews should involve a wider range of both libraries and sources. In addition, due to the main focus of the studies included in the review, this study was not able to answer the question 'What are the limitations of GI?'. Hence, aside from the recommendations listed below, future studies should focus more on the link between specific game elements/game mechanics and unintended consequences of design that may lead to adverse effects.
If the aim is to understand when and why gamified learning fails, it seems artificial and incomplete to explore learning progress in isolation from the other dimensions of experience that accompany it. Among the studies included in this review, primary deficiencies involve sporadic description of the game design process/game components, scarce descriptions of study contexts, a lack of mixed-methods approaches, small numbers of participants and short intervention durations. Evidently, gamification is moving away from the typical use of PBLs and, therefore, relies more on established game-design principles (Manzano-León et al., 2021; Rapp et al., 2019). Still, for some reason, it seems that gamification research neglects the full range of game design expertise when designing a system (Rapp et al., 2019).
Defining essential variables and creating minimum standards in educational research communities is vital because it allows the systemising of findings resulting from diverse research attempts but, also, the filtering of a variety of findings in the field by means of replication studies (Reeves & Reeves, 2015). For this reason, future studies should include a description of the subject/topic; learning approach and delivery platform; characteristics of the sample and intervention duration expressed in hours. Moreover, studies that developed their own 'games' should include a description of the game design process, while studies utilising a 'game' already available on the market should provide a detailed list of game components. In order to obtain more in-depth findings, it is highly recommended to use a mixed-methods approach, with a more intense tracking schedule, including modern assessment instruments and a longitudinal study design.
The effect that GI has on students' LOs largely depends on linking the appropriate learning approach (one that facilitates development of skills needed to competently apply the content of instruction and complete the 'game') with an optimal delivery platform, that is, a platform, or a combination of platforms, that supports game-like features and types of interaction that prove to stimulate motivation and learning engagement of a specific student population in a given context. For example, in hybrid settings, the greatest success was linked with the use of VLE, but, when it comes to question-driven instruction or project/problem-based learning, combined platforms or IRS were utilised. On the other hand, unlike the hybrid pedagogy where learning content, regardless of learners' age, can be tailored for asynchronous learning, active learning approaches require a developed capacity for abstract thinking/level of cognitive maturity.
Clearly, the magnitude of the impact is not in the learning technology but in the type of interaction (Beauchamp & Kennewell, 2010). Classroom interaction typically ranges from 'authoritative' to 'dialogic' and shifting towards the dialogic end of the scale should result in improved learning processes and outcomes (Beauchamp & Kennewell, 2010). While a specific learning approach dictates the dynamic of learner-teacher interaction, the components that are included in the platform dictate how learners will interact among themselves and progress through the game (Armier et al., 2016;Tsay et al., 2020).
In general, when teachers first adopt learning technologies as a part of their practice, there is a trend that interactivity is superficial and more on the authoritative side (Beauchamp & Kennewell, 2010). For this reason, future educators should be trained in both a variety of innovative methods and instructional pedagogies including GI (Camilleri & Camilleri, 2019; Gressick & Langston, 2017) and the latest IT trends (Ahmadi, 2018). In addition, gaining experience in implementing GBL without the use of technology seems a good way to feel the 'classroom dynamics' when specific game mechanics are introduced. When it comes to 'game' rules and guidelines, it is a must to keep them clear and simple (Armier et al., 2016; Kermek et al., 2018; Tsay et al., 2020).
Both user-centred design and student-centred learning place user/learner needs in the centre of an effective system (Hesson & Shad, 2007). Well-designed interactive programmes allow students to see and explore concepts from different perspectives by using various representations of ideas, real-time feedback and opportunities to apply learning through content creation (Darling-Hammond, Zielezinski & Shelly, 2014). In a gamified classroom, students feel in control of their learning because they can learn at their own pace, while the continuous feedback provides them with hints and opportunities for contemplation when facing a problem while practising the curriculum (Rueckert et al., 2020).

*Chung-Shing, C., Yat-Hung, C., & Heung Agnes, F. T. (2020). The effectiveness of online scenario game for ecotourism education from knowledge-attitude-usability dimensions.