Adaptive Comparative Judgment and Psychological Safety in Accounting Education

Traditional accounting curricula include instruction on the preparation of audit documentation; however, experiential instruction for students on the process of review, both as a reviewer and as a reviewee, is often scarce or missing. This study investigated a classroom intervention that engaged undergraduate students in peer-review activities to gauge how peer review and feedback affected student performance and students’ perceptions of being able to engage in interpersonal risks. Using a case-method approach, students developed audit workpapers that were later peer-reviewed through a digital system utilizing adaptive comparative judgment (ACJ). Students’ achievement scores were collected, and students also completed a pre/post survey on psychological safety. Our results indicate that student psychological safety increased over the course of the semester; however, the peer review process through ACJ did not significantly improve student performance within the class. The students responded positively to the intervention as an engaging learning process that was effective in teaching real-world skills. Thus, this intervention provides an example of how peer review activities could enhance the learning experience for students.


The Accounting Review Process
Auditing standards require that auditors maintain sufficient documentation to support their conclusions. "Among other things, audit documentation includes records of the planning and performance of the work, the procedures performed, evidence obtained, and conclusions reached by the auditor" (PCAOB 2016, AS 1251). Standards also require that all audit documentation be reviewed by at least one other audit team member (PCAOB 2016, AS 1201). The initial creation of audit documentation (Payne and Ramsay, 2008; Shankar and Tan, 2006; Andiola et al., 2018), as well as the review process (Fargher et al., 2005; Lambert and Agoglia, 2011; Brazel et al., 2004; Frank and Hoffman, 2015; Agoglia et al., 2010; Tan and Trotman, 2003; Rosman et al., 2007; Payne et al., 2010; Harding, 2010; Bamber and Ramsay, 2000), are critical in meeting audit standards and completing a quality audit. In the case of failed audits, the finalized audit documentation is the key evidence in auditor negligence trials (Backof, 2015). This audit documentation is prepared by all members of the audit team, and some team members with limited experience will be given significant reviewing responsibilities. In fact, after only 5 years of experience, auditors may spend as much as 50% of their time reviewing audit workpapers prepared by other audit team members (Bamber and Bylinski, 1987; Asare and McDaniel, 1996; see also Sweeney, Suh, Dalton, & Meljem, 2017). Therefore, recruiting incoming auditors with review experience could be not only a competitive advantage but also insurance against forms of auditor negligence.
Recruiters report seeking students who have both technical and behavioral knowledge and skills (Rynes et al., 2003). While technical skills are obviously needed to complete and review audits, behavioral knowledge and skills are also crucial to ensure healthy team and intergroup dynamics (Plant, Barac, & Sarens, 2019; Piper, 2017; Chambers & McDonald, 2013). Not only must auditors be able to work closely with clients to perform and document audit procedures, but they must also be able to give and receive direct and candid feedback about audit work and documentation that has been completed. In some cases, auditors will go through peer review performance evaluations several times a week, making the ability to foster psychological safety (a team environment where it is safe to take interpersonal risks, such as speaking up on issues or conflicts; Edmondson, 1999) a critical skill for successful team working environments (Rozovsky, 2015). Educating auditing students in the technical skills of review, with additional practice in fostering psychological safety in teams, is key for a healthy workforce. Andiola et al. (2018) noted that accounting curricula include instruction on documentation preparation, but instruction on the review process, both as a reviewer and as a reviewee, is often sparse or absent. Consistent with this, a recent study (Ulrich and Blouch, 2018) asked accounting professionals responsible for evaluating new hires to rank 63 auditing curriculum [...] potential pitfalls as peer feedback resulted in a deflation of personal perceptions following highlighted deficiencies (Kim & Kim, 2020; see also DeNisi, Randolph, & Blencoe, 1983). Further, peer feedback scenarios have also been linked to struggles around the credibility of feedback due to potential discrepancies (Hamer, Purchase, Luxton-Reilly, & Denny, 2015; McCarthy, 2017; Zhang, 2019).
Additional struggles related to the time, effort, and planning required for incorporating the feedback received in learning settings often inhibit widespread adoption (Black & Wiliam, 2018).
Importantly, engaging in peer review and feedback can become more than simply an evaluation and scoring process; it can become a learning experience for the reviewer, with the potential to improve students' abilities by shaping their thinking during the act of reviewing (Bartholomew, Mentzer, et al., 2019; Czaja & Cummings, 2010). Students who engage formatively in peer review and feedback to/from their classmates and compare peer work have shown significant improvement over students without the same opportunities (Bartholomew, Mentzer, et al., 2019; Bartholomew, Strimel, & Yoshikawa, 2018; Seery & Canty, 2017). Studies have shown that as students participate in peer review and feedback processes, they: 1. perform better than students without this formative practice (Li & Gao, 2016; Li, Liu, & Steckelberg, 2010), 2. experience both improved feedback quality and better assignment scores (Bartholomew, Strimel, & Yoshikawa, 2019; Gielen & de Wever, 2015), 3. improve their critical thinking ability (Sluijsmans, Dochy, & Moerkerke, 1998), 4. develop a better understanding of class material (Stefani, 1994), 5. engage in a more analytical approach towards assignment criteria (Nicol, Thomson, & Breslin, 2014), and 6. demonstrate increased engagement (Jurado, 2011).

Adaptive Comparative Judgment
Adaptive comparative judgment (ACJ) is an approach to evaluation which utilizes iterative comparisons between pairs of items (Pollitt, 2012). ACJ, although originally designed as an assessment tool, has increasingly been utilized as a mechanism for engaging students in review, critique, evaluation, and learning (Bartholomew, Mentzer, et al., 2019; Bartholomew, Strimel, & Yoshikawa, 2019; Seery & Canty, 2017). ACJ grew from work by Thurstone (1927) and later Pollitt (2004); both posited that using comparative judgments to evaluate quality through holistic approaches, rather than rubric-centric methodologies, was a way of improving assessment techniques and reliability. Comparative judgment (CJ) involves an assessor (a student, teacher, or professional) working to discern qualitative differences between items as opposed to subjectively evaluating them and assigning point values. In these settings the assessor does not tally points; rather, they make holistic comparisons and choose which item, of those displayed, is better (Kimbell, 2007, 2012a; Pollitt, 2004). These comparisons are repeated until a rank order of all the items is produced (Bartholomew, 2017). The addition of an algorithm which intentionally pairs items based on previous judgments led to the creation of adaptive comparative judgment (ACJ); the intentional pairing is done to increase both the efficiency and the reliability of the results.
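The step from repeated pairwise choices to a rank order with parameter values can be sketched in a few lines of Python. This is an illustrative sketch only and is not the algorithm used by any particular ACJ software: it assumes a Bradley-Terry style model fitted by gradient ascent with a small ridge penalty, and a hypothetical pairing rule (`next_pair`) that proposes the two items whose current estimates are closest. The item labels, judgments, and function names are invented for illustration.

```python
import math
from itertools import combinations


def fit_parameter_values(items, judgments, iterations=200, learning_rate=0.1):
    """Estimate one parameter value per item from (winner, loser) pairs.

    Assumption: a Bradley-Terry style model, fitted by plain gradient ascent
    with a weak ridge penalty so that estimates stay finite.
    """
    values = {item: 0.0 for item in items}
    for _ in range(iterations):
        # Start each gradient with the ridge term, which pulls values toward 0.
        gradient = {item: -0.01 * values[item] for item in items}
        for winner, loser in judgments:
            # Modelled probability that the chosen item wins this comparison.
            p_win = 1.0 / (1.0 + math.exp(values[loser] - values[winner]))
            gradient[winner] += 1.0 - p_win
            gradient[loser] -= 1.0 - p_win
        for item in items:
            values[item] += learning_rate * gradient[item]
    return values


def rank_order(values):
    """Items sorted from highest to lowest parameter value."""
    return sorted(values, key=values.get, reverse=True)


def next_pair(values):
    """Hypothetical adaptive rule: pair the two items with the closest values."""
    return min(
        combinations(sorted(values), 2),
        key=lambda pair: abs(values[pair[0]] - values[pair[1]]),
    )


# Invented judgments: each tuple is (item judged better, item judged worse).
judgments = [("A", "B"), ("A", "C"), ("B", "C"),
             ("A", "B"), ("B", "C"), ("C", "A")]
values = fit_parameter_values(["A", "B", "C"], judgments)
ranking = rank_order(values)
suggested = next_pair(values)
```

The fitted values carry more information than the ranking alone: the gap between two values reflects how decisively one item was preferred over the other.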
Concerns related to the validity of this algorithm have been raised (Bramley, 2015), debated (Pollitt, 2015), and addressed (Rangel-Smith & Lynch, 2018), with an ever-growing body of research demonstrating positive effects following ACJ implementation in classrooms across grade levels (Bartholomew & Jones, 2021). Although ACJ was originally designed as an assessment tool focused on summative settings, the positive effects of utilizing it as a formative peer review tool have been documented in recent work (Bartholomew, Mentzer, et al., 2019; Bartholomew, Strimel, & Yoshikawa, 2019). Seery, Buckley, Delahunty, and Canty (2019) and Canty (2012) both used ACJ in higher education and found that students reacted well to using ACJ for formative peer review, as it focused on assessing the student work as a whole and facilitated improvement across the entire student body in a class. Bartholomew, Strimel, and Yoshikawa (2019) reported that students engaged in peer review through ACJ demonstrated learning gains as a result of four key activities: 1) their exposure to peer ideas, 2) the opportunity to engage in peer review through comparative judgments, 3) the process of providing peer formative feedback, and 4) the opportunity to receive peer formative feedback from their classmates who reviewed their projects.
The entirety of the research into CJ and ACJ, the outcomes, and the various implementations is beyond the scope of this work; however, syntheses of work related to ACJ in educational settings (Bartholomew & Jones, 2021) reported indications of promise for this approach to the assessment of open-ended problems and for student learning through formative peer review. Further, several researchers have included student opinions relative to embedding ACJ for peer review. Bartholomew, Zhang, Garcia-Bravo, and Strimel (2019) [...] Additionally, Kimbell (2018) pointed out that as students review and evaluate work, they also learn, because they are required not only to recognize but also to articulate what makes something good, better, or best. There is increasing evidence that peer review experiences, specifically those facilitated through ACJ, may lead to increases in student learning (Bartholomew & Jones, 2021). However, this approach has not been used equally in all disciplines (e.g., the majority of ACJ peer review research centers on Design and Technology education), and how students experience ACJ, particularly how comparative judgments affect psychological safety, is not entirely clear.

Psychological Safety
In addition to the potential for students to demonstrate learning gains through peer review and feedback there is also anecdotal evidence that students engaging in this process may feel more comfortable with the overall review experience. For example, as students evaluate peer work, they may also inherently begin to recognize both the strengths and weaknesses of their peers' work and also those contained within their own work (Czaja & Cummings, 2010). Repeated peer review experiences may thus lead to an overall comfort level with both reviewing and being reviewed by others. We conjecture that this may be a subtle, but important, element of added benefit within peer review and feedback experiences.
Importantly, there may often be a notion that peer review and feedback processes, if not anonymous, lead to tension or other conflicts in teams, groups, or classrooms as individuals exhibit frustration over the critiques from their peers. Though adults, adolescents, and children can respond negatively to destructive or even critical feedback (Baron, 1988 [...] Bartholomew, Strimel, & Yoshikawa, 2019). Research in the field of psychological safety, defined as a shared belief among individuals that an individual or team is safe to engage in interpersonal risks (Edmondson, 1999; Newman, Donohue, & Eva, 2017), suggests that the group dynamic may be the antecedent to how one reacts to the interpersonal risk of giving and receiving feedback.
Interpersonal risks can include speaking up on a number of topics or issues, showing or sharing interest in one another, engaging in constructive conflict, and experiencing failure in experiments or risks (Edmondson, 1999, 2004). When a team or organization fosters psychological safety, they create the group setting to be a place where members can speak up, critique, fail, and debate without degradation or shame. Rozovsky (2015) discovered that of the five main components of successful teams (psychological safety, dependability, structure and clarity, meaning of work, and impact of work), psychological safety "was far and away the most important of the five dynamics we found - it's the underpinning of the other four" (par. 11). Psychological safety has more to do with intergroup dynamics than simply feeling comfortable.
Educational settings offer opportunities for students to share new, and at times tentative, ideas. In these settings students attempt new procedures and strategies and navigate unknown ideas and concepts; this can sometimes lead to failure of one kind or another (Bransford & Donovan, 2005; Byrnes, 1998). Such behaviors can be seen as risky, especially among peers. Such risks in one's educational experience have been defined as intellectual risk taking (IRT) (Byrnes, 1998; Byrnes, Miller, & Schafer, 1999; Clifford, 1991; Beghetto, 2009). As education does not take place in a vacuum but is, more often than not, in the presence of peers, a certain level of psychological safety is needed to facilitate such important risk-taking behavior in learning. Garvin, Edmondson, and Gino (2008) described the building blocks of creating a learning organization, and in support of Rozovsky's (2015) findings, the first building block is psychological safety. One cannot learn if one is not in an environment that promotes a certain level of acceptance of others' ideas and contributions. Cajiao and Burke (2016) studied instructional methods focused on social interaction with 246 business students in Colombia. The instructional method significantly increased learning behaviors among students, both directly and indirectly, when mediated by psychological safety. Social interaction, among students and with instructors, is affected by the level of psychological safety that is cultivated in the classroom or learning environment. Howorth, Smith, and Parkinson (2012) found that engendering psychological safety over short periods may be less effective than over longer periods (i.e., 10 months); in practice, however, it is not impossible and has been achieved in various settings (Cajiao & Burke, 2015; Morrison et al., 2019; Oakley, Felder, Brent, & Elhajj, 2004). Cultivating psychological safety in the classroom may be as imperative as it is in successful work teams.
To teach the skills of working in teams, instructors often use self- and peer-evaluations, something especially common in business education settings (Ohland et al., 2012). We suggest that this may be not only a pedagogical technique but also an opportunity to practice the real-world skill of peer review and feedback. Based on the role psychological safety plays in successful work teams and in learning environments, it follows that this should similarly be part of the educational experience of peer review and feedback. De Stobbelier and Ashford's research (2014) supports this idea; in a sample of 224 employee-supervisor dyads, they found that interdependent task relationships moderated by psychological safety increased the tendency for employees to seek feedback from their peers. Psychologically safe environments can turn previously competing entities, like peers at work, into sources of learning. Such environments may not eliminate conflict but may make conflict more productive. Given the impact of psychological safety in motivating individuals to seek out feedback, we contend that further research is needed on how the strategies that leaders or instructors use to implement peer review and feedback processes in the teams they oversee influence psychological safety. ACJ is one such strategy for peer review and feedback, and though it has only once been used in business education to assist in peer evaluation (Metzgar, 2016), how ACJ influences psychological safety within teams is greatly under-researched. Therefore, a particular interest for our study was the impact of peer review using ACJ on one's ability to take interpersonal risks.

Research Questions and Hypotheses
The guiding research question for our study is "Does a student's use of ACJ for peer review impact their performance on accounting case study projects and/or their psychological safety during group work?" Based on the research discussed above, we expect that as students use ACJ to engage in peer review at the conclusion of each project in an accounting course, 1) their technical abilities will improve, 2) they will learn to recognize "good" solutions, and 3) they will solidify their own understanding of important accounting principles. Therefore, we form the following hypothesis: H1: Students who use ACJ to evaluate peers' projects will show greater performance improvements on future projects than students who do not evaluate peers' projects.
Further, research suggests that the inherent collaborative nature of ACJ may impact student psychological safety. This psychological safety is needed in everyday teamwork but becomes vital in the case of interpersonal risks such as speaking up and giving and receiving feedback. ACJ provides an opportunity to be exposed to peer feedback in an anonymous virtual environment with binary (e.g., "which one is better?") judgments that result in a rank order of student work (best to worst). It is our hypothesis that this method of peer review and feedback in the class will impact student psychological safety; specifically, as students use ACJ, they will improve in psychological safety and in willingness to both provide and receive feedback. We link this hypothesis to four key traits of ACJ (Bartholomew, Strimel, & Yoshikawa, 2019): 1) exposure to their peers' ideas, 2) providing feedback to peers, 3) receiving feedback from peers, and 4) deciphering between gradations of quality in peer/self-solution design. Taken together, these ideas lead to our second hypothesis: H2: Students who use ACJ to evaluate peers' projects will demonstrate increased psychological safety.

Method
This research took place at an AACSB-accredited university in the Western U.S.A. with students enrolled in a senior-level introductory Auditing course. Of the 32 total class members, 17 were female (53%) and 15 were male (47%). All students enrolled in this course were randomly paired in teams of two students each. Each team completed six auditing case studies designed to represent work they will typically encounter during their first year in the auditing profession.
Journal of Education and Training ISSN 2330-9709 2022
Five of the case studies were completed using Microsoft Excel, and one case was completed using Microsoft Word (see Table 1 below). Prior to completing work, the student teams were randomly organized into two treatment groups, Group A and Group B, for the intervention designed to evaluate the cases. Group A was engaged in using ACJ to compare, rank, and provide feedback (the intervention) for cases 1-3, while Group B was the control group for the first three cases. Group B was then engaged in the intervention to evaluate cases 4-6, while Group A was not.
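The crossover design described above can be illustrated with a short Python sketch. It is not the procedure the instructors actually used: the student identifiers, the fixed random seed, and the equal split of teams between Group A and Group B are assumptions made only for illustration.

```python
import random

# Which cases each group evaluated through ACJ, as described in the text.
ACJ_CASES = {"A": {1, 2, 3}, "B": {4, 5, 6}}


def assign_teams_and_groups(students, seed=0):
    """Randomly pair students into teams of two, then split teams into A and B.

    Assumption: teams are divided evenly between the two groups.
    """
    rng = random.Random(seed)
    shuffled = list(students)
    rng.shuffle(shuffled)
    teams = [tuple(shuffled[i:i + 2]) for i in range(0, len(shuffled), 2)]
    rng.shuffle(teams)
    half = len(teams) // 2
    return {"A": teams[:half], "B": teams[half:]}


def uses_acj(group, case):
    """True if the given group is in the intervention condition for this case."""
    return case in ACJ_CASES[group]


# Hypothetical identifiers for the 32 class members.
students = [f"student_{n:02d}" for n in range(1, 33)]
groups = assign_teams_and_groups(students)
```

Because each group serves as the control for the other half of the cases, every team experiences both conditions over the semester.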
Before the first case, the professor held discussions with the class to ensure understanding of why the intervention and practice of peer review and feedback was important in the class, what skills it was designed to help them develop based on his experience in the profession and feedback from current professional employers, and finally, how these skills would help them in their future careers. Class discussions were also held after cases 1, 2, 4, 5 and 6 to help the students reflect on the purpose of the intervention activity, and to discern if the students felt like the activity was accomplishing these objectives.

Table 1
Case 1 (Microsoft Excel): Students were provided an Excel workbook with instructions and were asked to calculate a materiality limit for three separate scenarios based on a method for calculating materiality commonly used in the profession.
Case 2 (Microsoft Excel): Students were provided an Excel workbook with instructions and were asked to perform vertical and horizontal analytics on the Balance Sheet and Income Statement. Then students were asked to analyze the calculations for trends and outliers above predetermined thresholds to draft tick marks designed to develop the audit strategy based on their analysis of risk indicated in the data.
Case 3, Control Findings Memo (Microsoft Word): Students were presented with three different scenarios and asked to rate the significance of the internal control deficiencies and draft a professional memo to the client discussing the findings and their recommendations to improve internal controls.
Case 4, Cash Audit (Microsoft Excel): Students were provided an Excel workbook with instructions and supporting documents to audit cash accounts. They were asked to perform an audit of the bank reconciliations for each of the cash accounts of the client by analyzing supporting documentation, and to prepare a conclusion of their findings, along with proposed journal entries, to correct any errors discovered.
Case 5 (Microsoft Excel): Students were provided an Excel workbook with instructions and supporting documents to audit accounts receivable confirmations. They were asked to analyze confirmation responses for accounts receivable, then calculate a total projected error based on the errors discovered in the sample and propose a journal entry to correct the errors.
Case 6, Accounts Payable Confirmations Audit (Microsoft Excel): Students were provided an Excel workbook with instructions and supporting documents to audit accounts payable confirmations. They were asked to analyze confirmation responses for accounts payable, then calculate a total projected error based on the errors discovered in the sample and propose a journal entry to correct the errors.
During each intervention activity, students in the treatment group were individually shown a pair of audit workpapers through the ACJ software (CompareAssess). Students viewed these pairs of workpapers and selected the item (audit workpaper) they believed was the "better" of the two items displayed. This decision was to be guided by their understanding of the correct answer (e.g., from the instruction provided by the instructor) and the professional formatting standards included on the assignment rubrics. Students were also prompted to provide a brief comment as to why they chose the "better" item for each decision they made (see Figure 1).

Each student compared approximately 12 pairs of items per ACJ session and completed the ACJ comparisons in less than 15 minutes on average. Two accounting professors also independently completed the evaluation process as a secondary check to ensure the perceptions of quality by the students were appropriate and aligned with accepted standards. In the class period following ACJ sessions 2, 4, 5, and 6, the course professor showed the class the anonymized ranking statistics, as well as any outlier statistics, and a visual of the highest ranked case. The purpose of displaying the highest ranked case was to provide an additional visual example, beyond the suggested solution provided by the professor in their review, of what a good example of the audit workpaper in the context of each case should look like (see Appendix A).
In addition to the ranking activity described above, cases were also manually graded by the course professor as a separate validity check on the findings obtained. Instructions and a grading rubric were provided to the students to help them prepare each case, and the same rubric was provided to the students for the ACJ ranking sessions to assist in their comparison. Additionally, the cases were graded for points before the professor participated in the ACJ session and before the overall ranking results from the ACJ software were completed, in order to create independent comparison points and mitigate the potential for bias in grading. The resulting statistics derived from the ACJ sessions (not from student grades) were used to investigate the guiding research question related to the impact of the learning-by-evaluating intervention on student performance.

Measures
The outcome of an ACJ session consists of several data points, including a rank order of all items with parameter values for the items, misfit statistics for judges and items, and comments collected in conjunction with each decision. Of note, the parameter values, which were used in this research, provide insight beyond the rank alone, as they signify both the ranking and the magnitude of difference between items (Pollitt, 2012); they thus become a useful tool for analysis, interpretation, and investigation.
Misfit statistics are valuable as they indicate the relative agreement between judges and the relative agreement on the placement of each item in the overall rank. A judge with a large misfit statistic consistently makes comparative decisions contrary to those of their peers. An item with a large, significant misfit statistic represents a "controversial" item, one that is ranked highly by some judges and very low by others. These misfit statistics were used in this research to investigate relative consistency among the student judges. Beyond the ACJ statistics, we created a pre/post survey based on Edmondson's (1999) Team Psychological Safety scale (Edmondson, 1999, 2019; see Appendix B). The 7 items were left unchanged in wording. All 7 items were based on a 5-point Likert scale ranging from strongly disagree to strongly agree. Further items were included in the post-survey to collect student reflections on the use of ACJ in their learning experience over the course of the semester. In total, there were 7 items focused on psychological safety in the pre- and post-surveys, with an additional 6 items aimed at the learning experience with ACJ in the post-survey. We also collected personal information regarding gender and year in school but did not have participants include any other identifying information such as name or email, thus making the reporting as anonymous as possible for both pre- and post-surveys. These anonymity measures, guided by our IRB approval, created a limitation in our statistical analysis in that we could not perform a simple repeated measures analysis. Therefore, a more complex statistical method, a Cumulative Link Mixed Model, was needed to understand the average change(s) in responses over time.
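To make the idea of judge misfit concrete, the following Python sketch computes one possible misfit measure: the mean squared standardized residual of each judge's decisions under a logistic pairwise model. This is an assumed, Rasch-style formulation chosen for illustration; the source does not describe how the ACJ software computes its misfit statistics, and the parameter values, judge labels, and decisions below are hypothetical.

```python
import math
from collections import defaultdict


def judge_misfit(values, decisions):
    """Mean squared standardized residual per judge (an outfit-style statistic).

    `decisions` holds (judge, winner, loser) triples. For each decision the
    observed outcome is 1 (the chosen item won), so the squared standardized
    residual (1 - p)^2 / (p * (1 - p)) simplifies to (1 - p) / p.
    """
    squared = defaultdict(list)
    for judge, winner, loser in decisions:
        # Modelled probability that the chosen item beats the other item.
        p = 1.0 / (1.0 + math.exp(values[loser] - values[winner]))
        squared[judge].append((1.0 - p) / p)
    return {judge: sum(z2) / len(z2) for judge, z2 in squared.items()}


# Hypothetical parameter values for three workpapers.
values = {"A": 1.5, "B": 0.0, "C": -1.5}
# judge_1 agrees with the overall rank; judge_2 consistently goes against it.
decisions = [
    ("judge_1", "A", "B"), ("judge_1", "B", "C"), ("judge_1", "A", "C"),
    ("judge_2", "B", "A"), ("judge_2", "C", "B"), ("judge_2", "C", "A"),
]
misfit = judge_misfit(values, decisions)
```

In this toy example the judge who repeatedly chooses the lower-valued item receives a much larger statistic, which matches the interpretation given above of a judge deciding contrary to their peers.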
The surveys were administered via Google Forms, and the instructor asked students to complete them outside of class. Points were given to those students who reported completing the surveys. The pre-survey was administered at the beginning of the semester. The case studies began during week four of the sixteen-week semester. The post-survey was administered after case 6, which was during week eleven of the semester.

Differences in Parameter Values Across Groups and Time
To test our first hypothesis, we developed a mixed effects model with repeated measures on student groups, comparing parameter values for the ACJ-participating and non-participating groups. We created the main effect of Case (signifying which case study was being performed) and another main effect of ACJ Group (signifying which group of students was participating in the ACJ for that particular case). In our study the main effects of ACJ Group and Case were not meaningful on their own; the effect of interest was their interaction. Lastly, a Kenward-Roger approximation was used for testing the effects in the mixed effects model we developed.
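One way to write a model of this kind is shown below. The notation is ours and the random-effects structure is an assumption (a single random intercept per team); the source does not state the exact specification that was fitted.

```latex
y_{tc} = \mu + \alpha_{c} + \beta_{g(t)} + (\alpha\beta)_{c\,g(t)} + u_{t} + \varepsilon_{tc},
\qquad u_{t} \sim \mathcal{N}(0, \sigma_{u}^{2}),
\quad \varepsilon_{tc} \sim \mathcal{N}(0, \sigma^{2})
```

Here $y_{tc}$ is the ACJ parameter value of team $t$'s workpaper on case $c$, $\alpha_{c}$ is the Case effect, $\beta_{g(t)}$ is the effect of the team's ACJ Group $g(t)$, $(\alpha\beta)_{c\,g(t)}$ is the Case by ACJ Group interaction, and $u_{t}$ captures the repeated measures on the same team. Under this reading, a benefit of ACJ would appear in the interaction term, since each group was in the intervention condition for only half of the cases.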
The main effect of Case was not statistically significant, F(5, 32.1) = 0.17, p = 0.97; neither was the main effect of ACJ Group, F(1, 16.7) = 0.53, p = 0.48. The interaction of these two factors was also not significant, F(5, 32.1) = 0.57, p = 0.72. Ultimately, there was no statistically significant difference in performance between groups based on their use of ACJ in the course; our first hypothesis was therefore not supported.

Differences in Psychological Safety Ratings Over Time
To test our second hypothesis, quantitative data was extracted from survey items 1-7 on both pre- and post-surveys through an analysis of the 5-point Likert scale. We ran a basic two-way ANOVA with gender (the main classification for subjects, since no other identifiers were present) and test (pre or post) as independent variables and the score as our dependent variable. Results showed an interaction between male and female responses from pre to post, but this interaction was not statistically significant; males and females did not have significantly different average scores (on average between pre and post). However, on average, scores increased across genders from pre to post. These increases are generalized to the group because student responses were not paired from pre to post, so group responses were combined.

A Cumulative Link Mixed Model (CLMM) was used to investigate the impact of individual items and their average change over time. Though it is not as powerful as the ANOVA, this model gave us more resolution for the data set we collected. For example, items 2, 4, 5, 6, and 7 were affected the least from pre to post. Conversely, items 1 (If you make a mistake on this team, it is often held against you) and 3 (People on this team sometimes reject others for being different) were affected the most, and our analysis revealed that psychological safety scores on items 1 and 3 were significantly improved for all students from pre to post. This held true regardless of gender, with a higher post-survey value than pre-survey value for all participants. Additionally, further investigation revealed that, on average, males had greater odds than females of giving a higher score on item 5 (It is difficult to ask other members of this team for help) on the post-survey relative to the pre-survey.
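The core of a cumulative link model can be illustrated with a short Python sketch. It uses a common logit parameterization, in which the log-odds of responding at or below category k equal a threshold minus a linear predictor. The thresholds and the size of the pre-to-post shift below are hypothetical values chosen only to show the mechanics, the random effect is set to zero for simplicity, and none of the numbers are estimates from this study.

```python
import math


def category_probabilities(thresholds, eta):
    """Response-category probabilities under a cumulative logit model.

    P(Y <= k) = logistic(threshold_k - eta); a category's probability is the
    difference between consecutive cumulative probabilities.
    """
    cumulative = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds]
    cumulative.append(1.0)  # the top category closes the scale
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


def expected_score(probabilities):
    """Mean response when categories are scored 1, 2, ..., K."""
    return sum((k + 1) * p for k, p in enumerate(probabilities))


# Hypothetical cut points separating the five Likert categories.
thresholds = [-2.0, -1.0, 0.5, 2.0]
post_effect = 0.8  # hypothetical shift in the linear predictor at post-survey

pre = category_probabilities(thresholds, eta=0.0)
post = category_probabilities(thresholds, eta=post_effect)
odds_ratio = math.exp(post_effect)  # odds of a higher response, post vs. pre
```

A positive shift moves probability mass toward the higher response categories, which is how statements about "greater odds" of a higher score can be read.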

Student Comments on ACJ Experience
On the post-survey, an additional six items (Items 8-13) were included to have students provide feedback and reflect on their experience in using Adaptive Comparative Judgment. Each student response was coded holistically (e.g., if a student had several sentences in agreement with ACJ as an effective learning tool, the comment was coded as "Yes" or "Like") and by sentence (e.g., students provided reasons for why ACJ was an effective learning tool, and each reason or sentence was marked and tallied). In the following report of student responses, the term Holistic Responses refers to the entire answer provided to the question, whereas Evidence Statements refers to mentions or sentences within the student responses. A student's single response often made many statements (see Table 4 for an example of how student responses were coded). Table 5 displays the item questions and the total counts among holistic response and evidence statement codes, and provides a connected student comment to give context for the coding.

The total number of students who voluntarily completed the final survey was twenty-six (out of 32 total students); therefore, all the following descriptive statistics, unless otherwise noted, are derived from that total. Percentages were rounded to the nearest whole number.
Item 8: How would you describe your experience using CompareAssess? Why? There were three holistic response codes: positive, negative, or both. About 46% (12 of 26) of student responses were coded as positive, with about 15% coded as negative. The holistic response code both accounted for around 38% of the total responses. Two student comments were coded holistically (positive, negative, or both) but did not offer any further evidence statements (they did not answer the question "why"). Nearly half (46.15%) of the negative evidence statements from student comments focused on the difficulty of the program interface, while nearly 38% of the positive evidence statements from student comments described how "easy" the program was to use.

Item 9: What did you like and not like about CompareAssess judgments? There were four holistic response codes: like, dislike, both like and dislike, and no indication. Student comments coded as both like and dislike contained evidence statements of each kind, whereas comments holistically coded as like, for example, gave only reasons the student liked the intervention. Over 57% (15 of 26) of student comments were holistically coded as dislike. Again, the program interface was the main complaint: 15 of the 22 (about 68%) evidence statements for why students did not like their experience concerned it. Holistic responses of like and of both like and dislike accounted for about 19% each, while one student gave no indication of what they liked or did not like.

Item 10: Do you feel that CompareAssess was an effective learning tool? Why or why not? There were three holistic response codes: yes, no, and maybe. These three parent codes answered the first part of the question, whereas the evidence statements were used to answer the second (i.e., why or why not?). Nearly 69% of the student comments were coded holistically as yes comments. About 19% were no and just over 11% were coded as maybe. The total evidence statements for yes were 21 whereas only 5 evidence statements were associated with the no holistic response code. A large number (8 of 21; 38%) of the yes-associated evidence statements focused on the fact that the CompareAssess intervention was a "glimpse of real life." Others (5 of 21; about 24%) found value in the act of comparing.
Item 11: Do you see any potential value in using comparative judgment practices in your future career? Why or why not? As with Item 10, there were three holistic codes: yes, no, and maybe. The majority of student comments (65%) were coded holistically as yes, meaning that these students did see potential value in using comparative judgment in their careers. However, just over 23% of student comments were coded as maybe. Many of the evidence statements associated with maybe noted that the usefulness of comparative judgment in a future career depended on the career, the setting, and other factors; for these students, the answer was not distinctly yes or no. Nearly 65% of the yes-associated evidence statements mentioned "skill building" as the reason comparative judgment had value to them.
Item 12: As you made judgments, did you feel you learned anything? What specifically? The three holistic response codes were yes, no, and maybe. The majority of student comments (over 69%) were holistically coded as yes, and 23% were coded as no. There were 18 total evidence statements associated with the yes holistic response code. In half of those evidence statements, students reported that they had learned aesthetics (i.e., how to make a report "look good" or presentable), while only 3 students specifically mentioned the material or subject. Most of the no responses (nearly 67%) included no evidence statement explaining why the student did not feel they had learned anything while making judgments. Around 33% stated that they already knew the material before doing comparative judgment and thus did not learn anything new.
Item 13: What was the basis for your judgment decisions? What formed the criteria for your decision to choose one item over another? There were six holistic codes: accuracy, appearance, accuracy and appearance, self-reference, other, and none. Nearly half (46%) of the students reported that they looked for both accuracy (the documentation was completed correctly) and appearance (the documentation was well formatted). Over 15% of students judged only on appearance, while another 23% judged only on accuracy. Interestingly, almost 8% of students reported using their own documents as a reference for their judgment decisions (e.g., "Was it better than mine?" or "Did they get the answers I got?").

Table 5 (partially recovered). Item questions, holistic response codes, evidence statement codes, and example student comments. Counts are in parentheses.

Item 8: How would you describe your experience using CompareAssess? Why?

Positive (12)
Easy (5): "It was easy to pick and choose and all the materials were included."
Similarity in Comparing (3): "It was interesting to have to compare audit reports that look very similar."
Self-Improvement (3): "Good to see the different way others presented it and it helped me identify how to make minor adjustments to make it look more professional."
Hands On (2): "Great, hands on experience."

Negative (4)
Difficult Program Interface (6): "It was difficult to manage and view files on a single screen."
Similarity in Comparing (1): "since all of our work papers were the same format I felt like it really didn't matter which one I picked."
Long (2): "I felt like some of them were long and I just tried to get it done."
Lacked Training (1): "I wish I had a bit more training on how to compare reports."

Both (10)
NA: "I thought that it was useful, but I think the rubric could be made easier to follow."

Item 9: What did you like and not like about CompareAssess judgments?

Like (5)
Comments (2): "I liked that I could leave a brief explanation..."
Program Interface (3): "I liked how smooth the system was, it made it very easy to actually compare and assess."
Comparing ( [remaining Item 9 rows not recovered]

Item 10: Do you feel that CompareAssess was an effective learning tool? Why or why not?

Yes
[preceding rows not recovered]
Added to the class (3): "I believe it was an effective learning tool because it allowed us to learn beyond the actual cases and practice actual auditor duties."
Comparing (5): "Yes, by placing 2 different cases together, it was easy to judge."
Self-Improvement (2): "I learned what details to add to make a work paper sound professional and stand out."

No (5)
Program Interface (1): "No, it made comparing the options much more difficult than looking at them separately."
Did not learn anything (1): "I don't feel like I learned anything from it"
Similarity (1): "It didn't feel like there was anything different about the papers we were judging."
No value in comparison (1): "Comparing two entries on a purely objective basis was not particularly enlightening or intellectually stimulating"

Maybe (3)
Glimpse of real life (1): "I feel that it was a very simple process. If that is how it is in real life scenarios then I think it was an effective learning tool. If it doesn't relate to real situations then I think that it was probably not useful."
Timing (1): [comment not recovered]
Comparing (1): "Not as effective, but it was a great experience to see how others approach their answers and all."

Item 11: Do you see any potential value in using comparative judgment practices in your future career? Why or why not?

Yes (17)
Recognize (3): "Yes, because audit is being able to recognize certain clues, and [CompareAssess] helps build those skills."
Reinforce (3): "I think it reinforced some of the keys to performing effective audits."
Skill building (11): "Yes, because I will be creating working papers of my own and potentially viewing others work."
[remaining Item 11 rows and all Item 12 rows not recovered]

Item 13: What was the basis for your judgment decisions? What formed the criteria for your decision to choose one item over another?

[preceding rows not recovered]
Other Reference (1)
NA: "Use of information I had learned. Guidance from my professor."

None (1)
NA: "none"

Discussion
Student performance on the specific assignments did not significantly improve over the course as a result of the ACJ intervention. However, implementing ACJ in the classroom had several positive outcomes. Students reported increased psychological safety in their class teams, particularly in their feelings that they could take risks and make mistakes in the presence of their team members (Item 1 of the survey) and that they would not be rejected because of their differences (Item 3 of the survey). This increase in psychological safety, however, may or may not be a function of time. As evidenced by the survey results, students generally enjoyed the experience and saw it as relevant to their future careers. The instructor also reported anecdotal evidence supporting this point: when students attended a mid-semester recruiting conference, visiting employers were excited that workpaper review practice was being added to the curriculum, which both the students and the instructor noted. The ACJ intervention helped facilitate engaging class discussion about preparing and reviewing audit workpapers, as students were able to experience both aspects of the process. Based on student comments in the survey and in classroom discussion, students felt they could easily connect concepts taught in class to their application and importance in real-world practice.
A large number of student comments pertained to the technical functioning and interface of the ACJ software. This study investigated the process of peer feedback in accounting education and the influence of this educational practice on psychological safety, not the technical mechanics used to deliver that process. Therefore, further treatment of why students held such divergent views on the technical components of ACJ is outside the scope of this paper. It can be noted, however, that when the process in this intervention had limiting aspects or technical glitches, students reported those issues ahead of other elements of their learning experience.
Because the surveys were completed anonymously, gender was one of the only demographic characteristics we collected, and it therefore became the default grouping variable for our analysis. Although females outnumbered males, there was no statistically significant interaction for gender. The results may suggest that males tended to score higher in psychological safety on the presurvey and females scored higher on the postsurvey, but because responses could not be paired, these findings need further investigation. On the presurvey, the answer "neutral" was used five times more often than on the postsurvey. Though these findings are not statistically significant, we believe they may have some practical significance. For example, psychological safety may not be easily diagnosed or attributed on a first impression; rather, it develops over time, and only over that period can one be certain how psychologically safe a team or team member feels. Further research is needed in this area to support this conclusion statistically. Above all, introducing an intervention that included peer review and feedback and a best-to-worst ranking of student work did not negatively impact student psychological safety.

Our results with the ACJ intervention are surprising, as much of the ACJ literature has shown some significant advantage to those who use ACJ in formative assessment over those who do not. We made several observations that may shed light on the disparity in our findings from the extant literature.

Dosage Effect
There is some uncertainty in the ACJ literature about what role a dosage effect may play in student performance. Whether a student does better on an assignment simply because they have been exposed to it multiple times is unclear, but such exposure is inherently part of the positive findings in ACJ research. In our study, the dosage was split: some students used ACJ from the beginning to the middle of the semester and the others from the middle to the end. One might expect the students who used ACJ in the first half of the semester to outperform their peers only until the other class members received their ACJ dosage, so that by the end of the semester all students would be performing at the same level. This pattern was not observed either. Instead, another dosage effect may have been at work: as students became more familiar with the subject matter and the audit documentation process, the effect of ACJ lessened, resulting in no statistically significant improvement among or between groups over time.

Grading
Although we initially expected the points awarded for grades to align well with ACJ rank (i.e., those ranked highest in the ACJ sessions would also receive the highest grades), the professor's grading method produced some unexpected results. Rather than grading the cases on a standard bell curve, which would have ranked students across a spectrum, the professor graded on the principle of concept mastery, since that most closely matches the requirements in the professional standards for workpaper preparation in auditing. Therefore, every group that met the required criteria in the rubric could receive full points for the case, even if it achieved a lower rank in the ACJ activity. For the simpler case studies, most groups mastered the required concepts and earned full or nearly full points, even those listed near the bottom of the ACJ rank. As a result, the highest ACJ rank did not always correspond to the highest grade points, nor the lowest ACJ rank to the lowest points. We confirmed that grading based on ACJ rank aligns more closely with grading methods that use a standard bell curve than with grading methods that use concept mastery as the objective.
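The weak link between ACJ rank and mastery-based points can be illustrated with a small rank-correlation sketch. Everything in it is an assumption made for illustration: the six groups and their point values are hypothetical, not data from this study, and Spearman's rank correlation is our choice of measure rather than an analysis the authors report.

```python
def average_ranks(values):
    """Rank values from highest (rank 1) to lowest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(acj_rank, points):
    """Spearman correlation between ACJ rank (1 = best) and grade points."""
    return pearson(acj_rank, average_ranks(points))


# Hypothetical data for six groups, NOT taken from the study.
acj_rank = [1, 2, 3, 4, 5, 6]
curved_points = [100, 95, 90, 85, 80, 75]       # curve-style grading spreads groups out
mastery_points = [100, 100, 100, 95, 100, 100]  # mastery grading: most groups earn full points

rho_curved = spearman(acj_rank, curved_points)
rho_mastery = spearman(acj_rank, mastery_points)
print(f"curve-style grading: rho = {rho_curved:.2f}")
print(f"mastery grading:     rho = {rho_mastery:.2f}")
```

Under these assumed numbers, curve-style points track the ACJ rank exactly, while mastery points are mostly tied and correlate only weakly with it, which is the pattern described above.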

Professor Perspective
Although ACJ rank does not necessarily correlate with academic grades for case studies in an auditing course, for the reasons mentioned above, ACJ does appear to have real practical value in preparing students for careers in the auditing profession. As mentioned in the introduction, feedback from the profession indicates that one of the largest gaps between academic learning and the skills required in the auditing profession is workpaper preparation, documentation, and review. Additionally, student responses to survey Items 10 and 11 indicated that the majority of students felt this activity prepared them for the profession. Based on these student and professional responses, ACJ's primary value in auditing courses may lie not in improving academic grades but in preparing students for future careers: exposing them to the peer review and feedback processes the profession requires, helping them begin to experience psychological safety, and developing the skills necessary to succeed in that area of daily professional work.

Future Research
As described above, ACJ has been implemented as an effective learning tool in settings ranging from elementary to university education. Because students in an auditing course did not experience a greater improvement in grade performance than when traditional educational methods were used, we cannot conclude that ACJ is effective for significantly improving grades in a business education setting where spreadsheets are involved. However, before concluding that ACJ is ineffective in this setting, we believe future research is warranted. Students reported enjoying the experience, and academic performance was in no way hindered by the ACJ experience. To our knowledge, this is the first attempt to use ACJ in business education with spreadsheets. With spreadsheets, quantitative accuracy is important in addition to elements of visual presentation. Future research could seek to disentangle these elements with respect to the effectiveness of ACJ as an educational tool. It could also be the case that the effectiveness of ACJ as a learning tool is a function of dosage or time; perhaps students were not given sufficient experience using ACJ for it to translate into improved academic performance.

Conclusion
Audit documentation is vital to the auditing profession. To ensure documentation meets standards of sufficiency and appropriateness, the profession has employed the process of peer review and feedback. Audit team members may provide essential feedback not only on the documentation but also on team member performance. Implementing peer review as an educational practice and learning outcome in audit education programs is therefore vital to preparing auditors who have the skills and experience to participate as capable peer reviewers.
As psychological safety becomes an increasingly researched phenomenon (Edmondson, 2019), the role it plays in helping teams do their best work cannot be overstated. Auditors fresh out of higher education programs need to understand more than how to conduct an effective audit; they need to have had experience in developing and fostering psychological safety in teams. Failing to cultivate the habit of speaking up or taking responsibility for mistakes, risks, or failures, while also being responsive and attentive to those who commit such errors or take such leaps of faith and fail, can be devastating to team and organizational performance. Thus, students must learn not only how to perform peer review, but also how to maintain healthy intergroup dynamics that make the peer review truly productive.
This intervention sought to combine the innovative approach of Adaptive Comparative Judgment (a new and validated form of peer review and feedback) with psychological safety to learn more about the effects of intentional peer review practices in an educational setting. Most of our results with ACJ were not statistically significant, suggesting that such an intervention may not significantly improve students' academic performance. However, the value of the intervention was not lost on the students; they saw the experience as a hands-on approach to learning skills that could help them better prepare for their future careers. Their perceptions were validated by recruiters and personnel from the field who were told of the intervention. This may be anecdotal, but often value is in the eye of the beholder. Students felt that such an intervention was important and were grateful to have engaged with an educational practice that they saw as beneficial and promotive of their career goals.
Psychological safety, particularly in the areas of team acceptance while making mistakes and team acceptance of those seen as different, significantly increased during the time span of this intervention. In a westernized culture, comparison and ranking with a sort of didactic view (e.g., this student's work is better than another's) could be seen as a negative approach in a classroom setting, yet no such negative impact was revealed in our findings. Psychological safety scores, particularly those connected with being able to take risks that may lead to mistakes and with accepting others regardless of differences, significantly improved over time. Thus, it can be said that, in the case of this study, ACJ peer review and feedback did not hinder students' perceptions of being able to take interpersonal risks in their education and may potentially help improve psychological safety. The ACJ process may therefore enhance student experiences in learning auditing technique while also, to some extent, improving their capacity to provide and receive peer feedback by fostering and growing psychological safety. This should encourage instructors and researchers to continue to research and implement such interventions in their own disciplines.