Evaluation

1. Introduction
1.1. Context
Each year, Origin Learning produces over 5,700 hours of online learning for some of Australia’s largest organisations. A highly skilled and organised management team ensures projects are delivered on time and within budget. A team of 26 managers, instructional design leads, project managers and general managers will this quarter participate in a management development program specifically targeting soft skills.
The program is entitled ‘Evolve’, the first module of which is People Skills. The objective of the module is to ensure managers demonstrate the necessary knowledge, skills and behaviours required to lead a group of instructional and graphic designers in the creation of eLearning. Assessment of the module will be conducted on site by a group of senior management consultants. The module will be assessed over the course of the quarter, providing opportunities for real world ‘on the job’ assessment of the desired behaviours required of managers by Origin Learning.
A plan has been designed to increase empathy, teamwork and harmony in what has traditionally been both a high pressure and turnover environment. The objective is not to deliver information to the managers, but to present a program that will challenge them to truly experience, learn and demonstrate through highly targeted and appropriate assessment, those behaviours conducive to creating understanding and empathy in a group dynamic.
Whilst the Origin Learning management team is well qualified, most holding qualifications in psychology or information technology, and its members are creative and competent instructional designers, their experience in managing teams of people in the workplace is limited. Moreover, they have recently undergone the combined pressures of seven resignations and four human resource interventions, which have inspired this program in what might be interpreted as a remedial measure. The People Skills module encompasses four behavioural objectives (see Appendix 1): Personal Communication, Group Work, Cultural Sensitivity and Whiteboard Skills. The stakeholders of this program have not been shy in asking for a program that will challenge their leaders to feel a level of dissonance during the program. They have asked for it to demand a high level of active participation and personal development from their managers. The assessment instruments have been developed with this brief firmly in mind.
2. Objective 1.1 - Personal Communication
2.1. Role Play - The Rationale
In this module, the first objective is to develop the manager’s confidence in personal communication. The assessment instrument has been chosen on the basis that if you want to determine whether someone can communicate, have them communicate (Gronlund, 2009). In this case, the instrument is a role play between the manager and three other participants, who will assess the encounter after the event. This instrument will directly assess the knowledge, skills and behaviours taught during the program, aligned to the behavioural objective. Importantly, the participants in this role play will be unknown to the manager, guarding against the greater confidence that comes with familiarity.
In choosing a role play to measure this objective, a distinction has been drawn between simply asking the manager to recall and supply the communication theory learned in the program, and demanding a more complex outcome such as understanding, applying, analysing and demonstrating. Beyond a simple selection-type item such as a multiple choice or true/false exam, a role play asks the manager to draw on the exact performance detailed in the objective. Because the performance is a higher order and more complex outcome, many factors can be assumed to be present given a successful performance. For example, a sound performance in the role play will indicate an underlying factual knowledge, recall and understanding of the required performance. This form of instrument also encourages complete originality and creativity, as the manager is given a topic to discuss but no prescriptive course for it to take.
There are many positive attributes to role play assessments. For example, the role play provides an exceptional opportunity to observe the managers actively performing a task, as opposed to speculating or inferring outcomes from a written test. Moreover, it provides the managers with a meaningful purpose for the assessment, unlike other forms of assessment (Chase, 2005; Kvale, 2007). This level of authenticity may build confidence and motivation in the manager as they progress to becoming more self-confident in all situations. In addition, the managers are able to be closely observed in both performance and problem solving in real time (Gronlund, 2009). The design of this assessment also gives the assessors the ability to draw their conclusions in a more balanced and impartial way, given there are four assessors present (three participants, one observer).
Whilst this instrument is confidently held to be the appropriate choice for this application, a number of limitations are inherent in the process. Performance testing, and specifically role plays, may be subject to weaknesses: they will invariably be inconsistent over repeated attempts, are unarguably expensive in both time and resources (McCurry, 1992) and typically have low reliability (Gronlund, 2009). In accordance with the initial brief, however, this instrument has been designed to present a significant challenge to the managers, all of whom are accustomed to intense performance pressure. Having to role play with four assessors present should therefore provide the requisite environment for increased reliability, consistency and growth.
2.2. Role Play - The Appraisal
As a concept, validity can be divided into several components. For the purposes of this instrument, the most relevant are content validity and construct validity. In short, does this role play reflect the workplace skill of confidence that is being assessed (construct validity) and is what has been taught in the Evolve program reflected in the assessment instrument (content validity)?
Role play assessments are widely held to be one of the most authentic and realistic ways to assess real world skills, and are uniquely capable of providing insight into communication skills (McCurry, 1992). This is evidenced in their ubiquity across the VET sector for a broad range of qualifications and courses. The workplace skill required in this case is confidence in communication. The role play almost perfectly replicates a situation in which the manager’s confidence will be challenged and tested. It is a highly relevant instrument given the day to day requirements asked of the managers.
A valid inference could be drawn that to do well in this role play, being asked to enter a room full of strangers and create and sustain a conversation for ten minutes whilst being assessed, is to more than adequately demonstrate a level of confidence acceptable for this purpose. The Evolve program itself utilises role play to build skills in the managers; that the assessment instrument is also a role play answers the question of content validity soundly (Gronlund, 2009).
Griffin (1997) pointed out that assessment instruments create only inferences that serve as a pointer or guide, and this is particularly true of written instruments. For behavioural objective 1.1, however, the margin for error or guesswork in making valid inferences is diminished. The tasks are set in the workplace itself, replicate the exact skills required, relate directly to the outcomes being assessed and are both appropriate to and typical of the industry (Booth et al., 2001).
There is a very tangible risk that the role play instrument may be subject to errors. These risks exist not only because of the nature of role plays themselves, but are magnified by the variability of human nature. Personal bias, for example, may be a factor, as may the unpredictability of moods, emotions and personal situations. The assessors may be biased against a manner of communication which, whilst perfectly acceptable to those under 30 years old, causes consternation to those over 50 (Knight, 2006).
How reliable is this role play? Would the inferences drawn be of the same quality if the location, time or assessors changed? These considerations must not be weighed solely after the fact, but must form an integral part of the design phase before the instrument is created. In this case, the limitations of role plays were assessed and countermeasures were implemented to ensure reliability and validity were maximised. The limitation of insufficiency was dealt with by expanding the opportunity for the appropriate attributes to be displayed during the ten-minute assessment, ensuring the manager has ample opportunity to demonstrate confidence to the assessors (Gronlund, 2009). The plan, objectives and instrument were designed holistically to guard against poor structure, criteria or scoring (Gulikers, 2004).
Turning to the scoring rubric, care has been taken to ensure guidelines are clear and unambiguous and that personal bias is minimised. Even so, it was felt that in this case there was ample opportunity for assessors to be swayed by a personal feeling of accord, affinity or even a personality clash with the manager being assessed. The remedy was to introduce four assessors to the process, such was the importance placed on this assessment by Origin Learning stakeholders.
The above risks can be managed, and in this case have been to the extent allowed given the context and resources available.
3. Objective 1.2 - Group Work
3.1. Observation - The Rationale
In module 1.2, the objective is to assess the manager’s active participation in group sessions, along with their use of effective and appropriate teamwork skills. The observation instrument has been chosen in consideration of the needs of the organisation, the assessors and the managers. Given the results-oriented business environment and the time constraints inherent in delivering projects on time and on budget, Origin Learning requested that the program be as holistic and time efficient as possible. Balancing this is the need to ensure the instruments used display a robust and appropriate methodology and that the learning outcomes are achieved. The observation allows managers to be assessed in the natural course of their business duties and in an authentic manner.
The assessor will attend three team meetings over the quarter and utilise a straightforward rubric to assess the level of participation and the behaviours demonstrated during the course of each meeting. Many of the strengths and limitations of performance assessments such as role plays, observations and portfolios are shared and have been explored previously. Some factors are unique to each instrument, however, and these can be discussed in this context. In the case of this observation assessment, a single assessor will be present during the meeting.
To counter the exposure to personal bias, an arrangement has been made for the assessors to share the attendance, so that each manager will benefit from having three different assessors over the course of the quarter. Gronlund (2009) warned that potential limitations include a lack of systematic approach and poor record keeping. The structure of this assessment ensures a robust approach in both key areas, with a dedicated support team member from Human Resources assigned to administer the process throughout.
3.2. Observation - The Appraisal
The degree to which validity can be inferred from the results of the observation is high. Is it possible, over three hours of observation in group meetings, to infer that the manager is actively participating and using appropriate skills? Absolutely. This method is perfectly suited due to the authenticity of the task being observed. The types of validity covered are content, concurrent, construct and face validity (Gronlund, 2009). The only limitation may be predictive validity, as the manager may be responding unwillingly to the perceived assessment pressure and may return to being mute post-assessment. A sustained effort over three meetings, however, at least demonstrates a level of competence, if not motivation.
A perfectly reliable assessment instrument is the goal, but may never actually be achievable in the real world. To ensure it is the best it can be, this observation and its associated forms (see Appendix 3) have been thoughtfully and carefully constructed from the outset. It is possible, for example, that reliability may be influenced by personal bias, the emotional state of the manager, their health, the general energy level of the meeting or indeed the time of day. These considerations, whilst noteworthy, are less than threatening to the reliability of such a straightforward assessment. The most obviously variable factor needing to be addressed is personality, and in this case the assessors have been rotated so that none assesses the same manager more than once in a quarter. Moreover, the assessment rubric features such open and clear instructions that the level of confidence in the reliability inferred is substantial.
An additional factor worthy of mention is the instrument’s flexibility. Hager, Athanasou and Gonczi (1994) asserted that no matter how skills have been acquired, whether in the Evolve program or elsewhere, the manager should be free to demonstrate competence. Furthermore, should this assessment need to be conducted at another time, in another place or by another assessor, opportunities to observe these skills will be plentiful (Rumsey, 1994). The competencies could even be demonstrated remotely through tools such as Skype or other video conferencing, or even through audio alone.
4. Objective 1.3 - Cultural Sensitivity
4.1. Essay - The Rationale
This instrument was chosen after careful deliberation due to the nature of the content, required behaviours and its importance to the Origin Learning program. Potentially, this objective posed the most problems and risks so various considerations were weighed before deciding on a written instrument, in this case a short essay. A practical assessment could have been simulated through a role play, however this was dismissed as being open to bluffing and disingenuous compliance. What is needed is an assessment based on higher order thinking, complex emotional empathic response and problem solving strategies (McCurry, 1992).
Of vital importance is the construction of the essay question, to avoid traditional limitations such as failure to understand the question, subjectivity of assessment and bluffing through a high level of writing proficiency. What is sought is a demonstration of why cultural issues are sensitive in the first instance, and then of how an individual might feel should they be subject to insensitivities. Until a manager grasps through felt understanding what this means, they will only be able to describe it in a general and remote way. This may also lead to the instrument being viewed as something to be endured or even resisted (Belfiore et al., 2004). To maximise the opportunity to truly experience this, the essay question, content, instructions and assessment have been devised to engage the manager in the first person in a scenario, thereby delivering a highly subjective and personal viewpoint from which to experience an understanding of cultural diversity.
4.2. Essay - The Appraisal
Kane (2004) asked two questions on the subject of validity: are the results of the assessment of interest, and are they helpful in making good decisions? In this case, can we infer that the managers are indeed sensitive to cultural issues should they score well on this essay? Alternatives considered for this objective were avoided for various reasons. Initially, a performance-type test such as a role play or observation was considered; however, if a manager knows they are being assessed for cultural sensitivity, they will of course respond by curtailing their behaviour. This was considered undesirable given Origin’s commitment to, and the prevalence of, multicultural work groups.
The essay allows for the manager to demonstrate their awareness of the issues and challenges involved, if not their volition to be sensitive in practice. The content validity is therefore robust as is the face validity. As discussed previously, whether the concurrent, predictive and construct validity inferences are of a high degree is open to debate. Perhaps this instrument could be combined with that of the observation in assessing through the management consultants whether a manager shows sensitivity in group work, thereby providing a more holistic program.
Factors affecting the reliability of this instrument have been dealt with through the use of a uniformly quiet ‘retreat room’ for the managers and a carefully constructed assessment form for assessors. Given that the factors underpinning a strong performance in this assessment, and the principles passed on in the Evolve program, might be called ‘universal’ or matters of basic human relations, a high degree of reliability might be expected. For example, a manager will not usually forget or disregard their sensitivity in the morning only to re-embrace it in the afternoon.
An additional factor worthy of inclusion in relation to this assessment is efficiency. The instrument is highly efficient for Origin Learning in that the managers may all be gathered at once, the essay may be completed within two hours and the scripts require only that they be marked by assessors within the human resources and training department. For busy managers whose time is charged out to clients, it is a highly efficient way to begin and end the process within a single sitting.
5. Objective 1.4 - Whiteboard Skills
5.1. Portfolio - The Rationale
This instrument asks the manager to record and present three 10-minute video vignettes of their own whiteboard sessions to a senior consultant for assessment. The brief from Origin Learning with regard to this item was extensive. Beyond a simple assessment of rudimentary whiteboard skills, they felt that as the organisation specialises in training and learning, the managers should be asked to engage deep metacognitive processes and self-analysis so as to scaffold their existing high standards to new levels (Baume and Yorke, 2002). Tayler (2007) proclaimed in a similar vein that it is a myth that intervention is only needed for struggling students. Given this, the portfolio instrument fits the requirement aptly.
Portfolios allow managers to show progress over time, compare best work to past work and develop self-assessment skills and reflective learning habits (Gronlund, 2009). Whiteboard skills are also indivisible from training skills so as the managers explore their video portfolios, they will unavoidably draw simultaneous conclusions about their roles as teachers and trainers.
The disadvantages accompanying such a process are that it takes time to set up the recording, reviewing and presentation of the video for assessment, and that repeated attempts may create inconsistent results. Flexibility may also suffer should the manager be ill or absent during the monthly meeting, as recreating the meeting for recording purposes is time and labour intensive (Gronlund, 2009).
5.2. Portfolio - The Appraisal
Having completed a folio of video vignettes and undergone the assessment interview, can we assume the manager can demonstrate effective whiteboard skills? Given the highly subjective nature of the term ‘effective’, this requires exploration. In the Evolve program, the managers were taught appropriate methodologies and practices, which they are then expected to demonstrate in the video portfolio. Instead of taking an approach of competent or not competent, perhaps we might give them credit for what they do know and for how far along their own developmental road they progress through self-evaluation (Masters, 1987; Messick, 1984).
Confidence in a high level of content, predictive, concurrent and face validity could justifiably be asserted given the highly authentic and direct nature of this portfolio assessment. Further, the instrument fulfils the qualities Biggs (1992) described: reflecting where managers stand in the orderly development of competence, informing assessor and manager of what is needed to improve, and, importantly, providing throughout the process a learning experience in itself.
The question of reliability or consistency of results in this case presents several considerations. Firstly, this is a highly subjective measure, and as such, it is difficult to predict complete reliability. Secondly, in each assessment portfolio there can and will be variation in samples. However, during the preparation of this assessment the learning outcomes were clarified and agreed, the assessment targets were cemented and the appropriate level of difficulty was ascertained in order to balance these factors and promote the highest degree of reliability possible given the context and resources available.
Another strength of this approach is its objectivity. By empowering the managers to draw their own conclusions and self-evaluate their own performance, they are able to assume a higher level of responsibility for the outcome. The role of the assessor in this instrument is, for the most part, to assess the commitment of the manager to their own learning process, rather than judging them right or wrong in answering knowledge questions.
This instrument, whilst demanding of time and resources, is in alignment with both the desired outcomes of the Evolve program and the module outcome. In addition, it is pitched at the right level of difficulty in challenging already competent managers to grow through self-assessment and metacognitive analysis. It clearly provides the most direct method of performance required by the outcomes requested during the initial Origin Learning brief (Gronlund, 2009).
6. Conclusion
As assessment is a practical activity, there will always be a need to find a careful balance between competing and conflicting requirements. It is crucial to ensure that the process of analysing, designing and developing assessment instruments maximises the effort spent weighing the strengths and weaknesses of various instruments before turning attention to factors such as validity, reliability, flexibility and efficiency. Compounding the pressure are the needs, in many instances, of those who are financing or supporting the program and of those who will eventually undertake it; they are rarely the same group of people. It is for these reasons and more that assessment and the evaluation of its instruments, and their effect on participants and stakeholders, remains a contentious and polarised pursuit.
The assessment instruments presented in this essay have been designed, as much as possible, to take into consideration what is best for the learner. Only when the learner’s needs were fulfilled were the needs of the peripheral stakeholders balanced and introduced to the process. At times this meant a change in the design and execution of an instrument; however, the commitment to the learner remained steadfast throughout. The pursuit of perfection in assessment may lead to disappointment, as it is a process of juggling demands. Commitment to practice, however, as with juggling itself, leads to higher levels of competence and, with time, confidence in a superior result.
7. References
Baume, D., and Yorke, M. (2002) The reliability of assessment by portfolio on a course to develop and accredit teachers in higher education. Studies in Higher Education 27, no. 1: 7 – 25.
Belfiore, M. T., Defoe, S., Folinsbee, J., Hunter, J., & Jackson, N. (2004). Reading work: Literacies in the new workplace. Mahwah, NJ: Lawrence Erlbaum.
Biggs, J. B. (1992). A qualitative approach to grading students. HERDSA News, 14, 3-6.
Booth, R., Clayton, B., House, R., and Roy, S. (2001) Maximising confidence in assessment decision-making: a springboard to quality in assessment. In: AVETRA 2001, 4th Annual Conference: "Research to reality: putting VET Research to work", 28-30 March 2001, Hilton Adelaide, Victoria Square, Adelaide, South Australia.
Chase, S. E. (2005) Narrative Inquiry: Multiple lenses, approaches, voices. In The Sage Handbook of Qualitative Research, ed. Denzin, N. and Lincoln, Y, 3rd ed., 651-80. London. Sage.
Griffin, P. (1997) Assessment in schools and workplace. Inaugural professorial lecture, University of Melbourne, September.
Gronlund, N. E., & Waugh, C. K. (2009). Assessment of student achievement (9th ed.). New Jersey: Merrill-Pearson.
Gulikers, J., Bastiaens, T., & Kirschner, P. (2004). A five-dimensional framework for authentic assessment. Educational Technology Research and Development, 52 (3), 67-85.
Hager, P., Athanasou J. & Gonczi, A. (1994) Assessment Technical Manual. DEET, Canberra: Australian Government Publishing Service
Kane, M. T. (2004). Certification Testing as an Illustration of Argument-Based Validation. Measurement, 2, 135-170.
Knight, P.T. (2006) The local practices of assessment. Assessment & Evaluation in Higher Education 31, no. 4: 435-52.
Kvale, S. (2007) Doing Interviews. London. Sage.
Masters, G. (1987). New views of student learning: Implications for educational measurement. Research working paper 87.11. University of Melbourne: Centre for the Study of Higher Education.
McCurry, D. (1992). Assessing standards of competence. In A. Gonczi (Ed.), Developing a competent workforce: Adult learning strategies for vocational education and training (pp. 222-239). Adelaide: National Centre for Vocational Education Research.
Messick, S. (1984). The psychology of educational measurement. Journal of Educational Measurement, 21, 215-237.
Rumsey, D. J. (1994) Assessment practical guide. Canberra : Australian Govt. Pub. Service.
Tayler, C. (2007). Challenges for early learning and schooling. Education, Science & the Future of Australia: A public seminar series on policy. University of Melbourne, Woodward Centre, 23 July
8. Appendices
1. Assessment Matrix – Evolve Program
2. Objective 1.1 – The Role Play & Instructions
3. Objective 1.2 – The Observation & Instructions
4. Objective 1.3 – The Essay & Instructions
5. Objective 1.4 – The Portfolio & Instructions
8.1. Appendix 1 – Assessment Matrix - Evolve Program

8.2. Appendix 2 - Objective 1.1 – The Role Play & Instructions

8.3. Appendix 3 - Objective 1.2 – The Observation & Instructions

8.4. Appendix 4 - Objective 1.3 – The Essay & Instructions

8.5. Appendix 5 - Objective 1.4 – The Portfolio & Instructions


