In the first part of this series I set out the idea that many assessments used in university courses are no longer suitable because they either cannot guarantee that the work is a genuine reflection of the student’s learning or because they fail to take advantage of the opportunities provided by AI.
In the second part I identified three types of assessment that we use now which I believe can still be an important part of our assessment mix if done correctly: in-person tests/exams, presentations and reflective work.
In this part I will detail, based on both my ideas and those of others, some assessment structures that I think we should now be using. In particular I am interested in assessments that replicate the process of research, reading and writing used in the production of essays and reports. I want students to have the benefits to learning that these processes bring, but potentially have them be even better than before with the enhancement of AI. These assessments must also preserve the integrity of the signal that a grade reflects.
In my reflection on the recent CTaLE event I attended, I argued that this tension between assessment that helps learning and assessment that accurately reflects student ability ran through much of the discussion. I believe most of us think there is a way to get both, and the assessments I outline here aim to do just that.
Idea 1 - AI evaluation in a report
My first idea is one that I have already used for my introductory Microeconomics assessment. In 23-24 I was unhappy that too many students were using AI in their reports, so I wanted to make a switch for 24-25. I was influenced in this design by Scott Cunningham and Alice Evans.
The central idea is to preserve the element where students do their own research and writing, whilst embracing AI as a tool to help them do research and to gain an appreciation of the strengths and weaknesses of AI output. As this was an assessment for a year 1, semester 1 module, I felt this was particularly appropriate.
I switched to a report where, rather than just answering the question themselves, students put the question to an AI and then evaluate the AI's answer. They then use this understanding to put across their own answer.
Extra nuance is provided by having students evaluate two different AI answers: one produced by a chatbot with no internet search or extra context, another produced by one that does have that extra context. Part of the assessment is to analyse the relative strengths and weaknesses of these two different types of AI answer.
Specifics of the assessment:
Pick a question from a list of possible options, all on big themes in the economy: housing, poverty, energy etc.
Part A (25 marks): put the question to an AI without any search capability. Think carefully about the wording of your prompt. Evaluate the answer to the question and explain your use of AI: why that prompt, why that model, and what could you have done differently?
Part B (25 marks): as above, but using a tool such as Perplexity or Consensus that searches the web/academic literature to answer the question.
Part C (50 marks): using the answers from parts A and B, and your own knowledge from the course, provide an answer to the question. For this part you should use suitable diagrams and some data [1].
As an added element to encourage genuine work and discourage copy-pasted AI output, there was an opportunity to do some of the work in class and submit drafts of parts A and B for feedback. The idea was that I could see students produce some of the work in person, and if the final work was then very different, this would raise alarms about possible AI use.
Given that I actually used this assessment, how did it go?
On the positive side, I do think the students genuinely got a better appreciation of the strengths and weaknesses of AI output and how different AI models compare, and this came across quite well in their work. Requiring diagrams in the final part also meant they had to do their own work to a significant degree (though this will not work for much longer).
On the negative side, I still felt some of the writing was probably done with too much aid from AI. A student could copy and paste the AI output from parts A and B into another AI for evaluation and probably get a passing answer.
I did try to counter this with the in-class writing and feedback on drafts, but I left it optional. Next time I would make this mandatory, though I appreciate it is labour-intensive. This cohort was small, which made this kind of thing easier; for larger cohorts it would be difficult.
Overall, I like this approach, especially at the start of the year. It will need some tweaking, but I will use some version of this again.
Idea 2 - Improving AI output
The Deep Research capabilities of all leading AI models fundamentally change what a research report looks like (in the final piece of this series I will include a link to the Deep Research report I ran on this topic of AI and assessment). These reports can bring together hundreds of sources and produce thousands of words discussing the topic you have set.
Much professional research work will use this function going forward, and these reports will continue to improve in quality. The ability to do research is a key skill that we assess in a degree, with a dissertation being the crowning achievement of many degree structures. This type of assessment is now totally changed; rather than scrapping it altogether, how can we save it?
As with the ethos so far, I would want my students to embrace the use of AI in this process. It is what professionals will be doing, and it allows you to be more efficient and potentially produce better quality work, so let's use this function and test what students can add to the process.
The new research report assessment may look more like this:
For the given topic, generate a prompt that will get the best out of the deep research report, and select a model that is suitable for this work. Explain this process. Adding the right context to a good prompt itself requires some domain knowledge, so this becomes a test of the ability to use an AI, the ability to write well, research skills (looking up prompts and prompting techniques) and knowledge of the specific area.
Share the chat containing the generated research report with the academic. This is a feature of AI in education that I have not seen explored: chats are shareable, so if you are asking for an AI output as part of the assessment, the student can provide proof of how they got that output by sharing the chat.
The student will then work to improve the report. If we believe that (for now) an AI might produce a good report, but that a human can do better, let’s prove that! If the AI has got something wrong, then the student should spot this and fix it. If the writing is formulaic, the student can improve on it. If sources are wrongly attributed, get them right.
This could be complemented by a reflective piece explaining what those improvements are and why they were made. This would include discussion of where no changes are made, showing an awareness of where the AI did a good job.
A variant of this could be that the academic has already generated a deep research report, and students work with this template to improve it. It could also act as the basis for a test or exam to show their understanding of the topic. Potentially the entire assessment structure of a course could be built around this report and the degree to which it shows an understanding of the material from that module.
I think this is a nice assessment because it starts from a baseline assumption that the AI will do a good job of this work, but that a good student can make it better, even if only through minor tweaks. Much knowledge work in the future will likely take this form: working with AI-produced outputs to improve them rather than producing something entirely from scratch. It also allows students, with AI collaboration, to produce their own work to a high standard.
Idea 3 - The AI-augmented exam
I previously argued that I see exams as a legitimate part of the assessment mix at a programme level. A common criticism of exams is that they do not look like real work, and this difference becomes even more stark in the AI age. We want exams more than before in order to preserve the signal of student ability, but the tasks we set in the exam look less and less like anything anyone does in the real world.
Is there a way to have some of both? Here is an idea for doing just that.
This exam would take place in a computer lab, with full access to the internet, including AI tools. Students would have a project or question set to complete within a time limit, as per a standard exam. The twist is that key documents needed to complete the task would be available only physically in the class. This could be a dataset or report the academic has constructed, or any piece of information or evidence needed to understand the problem. Crucially, this would have to be something not available online or part of an AI's training data. The students can therefore only complete the project by understanding and using these physical documents alongside their AI and other digital tools.
This could take place in a single session, perhaps after a practice run with different documents. An alternative would be to turn this into more of a coursework, taking place across several weeks or months during class to build a piece of work. Each time, students would work with either the same set of physical resources in the room or a set that varies across the weeks. Different degrees of guidance could be provided in these sessions, structuring how to produce the output, and this would also be well suited to group work.
An example in microeconomics might be that students are given a menu or catalogue, created by the module leader, with a set of prices and information about the products for sale. The task is then to write a short report using microeconomic theory to analyse the set of prices. A well-designed menu/catalogue would include visual elements that (if we are still banning phones) could not be transmitted to the AI and must therefore be understood by the student based on their knowledge, even if AI can help with other aspects of the task. A similar task for macroeconomics might include a policy briefing with various data visualisations that serve the same purpose.
I believe an assessment of this type can help preserve the integrity of an assessment, whilst also capturing the realism of working on a project with AI alongside you.
With all these ideas, there is a need for more rigorous work to assess their strengths. However, with the rapid advances in AI, and the threat to the legitimacy of university qualifications if we do nothing, we might be justified in taking a risk with new models such as these, rather than sticking with obviously flawed ways of assessing.
More ideas will follow in later parts. Please share any ideas you have, or novel assessments you have implemented that deal with AI issues.
[1] AI is still not great with diagrams for Economics models, so this helped to 'AI-proof' the assessment.