Everyone involved in academia should understand the impact that AI is having on assessment. This was the first major discussion to take hold within academia after the launch of ChatGPT-3.5 in late 2022, and nearly three years later it remains a vital issue. In this series of articles I will set out my view of what university assessment ought to look like in 2025, based both on what will work within the existing paradigm of higher education and on what might look different if and when that paradigm changes. These arguments draw on my own experiences over the last three years, conversations with many students and colleagues, the established theory and evidence on how assessment should be designed, and the growing body of theory and evidence specific to AI and assessment1.
I will begin this series by discussing the problems with current assessment, and how some of the earlier strategies for dealing with the proliferation of AI no longer work. These ideas are on my mind ahead of an event I will be attending next week discussing the future of economics assessments.
In the later parts of this series I will introduce some novel ideas about future assessments, and I am sure this event will give me some good ideas to reflect on.
Being AI-proof
It is worth emphasising that no assessment is fully ‘AI-proof’, in the sense that AI can help with planning, brainstorming, finding useful information, grammar and so on; current AI is already very good at all of these, and your students will be using it. Some may be perturbed by this. I would encourage students to do these things, with some guidance on how to do them well, but in return we should expect higher standards: things that may have been shrugged off as small human errors before could now be deemed unacceptable when we all have tools to avoid such simple mistakes. This is your judgement to make. I will return later to the process of producing a final output; for now I will focus on that final piece of work.
‘AI-proof’ for my purposes here will mean avoiding assessments where students can directly copy and paste much of their answer from an AI chatbot or other generative AI system.
I will also add that there are a great many types of assessment, and these differ substantially between disciplines. At the end of this series I will discuss some more innovative ideas for assessments, but in my experience in Economics almost all assessments are one of:
A test/exam (in a variety of settings e.g. closed vs open book; in-person vs online).
A written report/essay.
Solutions to a maths or statistics problem set.
A presentation of some form (spoken or involving a poster; individual or group).
There are lots of slight tweaks to these, but a great many assessments across a great many disciplines take one of these forms, and all are, to a greater or lesser extent, affected by AI use by our students.
Finally, before getting into the details, I am principally talking about summative assessment here. Formative assessment is very important, and I believe AI can be helpful in a variety of ways in this domain. In particular, used correctly, students now have access to unlimited AI-generated formative assessments, marks and feedback to help their learning.
It is summative assessment that worries people the most, because a major function of university is to provide a signal of student ability. If use of AI allows weaker students to get the same marks as stronger students, this is a big problem, and so it is here that we need to react in our assessment strategy.
AI defences that no longer work
A suitable starting point is to consider various ideas that were floated not so long ago but are now largely irrelevant as part of our response to AI.
AI won’t do a good job of the work
Some academics I have spoken to appear to be stuck with an impression of AI capabilities that is seriously out of date. This is quite understandable: what other technology has changed so dramatically, so quickly? Many people are not using AI regularly, so will have little idea. Some may have heard early stories about hallucinations, or that it cannot add numbers properly (both still true in certain contexts), and assumed that the capabilities must be generally poor.
However, the most powerful models, used sensibly (detailed, iterated prompts, added context and so on), will deliver excellent results in most areas. These powerful models are available to everyone in some form (though with limits), and there will be newer models and capabilities soon. An AI chatbot may not produce the most outstanding answer to your assessment (yet), but it probably gets a pass mark.
If you aren’t sure, try it. Deep research capabilities now also allow excellent in-depth overviews of many topics, stretching to thousands of words and dozens of sources. In the last part of this series I will include a link to a Deep Research report I ran on AI and assessment in Economics; it is 6,000 words or so and has its flaws, but it also makes lots of excellent points, all at the press of a button. These AI-powered research processes can struggle to use the best sources when those are paywalled, but again, do not expect that to remain a barrier for very long.
I can tell if the work is AI-produced
Once you have seen a lot of AI output, you develop a certain instinct for what looks AI-generated. However, clever use of AI will make it very difficult to spot. GPT-4.5 was released earlier this year specifically with the promise of more human-like writing and emotional intelligence.
We also have data showing that students are heavy adopters of AI. If you tell your students not to use it, some will listen, but many won’t, and it is very unlikely you will spot all of these attempts at cheating. Trusting, or hoping, that students won’t use AI to cheat on your assessment, and that you will spot any such attempts, does not seem a sustainable approach if you wish to protect the integrity of your assessment.
AI won’t help if I ask the right question
In 2023, tricks like using a very recent example or a bespoke case study for students to discuss in their essay may have worked, because this information was not in the AI’s training data and so it would struggle to give good answers. Now, with search-capable AI and the ability to attach documents that give an LLM more context, this approach is much less effective.
Use a technical assessment that requires maths/stats/code/diagrams
Other than assessments that require the production of a physical output, which are inherently more AI-proof, we might think we are safer if the assignment requires extensive use of maths, code or technical diagrams. For maths and code in particular, AI excels (see https://www.anthropic.com/news/claude-4), and so these assignments are likely more prone to AI-aided submissions than a more text-based assignment. All of the leading AI models also have good data analysis capabilities built in. At least for the more basic elements of data management and analysis, a student with some AI knowledge can complete the task easily, potentially in the space of minutes, as the sketch below illustrates.
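To make this concrete, here is a minimal sketch of the kind of analysis script a chatbot will happily produce from a one-line prompt. This is my own illustration, with an invented dataset and variable names, not taken from any real submission:

```python
# Illustrative sketch only: the sort of basic analysis script a chatbot
# produces from a prompt like "run a wage regression on my data".
# The dataset below is synthetic and the variable names are invented.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(42)

# Synthetic stand-in for a typical assignment dataset
df = pd.DataFrame({
    "education_years": rng.integers(8, 21, size=200),
    "experience": rng.integers(0, 30, size=200),
})
df["log_wage"] = (
    1.5
    + 0.08 * df["education_years"]
    + 0.02 * df["experience"]
    + rng.normal(0, 0.3, size=200)
)

# Descriptive statistics, then a simple Mincer-style wage regression
print(df.describe())
X = sm.add_constant(df[["education_years", "experience"]])
model = sm.OLS(df["log_wage"], X).fit()
print(model.summary())
```

Nothing here is sophisticated, which is exactly the point: it covers the descriptive statistics and basic regression that make up a large share of introductory problem sets.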
For technical diagrams, like those I know well from economics, until very recently I thought it was a good strategy to emphasise these in a report or essay. No longer. I will write more about how the production of such diagrams is close to being solved by leading AI models, but for now see some examples of AI outputs that could be used in an assignment:
Both of these were generated by a simple prompt in one of the newer iterations of ChatGPT (o4-mini for the first, 4o for the second).
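To give a flavour of how simple this has become, here is a minimal sketch, written as the Python a chatbot might emit, of the kind of supply-and-demand figure these models will draw from a one-line prompt. This is my own illustration, not the model outputs above, and the curve parameters are invented:

```python
# Minimal illustrative sketch of an AI-drawn economics diagram: a supply
# and demand cross with the equilibrium marked. Parameters are invented.
import matplotlib.pyplot as plt
import numpy as np

q = np.linspace(0, 10, 100)
demand = 10 - 0.8 * q   # downward-sloping demand curve
supply = 1 + 0.7 * q    # upward-sloping supply curve

# Equilibrium where demand equals supply: 10 - 0.8q = 1 + 0.7q
q_star = 9 / 1.5
p_star = 1 + 0.7 * q_star

fig, ax = plt.subplots()
ax.plot(q, demand, label="Demand")
ax.plot(q, supply, label="Supply")
ax.scatter([q_star], [p_star], zorder=3)
ax.annotate("Equilibrium", (q_star, p_star),
            textcoords="offset points", xytext=(8, 8))
ax.set_xlabel("Quantity")
ax.set_ylabel("Price")
ax.legend()
plt.show()
```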
In summary, many types of assessment are simply no longer viable if you care at all about the integrity of that assessment, i.e. that the marks students receive bear any resemblance to their ability, whether compared within the cohort or against other cohorts.
Any sort of take-home essay or report can have large parts produced wholesale and copied and pasted. More difficult, higher-level assessments might see worse responses, but that will not last long as capabilities continue to improve. Students who are better with AI will get better marks, leaving those less confident, or those who want to play by the rules you have set, unfairly penalised. The result will be a learning experience that feels pointless to a great number of students (given the high importance of signalling for university) and, for many, unfair.
This should be an unacceptable state of affairs for any academic at a personal level, and for colleges and universities at an institutional level. What then should we do2?
In the next section I will discuss changes that can be made within the framework of existing assessments, but I consider this inadequate as a response. We need new ideas for assessments, and in the later instalments of this series I will document some, going into detail on the specifics of implementation. Many academics will rightly believe that writing essays and reports is integral to getting students to think deeply about problems. I do not disagree, but the exact form of these assessments must change if they are to retain their legitimacy, and newer forms of assessment can be brought in alongside them.
(Image deliberately inserted with error to show the continuing limitations of AI!)
I began to discuss these ideas last year in my ‘AI Manifesto’:
The Higher Education AI Manifesto
Ethan Mollick wrote about some of these issues some time ago now: