Generative AI, ChatGPT and the Implications for Test Creation

ChatGPT has taken the education world by storm, and educators are scrambling to manage concerns over cheating with generative AI. Worries that students will use the program to generate human-like essays and short constructed responses, and in some cases to do their analytic thinking for them, have led many teachers to condemn it. However, now that the dust has settled around public opinion on ChatGPT, it is clear that the tool has benefits and is here to stay.

Generative AI tools have the power to change the landscape of education and may prove to be an asset for educators and students alike. Using generative AI to create leveled reading passages, develop writing prompts, and even generate assessment questions could make life easier for educators while also improving student learning outcomes.

How can AI tools be used in Assessment?

AI tools like ChatGPT can be used by assessment authors in various ways, depending on the specific needs of the assessment. Here are a few examples:

Generating test questions: ChatGPT and other generative AI software can generate test questions for a variety of subjects and levels of difficulty. Assessment authors can input prompts or topics, and ChatGPT can use its language generation capabilities to produce questions that assess students’ understanding and knowledge.

Grading responses: AI can be used to score open-ended responses from students. By training the AI on a set of responses that have already been scored by human graders, it can learn to assign scores to new responses based on their similarity to the training set.

Creating adaptive assessment questions: ChatGPT can be used to create adaptive assessment experiences that adjust the difficulty of questions based on students’ responses. By analyzing students’ responses in real time, ChatGPT can generate questions that are appropriately challenging for each student.

Providing feedback: ChatGPT can be used to provide faster feedback to students on their responses. By analyzing the content and structure of students’ answers, ChatGPT can provide feedback that is specific, informative, and actionable.
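The similarity-based scoring idea from the "Grading responses" item can be illustrated with a very small sketch. It assumes a simple word-count comparison (cosine similarity) and assigns each new response the score of its most similar human-scored example. The sample responses, the 0 to 2 scale, and the function names are all hypothetical, and production scoring engines use far richer models than this.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_score(response, scored_examples):
    """Return the human score of the most similar scored example."""
    target = bag_of_words(response)
    best_score, _ = max(
        ((score, cosine_similarity(target, bag_of_words(text)))
         for text, score in scored_examples),
        key=lambda pair: pair[1],
    )
    return best_score


# Hypothetical human-scored responses: (text, score on a 0-2 scale).
scored_examples = [
    ("Weather is the short term state of the atmosphere "
     "and climate is the long term average", 2),
    ("Weather is rain and climate is hot", 1),
    ("I do not know", 0),
]

new_response = "Climate is the long term average of weather in the atmosphere"
print(predict_score(new_response, scored_examples))
```

Because the sketch only compares word overlap, it cannot tell a correct answer from a wrong one that uses the same vocabulary, which is one reason human review of machine-assigned scores remains important.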

The value of generative AI for creating assessment questions

Developing test questions is a time-consuming and tedious task for educators. It can be challenging to create enough questions to test students, and building a bank of questions to pull from for an exam takes time away from other tasks like planning, connecting with students, and developing quality content. Using generative AI to create assessment questions is one way educators and test creators can save time and work more efficiently.

Some ways that generative AI is valuable when creating assessment questions are:

Create massive amounts of content with ease

Perhaps the biggest benefit of using ChatGPT to generate assessment questions is the speed with which it can create large numbers of questions. As an educator, you can give ChatGPT a copy of something you are working on and ask it to generate questions about the topic. You may also specify how many questions you would like and at what level. By doing this, a teacher or test creator can develop hundreds of questions in a matter of minutes, a task that would ordinarily take considerable time, effort, or money to accomplish at such a scale.

Generative AI isn’t perfect, and some questions may not be what the test creator was hoping for. When using ChatGPT to generate assessment questions, it is still necessary to have a human look through the results and edit or discard any questions that do not fit. Editing questions, though, is typically quicker than writing them from scratch.

Test question randomization

Test question randomization is the process of drawing questions at random from one or more question banks when designing a test. Traditionally, this process was done manually: a teacher would create two or three versions of a test by moving questions around, which took an enormous amount of time. With modern testing software such as TAO testing, however, educators can insert questions into the platform and let it randomize the order in which questions are presented on a test. In addition to randomizing questions, digital testing platforms can also randomize the order of answer choices.

All of this serves to improve test validity and reliability, reduce cheating, and improve overall test integrity. While ChatGPT could generate different forms of a test, it still takes manual effort to copy, paste, and print the different versions. Using a testing platform like TAO testing speeds this up and houses the entire process, from test development to grading, all in one place.
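To make the idea concrete, here is a minimal sketch of drawing questions at random from a bank and shuffling the answer choices for each one. It is an illustration only and does not represent how TAO or any other platform implements randomization. The question bank, the dictionary layout, and the function name build_test_form are hypothetical.

```python
import random


def build_test_form(question_bank, num_questions, seed):
    """Draw questions at random and shuffle each question's answer choices."""
    rng = random.Random(seed)  # a fixed seed makes each form reproducible
    selected = rng.sample(question_bank, num_questions)
    form = []
    for question in selected:
        options = list(question["options"])  # copy so the bank is not modified
        rng.shuffle(options)
        form.append({
            "prompt": question["prompt"],
            "options": options,
            # Track where the correct answer landed after shuffling.
            "answer_index": options.index(question["answer"]),
        })
    return form


# Hypothetical question bank for illustration.
question_bank = [
    {"prompt": "What is the largest planet in our solar system?",
     "options": ["Jupiter", "Saturn", "Earth", "Mars"],
     "answer": "Jupiter"},
    {"prompt": "Which planet is closest to the Sun?",
     "options": ["Venus", "Mercury", "Earth", "Mars"],
     "answer": "Mercury"},
    {"prompt": "Which planet is known as the Red Planet?",
     "options": ["Mars", "Jupiter", "Venus", "Saturn"],
     "answer": "Mars"},
]

# Two different seeds give two forms drawn from the same bank.
for seed in (1, 2):
    for item in build_test_form(question_bank, num_questions=2, seed=seed):
        print(seed, item["prompt"], item["options"])
```

Storing the position of the correct answer alongside the shuffled choices keeps automated grading possible no matter which form a student receives.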

Examples of Question Types

ChatGPT can develop questions for a wide variety of assessments and needs, including:

Fact-based questions: Questions based on factual information. For example, “What is the largest planet in our solar system?”
Conceptual questions: Questions related to the understanding of concepts and principles. For example, “What is the difference between weather and climate?”
Analytical questions: Questions designed to analyze and interpret information. For example, “What are some potential causes of the current climate change crisis?”
Critical thinking questions: Questions to evaluate information and make judgments. For example, “Do you think that social media has a positive or negative impact on society? Explain your answer.”
Creative questions: Questions designed to generate unique and innovative ideas. For example, “What are some possible solutions to reduce plastic waste in our oceans?”
Scenario-based questions: Questions that present a hypothetical scenario and ask the student to respond. For example, “You are the CEO of a company that has been accused of unethical practices. How would you handle the situation?”

Considering Psychometrics: Validity and Reliability with ChatGPT

Psychometrics is an essential aspect of creating effective assessment questions, as it involves designing questions that are reliable, valid, and fair for all test takers. AI-generated questions still need to be evaluated against psychometric principles to ensure that they meet the necessary standards.

One way to reconcile AI-generated content with psychometrics is to incorporate human review and quality control into the assessment process. Subject-matter experts and psychometricians can evaluate the generated questions for validity, reliability, and fairness. They can also ensure that the questions align with the intended learning outcomes and are appropriate for the intended audience.

Another way to ensure the quality of AI-generated content is to use statistical models designed to estimate psychometric properties, such as item response theory (IRT) models. These models can help identify items that are too difficult or too easy, and they estimate each item’s difficulty from the responses of test takers so that scores accurately reflect the test taker’s abilities.
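As a rough illustration of how an IRT-style check might flag items, the sketch below uses the Rasch (one-parameter) model, in which the probability of a correct answer depends on the gap between a test taker's ability and the item's difficulty. The difficulty estimate here is a crude approximation based on the proportion of correct answers, not a full IRT calibration, and the response data and the cutoffs of -2.0 and 2.0 are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Rasch (1PL) model: probability that a test taker answers correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def rough_difficulty(responses):
    """Crude difficulty estimate: negative log-odds of a correct answer.

    This is only a starting approximation, not a full IRT calibration.
    """
    n = len(responses)
    # Keep the proportion away from 0 and 1 so the logarithm stays finite.
    p = min(max(sum(responses) / n, 0.5 / n), 1 - 0.5 / n)
    return math.log((1 - p) / p)


def flag_items(item_responses, low=-2.0, high=2.0):
    """Label each item as too easy, too hard, or ok (hypothetical cutoffs)."""
    flags = {}
    for item_id, responses in item_responses.items():
        difficulty = rough_difficulty(responses)
        if difficulty < low:
            flags[item_id] = "too easy"
        elif difficulty > high:
            flags[item_id] = "too hard"
        else:
            flags[item_id] = "ok"
    return flags


# Hypothetical responses from ten test takers (1 = correct, 0 = incorrect).
item_responses = {
    "Q1": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "Q2": [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],
    "Q3": [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
}

print(flag_items(item_responses))
print(rasch_probability(ability=0.0, difficulty=0.0))
```

Flagged items are candidates for human review rather than automatic removal, since a very easy or very hard question can still serve a purpose in a given test.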

The Bottom Line

ChatGPT offers immense potential for generating test item banks, saving test creators time, money, and effort. Using generative AI to develop tests in the classroom is a way to help educators create tests at many different levels without having to spend hours reworking questions.

When paired with an online testing platform like TAO, test development can become highly customized to meet the needs of individual learners while improving test integrity through test question randomization. All of this leads to a testing experience that gets at the heart of what assessment is designed to do: inform educators about where a student is, what steps to take next, and how to improve learning outcomes for students.
