Measuring student achievement in a meaningful way is a difficult undertaking, and an entire field within psychology is dedicated to this type of measurement: psychometrics. For an assessment in a specific area of study to fulfill its purpose, the score needs to reflect the knowledge or skill level of the person taking the test. Psychometrics has established several standards for educational evaluation to ensure that test takers' levels are measured objectively, and one of these standards concerns accuracy.
The term accuracy is key, as it serves as an umbrella term for two concepts that are central to assessment psychometrics: validity and reliability. Together, these two concepts serve as indicators of the quality and accuracy of data collection tools. In education, these tools are the assessments used to collect test takers' scores.
So what are validity and reliability?
Validity: In the context of educational assessments, a test is valid if it measures what it is designed to measure. Strictly speaking, validity isn't a property of the test itself; rather, it refers to the degree to which a resulting score can be used to infer the test taker's level.
Reliability: Whether a test is reliable, on the other hand, has nothing to do with its content or what it is designed to measure, but rather with whether it consistently measures whatever it measures. In other words, it refers to the degree to which scores from a particular test are consistent from one administration to the next.
Of the two, validity is generally considered the more important for the quality and accuracy of an assessment, because it relates to the actual content of the assessment.
How do you know if a test is valid?
In a nutshell, establishing whether a test is valid involves finding evidence that links the interpretation of the test scores to the concepts the test is designed to measure. This evidence comes from different sources and takes various forms, depending on the type of validity it is meant to support. There are three main types of validity that need to be considered for educational assessments, and there needs to be evidence of all three before a test can be accepted as valid.
To establish whether your test is valid, ask yourself the following questions:
- What do you want to measure and does the assessment cover this? This is known as content validity.
- How well do the scores agree with an external measure of the same knowledge or skill, such as an established test or later performance? This is known as criterion validity.
- Is it actually measuring the intended concept, or something else? This is known as construct validity.
If you can find evidence for all of these validity measures in the assessments you’re preparing for your students, you can consider them to be a valid method of testing your students’ knowledge.
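Criterion validity evidence is often summarized as a single number: a common approach is to correlate students' scores on the assessment with an external criterion, such as their scores on an established test of the same skill. The Python sketch below shows that calculation using the Pearson correlation coefficient. The function name `pearson_r` and all scores are hypothetical, and a real validity study would need far more students and a carefully chosen criterion.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equally long lists of scores (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    mx, my = mean(xs), mean(ys)
    # Sum of products of deviations from each mean (numerator of r)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations (denominator of r)
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("scores must vary to compute a correlation")
    return cov / (sx * sy)


# Hypothetical data: five students' scores on a new classroom test and on an
# established external test of the same skill (the criterion).
test_scores = [12, 15, 9, 18, 14]
criterion_scores = [55, 68, 47, 80, 63]

r = pearson_r(test_scores, criterion_scores)
print(f"criterion validity coefficient r = {r:.2f}")
```

A coefficient close to 1 means that students who score well on the classroom test also tend to score well on the criterion. On its own, it says nothing about content or construct validity.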
How do you know if a test is reliable?
Once you've established that your assessment is valid, the next question is whether it does its job consistently in different scenarios, for example with different groups or at different points in time. This is the essence of reliability.
There are three ways a test can be examined for its reliability, and these can be addressed by posing the following questions:
- Are the results of your test replicable? In other words, are similar results achieved if a group of students takes the test twice? This is known as test-retest reliability.
- Are similar results achieved if students take two equivalent versions of the assessment within a short time? This concerns the similarity of both the scores and the students' relative positions (rankings), and is known as alternate form reliability.
- Is the test internally consistent? This examines how well the items of an assessment work together to evaluate understanding of a concept, and is known as internal consistency reliability.
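The first and third questions are commonly answered with simple statistics. The minimal Python sketch below assumes hypothetical data and hypothetical function names. Test-retest reliability is estimated as the Pearson correlation between two sittings of the same test; the same calculation applies to alternate form reliability, with the two forms in place of the two sittings. Internal consistency is estimated with Cronbach's alpha, one widely used coefficient among several.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between paired scores (hypothetical helper, no input checks)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """Cronbach's alpha; item_scores[s][i] is student s's score on item i."""
    k = len(item_scores[0])  # number of items
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across students
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the students' total scores
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical test-retest data: the same five students sitting the test twice.
first_sitting = [12, 15, 9, 18, 14]
second_sitting = [13, 14, 10, 17, 15]
retest_r = pearson_r(first_sitting, second_sitting)
print(f"test-retest r = {retest_r:.2f}")

# Hypothetical item-level data: 5 students x 4 items, each item scored 0-5.
items = [
    [3, 4, 3, 2],
    [4, 4, 5, 4],
    [1, 2, 2, 1],
    [5, 5, 4, 5],
    [3, 3, 4, 3],
]
alpha = cronbach_alpha(items)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Alpha compares the variance of the individual items with the variance of the total scores: when the items rise and fall together across students, the totals vary much more than the items do separately, and alpha approaches 1.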
If you can find evidence for all of these reliability measures in the assessments you’re preparing for your students, you can consider them to be a reliable method of testing your students’ knowledge.
Together, validity and reliability make up the main considerations for judging whether an assessment provides an accurate measure of a test taker's knowledge or skills in a given subject area. A test is valid if the interpretation of a test taker's scores can be directly related to what the test is designed to measure, and it is reliable if it produces consistent scores over multiple applications, both for different test takers and for the same test taker sitting the test at different times. Both concepts are therefore essential considerations when you're preparing tests, and it's crucial to apply them as yardsticks for ensuring the quality of the assessments you use with your students.