
D-TECK Tool Assessment Chart

This tool assessment chart provides information on benchmarks and criteria that were taken into account in selecting tools used in D-TECK assessment reports.

Given the impact that an effective assessment tool can have on your organization’s continuity, this chart is intended to help you independently assess the psychometric qualities of the tests used for selection and development.

TEST RELIABILITY: TWO THINGS TO CONSIDER

Reliability: Test reliability measures the extent to which the instrument provides an accurate, consistent measurement of the construct it claims to measure. It is evaluated in part through stability over time and internal consistency.

1- Stability over time: is the score the same over time?

For an instrument to be considered reliable, scores cannot fluctuate significantly when the test is taken twice within a short period of time. For instance, if your weight on the scale fluctuates by five kilos in a given day, you may have serious reservations about the reliability of your scale. The same principle applies to scores from psychometric tools. To check whether scores remain stable over time, you can look at the test-retest index. The following table provides benchmarks generally used by test designers.

 

Test-retest index (correlation coefficient r)

  < 0.70    Low
  ≥ 0.70    Satisfactory
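As an illustration of the index described above, the test-retest coefficient is simply the correlation between the scores obtained at the two administrations. The minimal sketch below computes it in Python; the scores and variable names are invented for illustration and are not drawn from any actual instrument.

```python
# Hypothetical sketch: computing a test-retest reliability index.
# The scores below are invented examples, not data from any actual test.
import numpy as np

# Scores for the same ten candidates, tested twice a few weeks apart
time1 = np.array([32, 45, 51, 28, 39, 47, 55, 36, 42, 49])
time2 = np.array([34, 44, 50, 30, 38, 49, 54, 35, 41, 48])

# The test-retest index is the Pearson correlation between the two administrations
r = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest index: r = {r:.2f}")  # >= 0.70 is considered satisfactory
```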

 

 

2- Internal consistency: how consistent are the items in the psychometric test?

When a tool is reliable, it shows a high degree of consistency between test items; in other words, each statement measures the same thing (the same construct or sub-construct). Statements must be similar and consistent in their content, since this ensures that they assess the same thing. Just as we expect the questions doctors ask to assess your health rather than your finances, the items on a scale that assesses a personality trait such as extraversion should not be influenced by items that measure cognitive aptitude. Internal consistency is commonly measured by Cronbach’s alpha: a high value indicates that items intended to assess the same concept generate similar scores. The following table presents the criteria that indicate the internal consistency of an instrument.

 

Cronbach’s alpha (α)

  < 0.70    Low
  ≥ 0.70    Satisfactory
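For readers who want to see the calculation behind the index, here is a minimal sketch of Cronbach’s alpha, which compares the sum of the item variances with the variance of the total score. The item responses below are invented for illustration.

```python
# Hypothetical sketch: Cronbach's alpha for a short scale.
# Rows are respondents, columns are items; all values are invented.
import numpy as np

items = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [4, 4, 5, 4],
])

k = items.shape[1]                               # number of items
item_variances = items.var(axis=0, ddof=1)       # variance of each item
total_variance = items.sum(axis=1).var(ddof=1)   # variance of the total scores

# alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")  # >= 0.70 is considered satisfactory
```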


 

TEST VALIDITY: TWO THINGS TO CONSIDER

Validity: Test validity refers to the degree to which an instrument measures what it claims to measure. It also establishes the conditions under which scores can be used appropriately. For employee selection and development, two types of validity are particularly useful.

Predictive validity: does a test score predict a future event?

To determine whether a tool has strong predictive validity, designers verify the degree of association between the instrument’s results and future criteria. In other words, predictive validity tells you whether the test can predict a variable of interest, such as job performance, a reduction in the number of workplace accidents or reduced turnover. To document this, designers report the correlation coefficient between the test and the variable of interest. The closer the coefficient is to 1, the stronger the correlation. But since variables of interest are often complex phenomena influenced by other factors, it is virtually impossible to obtain a coefficient of 1. The following table helps interpret correlation coefficients.

 

Predictive validity index (correlation coefficient r)

  < 0.30               Low
  ≥ 0.30 and < 0.50    Moderate
  ≥ 0.50               High
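To make the calculation concrete, the minimal sketch below correlates invented test scores with invented performance ratings gathered later; the data and thresholds shown are purely illustrative.

```python
# Hypothetical sketch: estimating a predictive validity coefficient.
# Both columns are invented; in practice the performance ratings would be
# collected months after the test scores.
import numpy as np

test_scores = np.array([62, 75, 58, 81, 70, 66, 90, 73, 55, 68])
performance = np.array([3.1, 3.8, 2.9, 4.2, 3.5, 3.0, 4.5, 3.6, 2.7, 3.3])

r = np.corrcoef(test_scores, performance)[0, 1]
print(f"Predictive validity index: r = {r:.2f}")
# < 0.30 low, 0.30 to 0.50 moderate, >= 0.50 high
```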

 

Concurrent validity: does the test score relate to a similar measurement?

Concurrent validity compares the tool’s results with criteria that already exist on the market, in order to assess their degree of association. Unlike predictive validity, concurrent validity does not verify whether the test predicts a future phenomenon, but whether it is in line with data already available. For example, we can measure the relationship between students’ IQ scores and the grades on their last transcript; this gives us information on the concurrent validity of the IQ test. To find out whether a tool has strong concurrent validity, we again look at the correlation coefficient.

Concurrent validity can be convergent, for example when verifying the relationship between two tests that both measure extraversion. Because the two tests measure the same phenomenon, a valid test should produce similar scores for a given candidate, and a positive correlation between the tests is expected.

Concurrent validity can also be divergent, for example when verifying the relationship between two tests, one measuring extraversion and the other measuring introversion. Because these tests measure opposite phenomena, a given candidate should obtain opposite scores if the test is valid, and a negative correlation between the tests is expected. The following table provides the criteria generally used by test designers for interpreting the correlation coefficient.

 

Concurrent validity index (correlation coefficient r)

  Convergent concurrent validity    Between 0 and 1, according to the hypotheses advanced by researchers
  Divergent concurrent validity     Between 0 and -1, according to the hypotheses advanced by researchers
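The sketch below illustrates the two cases with invented scores: a convergent check between two hypothetical extraversion measures (a positive correlation is expected) and a divergent check between an extraversion measure and an introversion measure (a negative correlation is expected).

```python
# Hypothetical sketch: convergent and divergent concurrent validity.
# All scores are invented for illustration.
import numpy as np

extraversion_test_a = np.array([12, 18, 9, 15, 20, 11, 17, 14])
extraversion_test_b = np.array([13, 17, 10, 14, 19, 12, 18, 15])  # same construct
introversion_test   = np.array([16, 10, 19, 13, 8, 18, 11, 14])   # opposite construct

convergent_r = np.corrcoef(extraversion_test_a, extraversion_test_b)[0, 1]
divergent_r = np.corrcoef(extraversion_test_a, introversion_test)[0, 1]

print(f"Convergent check (two extraversion tests): r = {convergent_r:.2f}")      # expected positive
print(f"Divergent check (extraversion vs. introversion): r = {divergent_r:.2f}")  # expected negative
```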

 

A test’s validity depends on its reliability. A test may establish many significant links with variables of interest for organizations; however, if its reliability indexes do not reach the thresholds mentioned above, the test cannot be considered valid. As a result, an instrument with limited reliability will necessarily have limited validity, and you must be very careful when using it, particularly for selection.
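In classical test theory, this dependence can be expressed as a ceiling (a standard textbook result, stated here for reference rather than drawn from the chart itself): the observed validity coefficient can never exceed the square root of the product of the test’s and the criterion’s reliabilities.

```latex
% Standard classical test theory result (for reference; not taken from the chart):
% r_{xy}: observed validity coefficient between test x and criterion y
% r_{xx}: reliability of the test, r_{yy}: reliability of the criterion
\[
  r_{xy} \;\le\; \sqrt{r_{xx}\, r_{yy}} \;\le\; \sqrt{r_{xx}}
\]
```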

When looking at psychometric qualities, it is essential to consider reliability indexes before validity indexes. Suppose, for example, that a test establishes a connection between height and job performance, meaning that the taller the person, the better the performance. At first glance, it might seem like a good idea to include this assessment in your selection process. But if the instrument that measures height fluctuates by six inches for a given person on a given day, would you still want to include the test in your process?