Use & Interpretation of SFQ Data

Academic departments and the Center for Language Education regularly use SFQ results to help evaluate the teaching performance of their faculty and instructors. The Committee of Teaching and Learning Quality (CTLQ) has passed a paper entitled "Guidelines for the use of the Summative Student Evaluations of Teaching", which outlines five principles on the use of student feedback for teaching evaluation at HKUST. The five principles are quoted below.

1. Student feedback and other forms of feedback
Students have a unique role in the evaluation of teaching performance: only students can report on the effectiveness of their own teachers in enabling them to achieve the required learning outcomes of the course. Evidence from students must always be a core feature of any evaluation of teaching.

However, there are key aspects of teaching performance that cannot be evaluated only by students, in particular: the quality of course content and the relation of course content to program requirements; the academic standard of the material and related assessment; contributions to the collective effort to improve program quality; and professional activities related to teaching and learning.

Student evaluation of teaching should always be complemented by other sources of evidence of teaching performance.

2. Questionnaire-based scores in evaluation of performance
Well-designed and well-implemented questionnaires grounded in research about the characteristics of good teaching have been shown to provide reliable and valid evidence about the effectiveness of teaching from the perspective of students.

However, research has also shown that student evaluation through questionnaires is liable to bias and to manipulation by teachers gearing their teaching to gaining high scores on evaluations.

Evidence from student evaluation of teaching can be used with confidence to identify very good teaching and poor teaching performance from the perspective of students. Fine judgments based on small variances in scores cannot be made with confidence.

3. Limits of quantitative feedback
Student evaluation questionnaires provide a simple, comprehensive form of feedback on teaching, allowing for comparisons across teachers. But students' experience of learning is individual, and the range of approaches to teaching is wide. Simple instruments based on standardized, quantifiable criteria cannot be effective in all circumstances.

Student evaluation of teaching through questionnaires should be complemented by other sources of feedback from students that take into account the range of teaching and learning environments and that provide qualitative feedback with richness and depth.

4. Summative and formative feedback
End-of-session evaluations reported after grades are the accepted model for "summative" evaluation by students. However, this model does not allow for mid-session formative feedback.

Student evaluation of teaching through end-of-session questionnaires should be complemented by mid-session formative feedback on teaching performance.

5. Evaluation of courses
Students' evaluation of teaching performance and students' response to their experience of the course or their program cannot be easily disentangled. Where a course or program requires review or decisions are to be made to adjust the design of a course or program, other feedback tools are called for.

Student feedback to improve courses and programs should be undertaken through a process that is fit-for-purpose.

In addition to the above principles, some suggestions on the presentation and interpretation of SFQ results are outlined below. These suggestions are based on findings from research studies on the use of student ratings (Abrami, 2001; Franklin, 2001; Neumann, 2000).

A. Spread of scores within a section

Besides the mean score for each question in the SFQ report, attention should also be paid to the standard deviation (SD) and the distribution of students' responses. The higher the SD for a particular question, the more diverse students' views about that question are, and the less the mean represents the view of a typical student in that section. According to Theall and Franklin (1991), an SD of 1.2 is high for a 5-point scale (see Note below). This can easily be checked against the distribution chart of responses for that question, which is included in the SFQ instructor report. Such disparate views about the course may be related to the diversity of students' backgrounds.
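To see how the SD and the distribution work together, the Python sketch below summarizes one question from a list of individual scores and flags an SD of 24 or more as high, following the Note. The response values, the function name `summarize_question` and the use of the sample SD are assumptions for illustration; the SFQ report may compute its SD differently.

```python
from collections import Counter
from statistics import mean, stdev

# Threshold on the 0-100 SFQ scale, from the Note: an SD of 1.2 on a
# 5-point scale is treated as equivalent to an SD of 24 out of 100.
HIGH_SD_THRESHOLD = 24.0


def summarize_question(scores: list[float]) -> dict:
    """Summarize one SFQ question from individual scores on the 0-100 scale.

    Uses the sample SD (n - 1 denominator); this is an assumption, and the
    official SFQ report may use a different formula.
    """
    if len(scores) < 2:
        raise ValueError("need at least two responses to compute an SD")
    sd = stdev(scores)
    return {
        "mean": mean(scores),
        "sd": sd,
        # Number of students giving each score, in ascending score order.
        "distribution": dict(sorted(Counter(scores).items())),
        "high_sd": sd >= HIGH_SD_THRESHOLD,
    }


# Hypothetical responses to the same question in two sections.
polarized = [20, 20, 20, 100, 100, 100, 100, 20]
consensus = [80, 80, 60, 80, 100, 80, 80, 80]

print(summarize_question(polarized))  # high_sd is True
print(summarize_question(consensus))  # high_sd is False
```

In the polarized section the mean of 60 sits between two clusters at 20 and 100, and no student actually gave a score near 60; both the SD flag and the distribution expose this, whereas the mean alone would not.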

B. Accuracy in SFQ results

The mean scores in SFQ reports are not error-free. As with all kinds of educational and psychological measurement, they carry a margin of error, and SFQ is no exception. Hence, a small difference between scores in an SFQ survey (say, less than 5 out of 100) is often not statistically significant and is probably the result of random error in the measurement process. The size of this error can be estimated if the standard deviation of the score distribution and the number of students evaluating the section are known.
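One common way to estimate this error, assuming students' responses behave like a simple random sample, is the standard error of the mean, SD / sqrt(n), together with a normal-approximation z-test for the difference between two section means. The Python sketch below is illustrative only: the function names are hypothetical, it ignores any finite-population correction for small classes, and with very few respondents a t-distribution would be more appropriate than z = 1.96.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of a section mean: SD / sqrt(n)."""
    if n < 2:
        raise ValueError("need at least two respondents")
    return sd / math.sqrt(n)


def margin_of_error(sd: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error under a normal approximation."""
    return z * standard_error(sd, n)


def difference_is_significant(mean_a: float, sd_a: float, n_a: int,
                              mean_b: float, sd_b: float, n_b: int,
                              z: float = 1.96) -> bool:
    """Rough two-sample z-test for the means of two independent sections."""
    # SE of the difference = sqrt(SE_a^2 + SE_b^2).
    se_diff = math.hypot(standard_error(sd_a, n_a), standard_error(sd_b, n_b))
    return abs(mean_a - mean_b) > z * se_diff


# Hypothetical sections: SD of 20 and 25 respondents each (standard error = 4).
print(margin_of_error(20, 25))                            # about 7.84
print(difference_is_significant(72, 20, 25, 68, 20, 25))  # False: 4-point gap
print(difference_is_significant(80, 20, 25, 65, 20, 25))  # True: 15-point gap
```

With these assumed figures, a gap between two sections must exceed roughly 11 points before it can be distinguished from random error, which is consistent with the caution above about differences of less than 5 out of 100.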

C. Combining SFQ results from multiple sections

SFQ results for a single section can sometimes be influenced by factors beyond the instructor's control. For a more accurate assessment of an instructor's teaching performance, it is suggested that the average of the SFQ scores from multiple sections or terms taught by the same instructor be used instead. Research studies (Gillmore, Kane, & Naccarato, 1978; Smith, 1979a; Smith, 1979b) support this practice. Averaging is especially important when the number of students providing feedback for a section is small, say 10 or fewer.
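The guideline does not specify how the average across sections should be formed. The Python sketch below contrasts two plausible options, a simple mean of section means and a mean weighted by the number of respondents; the `SectionResult` type, its field names and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SectionResult:
    label: str        # hypothetical identifier, e.g. a term and section code
    mean: float       # section mean on the 0-100 SFQ scale
    respondents: int  # number of students who submitted the SFQ


def simple_average(sections: list[SectionResult]) -> float:
    """Each section counts equally, regardless of its size."""
    return sum(s.mean for s in sections) / len(sections)


def respondent_weighted_average(sections: list[SectionResult]) -> float:
    """Each section counts in proportion to its number of respondents."""
    total = sum(s.respondents for s in sections)
    return sum(s.mean * s.respondents for s in sections) / total


history = [
    SectionResult("term 1", 70.0, 8),   # small section: 10 or fewer respondents
    SectionResult("term 2", 80.0, 40),
    SectionResult("term 3", 78.0, 32),
]
print(simple_average(history))               # 76.0
print(respondent_weighted_average(history))  # 78.2
```

Weighting by respondents limits the influence of a small section, whose mean carries a larger margin of error; a simple mean treats every section as equally informative. Which option is more appropriate remains a judgment for the department.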

D. Comparison of SFQ scores

Decades of research on student ratings have repeatedly shown that ratings are affected by (i) discipline, (ii) class size, and (iii) course level (Neumann, 2000). Hence, comparing ratings from vastly different disciplines can be problematic. Research also indicates that the relationship between class size and ratings is not linear (Lewis, 1991; Theall & Franklin, 1991): sections with enrolments between 35 and 100 on average receive lower ratings than smaller or larger sections. As for course level, postgraduate (PG) courses are generally rated higher than undergraduate (UG) courses. Hence, when comparing the performance of faculty and instructors, all three of these course characteristics should be taken into account. AQA has prepared an "SFQ University Summary Report - Breakdown by Level, Department and Class size" on SFQ survey results, which should provide a more meaningful basis for comparing SFQ scores.
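To make such comparisons concrete, the Python sketch below places a section in a comparison group defined by department, course level and class-size band, then reports its score relative to that group's benchmark. The band boundaries follow the 35 to 100 range mentioned above (treating both ends as inclusive, which is an assumption), and the `benchmarks` table, the department code and all figures are hypothetical stand-ins for values read from the AQA summary report.

```python
def class_size_band(enrolment: int) -> str:
    """Class-size band; the 35-100 range is from the text, ends assumed inclusive."""
    if enrolment < 35:
        return "under 35"
    if enrolment <= 100:
        return "35-100"
    return "over 100"


def relative_to_peers(score: float, department: str, level: str,
                      enrolment: int,
                      benchmarks: dict[tuple[str, str, str], float]) -> float:
    """Section score minus the mean of its department/level/class-size group."""
    key = (department, level, class_size_band(enrolment))
    return score - benchmarks[key]


# Hypothetical peer-group means (department code "ABC" is made up).
benchmarks = {
    ("ABC", "UG", "35-100"): 70.0,
    ("ABC", "UG", "over 100"): 74.0,
}

# The same raw score of 72 reads differently depending on class size.
print(relative_to_peers(72.0, "ABC", "UG", 60, benchmarks))   # 2.0
print(relative_to_peers(72.0, "ABC", "UG", 150, benchmarks))  # -2.0
```

Comparing a score with the mean of its own group, rather than with a university-wide figure, is one way to take discipline, class size and course level into account at the same time.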

E. Trends and clusters

It can be insightful to compare the student ratings of the same course taught in different years. Examining which questions receive higher (or lower) ratings may also reveal a pattern that gives insight into the strengths and weaknesses of one's teaching.

F. Interpreting students' comments

In an average situation (i.e., neither excellent nor poor teaching), students who are either very positive or very negative about the course are more likely to respond. Their views should therefore not be taken as representative of the class, but nor should they be ignored. Comments that are backed up by examples or that contain details about the relevant learning experience are more useful.

G. Response rate

Response rate is also a factor to consider when interpreting SFQ results. Smaller classes need a higher response rate for the results to be considered reliable. At HKUST, sections with an enrolment of fewer than 10 require a 100% response rate, whereas for sections with an enrolment above 100, a 30% response rate is acceptable.
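The two thresholds stated above can be expressed as a simple check. In the Python sketch below, the function names are hypothetical, and the check returns `None` for enrolments of 10 to 100 because this guide gives no figure for that range.

```python
from typing import Optional


def response_rate(respondents: int, enrolment: int) -> float:
    """Fraction of enrolled students who submitted the SFQ."""
    if enrolment <= 0:
        raise ValueError("enrolment must be positive")
    return respondents / enrolment


def meets_stated_threshold(respondents: int, enrolment: int) -> Optional[bool]:
    """Check a section against the thresholds stated in this guide.

    Returns None when the guide states no threshold for the enrolment.
    """
    rate = response_rate(respondents, enrolment)
    if enrolment < 10:
        return rate >= 1.0   # fewer than 10 enrolled: every student must respond
    if enrolment > 100:
        return rate >= 0.30  # more than 100 enrolled: 30% is acceptable
    return None              # 10 to 100 enrolled: no figure given in this guide


print(meets_stated_threshold(9, 9))     # True
print(meets_stated_threshold(30, 120))  # False
print(meets_stated_threshold(20, 50))   # None
```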

Note: An SD of 1.2 on a 5-point scale is equivalent to an SD of 24 on the SFQ scale, which has a full score of 100.

References:

Abrami, P. C. (2001). Improving judgments about teaching effectiveness using teacher rating forms. New Directions for Institutional Research, 59–87.

Franklin, J. (2001). Interpreting the numbers: Using a narrative to help others read student evaluations of your teaching accurately. New Directions for Teaching and Learning, 87, 85–100.

Gillmore, G. M., Kane, M. T., & Naccarato, R. W. (1978). The generalizability of student ratings of instruction: Estimation of the teacher and course components. Journal of Educational Measurement, 15(1), 1–13.

Lewis, K. G. (1991). Gathering data for the improvement of teaching: What do I need and how do I get it? New Directions for Teaching and Learning, 65–82.

Neumann, R. (2000). Communicating student evaluation of teaching results: Rating interpretation guides (RIGs). Assessment & Evaluation in Higher Education, 25(2), 121–134.

Smith, P. L. (1979a). The generalizability of student ratings of courses: Asking the right questions. Journal of Educational Measurement, 16(2), 77–87.

Smith, P. L. (1979b). The stability of teacher performance in the same course over time. Research in Higher Education, 11, 153–165.

Theall, M., & Franklin, J. (1991). Using student ratings for teaching improvement. New Directions for Teaching and Learning, 48, 83–96.