Towards Hybrid Language Assessment: Comparing The Role of AI with Teacher-Based Evaluation
Keywords:
AI-driven assessment, EFL assessment, Language proficiency testing, Perceptions of language assessment, Teacher-based evaluation

Abstract
Assessment plays a central role in language education, as it provides a systematic basis for measuring learners’ proficiency, informing instructional decisions, and delivering constructive feedback. For decades, classroom assessment has been predominantly teacher-based, relying on educators’ professional expertise, contextual awareness, and interpretive judgment. Teachers evaluate not only linguistic accuracy but also coherence, creativity, and communicative effectiveness, often considering individual learner backgrounds. However, recent technological advancements—particularly in Artificial Intelligence (AI)—have introduced new possibilities for automating and enhancing evaluation processes, especially in productive skills such as writing and speaking. AI-driven assessment tools, supported by natural language processing (NLP) and machine learning algorithms, promise increased efficiency, speed, and scoring consistency. These systems can analyze large volumes of text within seconds, identify patterns in grammar and vocabulary use, and generate immediate feedback. Such capabilities make them particularly attractive in large-scale proficiency testing contexts. Nevertheless, ongoing debates question the reliability, validity, and fairness of AI-based assessment compared to traditional teacher-led evaluation. Concerns include the potential for algorithmic bias, limited sensitivity to cultural nuance, and difficulties in recognizing creativity or rhetorical sophistication. The present study investigated the role of AI-driven assessment in English language proficiency testing and compared its effectiveness with teacher-based evaluation. A comparative research design was employed, involving 41 university-level EFL students whose written tasks were evaluated by both AI tools and experienced teachers using an identical analytic rubric. 
To complement the quantitative findings, questionnaires were administered to students and semi-structured interviews were conducted with teachers to explore their perceptions of both approaches. The findings revealed that teacher-based evaluations generally produced slightly higher scores, particularly in grammatical accuracy and stylistic appropriateness, while AI-driven assessments showed strong alignment with teachers in rating content relevance, coherence, and vocabulary range. Students appreciated the immediacy and consistency of AI feedback but continued to value the personalized explanations and encouragement provided by teachers. Similarly, instructors acknowledged AI's efficiency and standardization benefits while emphasizing that human judgment remains essential for capturing subtle language nuances, contextual appropriateness, and creative expression.
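The degree of alignment between AI and teacher scores described above is typically quantified with an agreement statistic such as the Pearson correlation between the two raters' rubric scores. A minimal sketch, using entirely hypothetical score arrays (not the study's data):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two raters' scores for the same scripts."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical analytic-rubric totals (0-20) for five scripts,
# each scored once by a teacher and once by an AI tool.
teacher = [16, 14, 18, 12, 15]
ai      = [15, 14, 17, 11, 15]

print(round(pearson(teacher, ai), 3))
```

A value near 1 would indicate the close rank agreement the study reports for content relevance, coherence, and vocabulary range; in practice, studies of this kind also report absolute score differences, since two raters can correlate highly while one rates systematically higher.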