2603.01910v1 Mar 02, 2026 cs.CL

FLANS at SemEval-2026 Task 7: RAG with Open-Sourced Smaller LLMs for Everyday Knowledge Across Diverse Languages and Cultures

L. Bogdanova

Natalia Amat Lefort

Flor Miriam Plaza-del-Arco

Shiran Sun

Lifeng Han

Original Abstract

This system paper describes our participation in SemEval-2026 Task 7, "Everyday Knowledge Across Diverse Languages and Cultures". We participated in two subtasks: Track 1, Short Answer Questions (SAQ), and Track 2, Multiple-Choice Questions (MCQ). Our method is retrieval-augmented generation (RAG) with open-sourced smaller LLMs (OS-sLLMs). To adapt to this shared task, we built our own culturally aware knowledge bases (CulKBs) by extracting Wikipedia content with keyword lists we prepared. We extracted both culturally aware wiki-text and country-specific wiki-summaries. In addition to the local CulKBs, one of our systems integrates live online search output via DuckDuckGo. For better privacy and sustainability, we aimed to deploy smaller LLMs (sLLMs) that are open-sourced on the Ollama platform. We share the prompts we developed using refinement techniques and report the learning curve of these prompts. The tested languages are English, Spanish, and Chinese for both tracks. Our resources and code are shared via https://github.com/aaronlifenghan/FLANS-2026
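
As a rough illustration of the retrieve-then-prompt pattern the abstract describes, the sketch below ranks entries of a tiny in-memory knowledge base by token overlap with the question and assembles an MCQ prompt from the top passages. It is not the authors' implementation: the knowledge-base entries, the overlap scoring, the prompt wording, and the names `TOY_KB`, `retrieve`, and `build_mcq_prompt` are all assumptions made for this example, and the call to a local model (for instance one served through Ollama) is left out.

```python
import re
from collections import Counter

# Hypothetical toy knowledge base. The paper's CulKBs are extracted from
# Wikipedia; these three entries are invented for illustration only.
TOY_KB = [
    {"country": "Spain",
     "text": "In Spain, lunch is commonly eaten between 2 pm and 4 pm."},
    {"country": "China",
     "text": "In China, the Spring Festival marks the start of the lunar new year."},
    {"country": "United Kingdom",
     "text": "In the United Kingdom, afternoon tea is a light meal served around 4 pm."},
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def retrieve(question, kb, top_k=2):
    """Return up to top_k entries ranked by token overlap with the question.

    Token overlap is a stand-in scoring function (an assumption); entries
    with no shared tokens are dropped.
    """
    q_counts = Counter(tokenize(question))
    scored = []
    for entry in kb:
        d_counts = Counter(tokenize(entry["text"]))
        overlap = sum(min(q_counts[t], d_counts[t]) for t in q_counts)
        scored.append((overlap, entry))
    # sort() is stable, so ties keep their knowledge-base order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for score, entry in scored[:top_k] if score > 0]


def build_mcq_prompt(question, options, passages):
    """Assemble a prompt with numbered context passages and lettered options."""
    context = "\n".join(
        f"[{i}] {p['text']}" for i, p in enumerate(passages, start=1)
    )
    choices = "\n".join(
        f"{chr(ord('A') + i)}. {opt}" for i, opt in enumerate(options)
    )
    return (
        "Use the context to answer the multiple-choice question.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n{choices}\n"
        "Answer with a single letter."
    )


question = "When do people in Spain usually eat lunch?"
options = ["Around 11 am", "Between 2 pm and 4 pm", "After 6 pm", "At midnight"]
passages = retrieve(question, TOY_KB, top_k=1)
prompt = build_mcq_prompt(question, options, passages)
print(prompt)
```

The resulting `prompt` string is what would be passed to the language model. Swapping `retrieve` for a search over a larger knowledge base or a live web search would leave the prompt-building step unchanged.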
