auto_create_questionset

Posted on 2024-12-28 · Updated on 2024-11-14 · Category: Model Evaluation

Automatically generating questions

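If you would rather not write your own question set to evaluate the performance of your current parameters, you can use eval.auto_create_questionset to generate one automatically, complete with reference answers. You can then call eval.auto_evaluation to obtain evaluation metrics: Bert_score, Rouge, and LLM_score for essay (question-answer) question sets, and accuracy for single-choice question sets. These scores range from 0 to 1, and a higher value means the generated response is closer to the reference answer.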

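For example, the code below creates a question set text file named 'mic_essay.txt' containing ten questions with reference answers. Each question is generated from a randomly selected content passage of the documents in the 'doc/mic/' directory. You can then use this question set file to evaluate the performance of the parameters you want to test.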

import akasha.eval as eval

# Configure the evaluator (essay-style questions, "merge" search).
eva = eval.Model_Eval(question_style="essay", search_type="merge",
                      model="openai:gpt-3.5-turbo", embeddings="openai:text-embedding-ada-002",
                      record_exp="exp_mic_auto_questionset")

# Generate 10 questions with reference answers from the documents in doc/mic/.
eva.auto_create_questionset(doc_path="doc/mic/", question_num=10,
                            output_file_path="questionset/mic_essay.txt")

# Evaluate the parameters under test against the generated question set.
bert_score, rouge, llm_score, tol_tokens = eva.auto_evaluation(
    questionset_file="questionset/mic_essay.txt", doc_path="doc/mic/",
    question_style="essay", record_exp="exp_mic_auto_evaluation", topK=3, search_type="svm")
print("bert_score: ", bert_score, "\nrouge: ", rouge, "\nllm_score: ", llm_score)

bert_score: 0.782
rouge: 0.81
llm_score: 0.393
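
To give a feel for why overlap-style scores fall between 0 and 1, here is a minimal sketch of a unigram-overlap F1 (in the spirit of ROUGE-1) and a single-choice accuracy. This is not akasha's implementation; the function names `unigram_f1` and `single_choice_accuracy` are hypothetical, and the whitespace tokenization is an assumption that would not suit Chinese text without a proper tokenizer.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Toy ROUGE-1-style F1 over whitespace tokens (illustrative only)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def single_choice_accuracy(predicted: list, expected: list) -> float:
    """Fraction of single-choice answers that match the reference answers."""
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Identical texts score 1.0, texts with no shared tokens score 0.0,
# and partial overlap lands somewhere in between.
print(unigram_f1("the cat sat", "the cat sat"))
print(unigram_f1("the cat sat", "the cat ran fast"))
print(single_choice_accuracy(["1", "2", "3", "4"], ["1", "2", "4", "4"]))
```

Both functions are bounded by construction: precision, recall, and the fraction of correct answers can never exceed 1.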

Using question_type to test different capabilities

The question_type parameter offers four question types: fact, summary, irrelevant, and compared. The default is fact.

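fact: tests the ability to answer general factual questions
summary: tests the model's ability to summarize
irrelevant: tests whether the model can recognize questions whose answers are not in the documents
compared: tests the model's ability to compare different things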

Example

import akasha.eval as eval

# "irrelevant" questions have no answer in the documents.
eva = eval.Model_Eval(search_type="merge", question_type="irrelevant",
                      model="openai:gpt-3.5-turbo", record_exp="exp_mic_auto_questionset")

eva.auto_create_questionset(doc_path="doc/mic/", question_num=10,
                            output_file_path="questionset/mic_irre.txt")

bert_score, rouge, llm_score, tol_tokens = eva.auto_evaluation(
    questionset_file="questionset/mic_irre.txt", doc_path="doc/mic/",
    question_style="essay", record_exp="exp_mic_auto_evaluation", search_type="svm")
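
To cover all four question types in one pass, one option is to plan a separate question set file per type. The sketch below only builds the arguments and does not call akasha; the helper `plan_questionsets` and the file naming scheme are hypothetical, and the commented lines assume the same Model_Eval and auto_create_questionset calls used in the example above.

```python
from pathlib import Path

# The four values listed for question_type in this page.
QUESTION_TYPES = ("fact", "summary", "irrelevant", "compared")


def plan_questionsets(doc_path, out_dir="questionset", prefix="mic", question_num=10):
    """Build one set of auto_create_questionset arguments per question type."""
    plans = []
    for qtype in QUESTION_TYPES:
        plans.append({
            "question_type": qtype,
            "doc_path": doc_path,
            "question_num": question_num,
            # Hypothetical naming scheme: <prefix>_<question_type>.txt
            "output_file_path": (Path(out_dir) / f"{prefix}_{qtype}.txt").as_posix(),
        })
    return plans


for plan in plan_questionsets("doc/mic/"):
    print(plan["question_type"], "->", plan["output_file_path"])
    # With akasha installed, each plan could then be run as in the example above:
    # eva = eval.Model_Eval(question_type=plan["question_type"], model="openai:gpt-3.5-turbo")
    # eva.auto_create_questionset(doc_path=plan["doc_path"], question_num=plan["question_num"],
    #                             output_file_path=plan["output_file_path"])
```

Keeping one file per type makes it easier to see which capability a low score points to.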

Specifying a question set topic

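If you want to generate questions on a specific topic, use the create_topic_questionset function. It uses the given topic to find related passages in the documents and generates the question set from them.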

Example

import akasha.eval as eval

eva = eval.Model_Eval(search_type="merge", question_type="irrelevant",
                      model="openai:gpt-3.5-turbo", record_exp="exp_mic_auto_questionset")

# Generate 3 questions from passages related to the topic "工業4.0" (Industry 4.0).
eva.create_topic_questionset(doc_path="doc/mic/", topic="工業4.0", question_num=3,
                             output_file_path="questionset/mic_topic_irre.txt")

bert_score, rouge, llm_score, tol_tokens = eva.auto_evaluation(
    questionset_file="questionset/mic_topic_irre.txt", doc_path="doc/mic/",
    question_style="essay", record_exp="exp_mic_auto_evaluation", search_type="svm")
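
The idea of selecting passages by topic before generating questions can be illustrated with a toy stand-in. The sketch below ranks passages by how often the topic string appears in them; this plain substring counting is an assumption made for illustration and is not how akasha retrieves passages (its retrieval is governed by search_type and is not shown here). The function `rank_passages_by_topic` and the sample passages are hypothetical.

```python
def rank_passages_by_topic(passages, topic, top_n=3):
    """Return up to top_n passages that mention the topic, most mentions first."""
    scored = [(passage.count(topic), index, passage)
              for index, passage in enumerate(passages)]
    # Keep only passages that mention the topic at least once.
    hits = [item for item in scored if item[0] > 0]
    # Sort by mention count (descending), then by original order for ties.
    hits.sort(key=lambda item: (-item[0], item[1]))
    return [passage for _, _, passage in hits[:top_n]]


sample_passages = [
    "Industry 4.0 focuses on smart manufacturing. Industry 4.0 also relies on IoT.",
    "The weather was unusually hot this year.",
    "Sensors are a building block of Industry 4.0.",
]

for passage in rank_passages_by_topic(sample_passages, "Industry 4.0"):
    print(passage)
```

Passages that never mention the topic are dropped, so questions would only be generated from relevant text.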

For details on self.db, see Vector Database.

For details on self.model_obj, see Language Models.
