summary

File Summary

To create a summary of a text file (.pdf, .txt, .docx), you can use the summarize_file function of the Summary class. In the example below, the map_reduce summarization method is used and the language model is instructed to generate a summary of about 500 words. There are two summary types, map_reduce and refine: map_reduce summarizes each text chunk separately and then produces the final summary from all of the chunk summaries; refine summarizes the text chunks one at a time, using the previous summary as part of the prompt for the next chunk, which yields a more consistent final summary.

Example

import akasha

sum = akasha.Summary(chunk_size=1000, chunk_overlap=100)
sum.summarize_file(
    file_path="doc/mic/5軸工具機因應市場訴求改變的發展態勢.pdf",
    summary_type="map_reduce",
    summary_len=500,
    chunk_overlap=40,
)
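
To use the refine summary type instead, only the summary_type argument changes. A minimal sketch, assuming summarize_file returns the generated summary text and using a placeholder file path:

```python
import akasha

# refine: summarize chunks one at a time, feeding the previous summary into
# the prompt for the next chunk to keep the final summary consistent.
sum = akasha.Summary(chunk_size=1000, chunk_overlap=100)

summary_text = sum.summarize_file(
    file_path="doc/mic/example.pdf",   # placeholder path, replace with your own document
    summary_type="refine",
    summary_len=500,
)
print(summary_text)  # assumption: the generated summary is returned as a string
```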

### Arguments of Summary class ###
Args:
**chunk_size (int, optional)**: chunk size of texts from documents. Defaults to 1000.
**chunk_overlap (int, optional)**: chunk overlap of texts from documents. Defaults to 40.
**model (str, optional)**: llm model to use. Defaults to "gpt-3.5-turbo".
**verbose (bool, optional)**: show log texts or not. Defaults to False.
**language (str, optional)**: the language of the documents and prompt, used to make sure the docs won't exceed the max token size of the llm input.
**record_exp (str, optional)**: if record_exp is not empty, use aiido to save running params and metrics to the remote mlflow server, with record_exp as the experiment name. Defaults to "".
**system_prompt (str, optional)**: the system prompt that gives special instructions to the llm model; it is not used when searching relevant documents. Defaults to "".
**max_input_tokens (int, optional)**: max token length of llm input. Defaults to 3000.
**temperature (float, optional)**: temperature of the llm model, from 0.0 to 1.0. Defaults to 0.0.
**auto_translate (bool, optional)**: whether to translate the summary into the target language (see the language argument).
**max_output_tokens (int, optional)**: max output tokens of llm model. Defaults to 1024.
**env_file (str, optional)**: the path of the env file. Defaults to "".
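
As an illustration of these arguments, a minimal sketch that constructs a Summary with several of the options above; the chosen values (verbose flag, system prompt, language code) are illustrative assumptions, not required settings:

```python
import akasha

# Every keyword below is listed in the argument list above; the values are
# only examples and most of them simply restate the documented defaults.
sum = akasha.Summary(
    chunk_size=1000,
    chunk_overlap=40,
    model="gpt-3.5-turbo",        # documented default model
    verbose=True,                 # print log texts while summarizing
    language="ch",                # assumption: language code for Chinese documents
    system_prompt="Summarize in bullet points.",  # extra instruction for the llm
    temperature=0.0,
    max_input_tokens=3000,
    max_output_tokens=1024,
)
```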