summary

File Summary

To create a summary of a text file (.pdf, .txt, .docx), you can use the summarize_file function of the Summary class. In the example below, the map_reduce summarization method is used and the language model is instructed to generate a summary of about 500 words. There are two summary types, map_reduce and refine: map_reduce summarizes each text chunk separately and then produces the final summary from all of the chunk summaries; refine summarizes the text chunks one at a time, using the previous summary as part of the prompt for the next chunk, which yields a more consistent final summary.

Example

import akasha

sum = akasha.Summary(chunk_size=1000, chunk_overlap=100)
sum.summarize_file(
    file_path="doc/mic/5軸工具機因應市場訴求改變的發展態勢.pdf",
    summary_type="map_reduce",
    summary_len=500,
    chunk_overlap=40,
)
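
To use the refine summary type instead, only the summary_type argument changes. A minimal sketch, assuming summarize_file returns the generated summary text and using a placeholder file path:

```python
import akasha

# refine: summarize chunks one at a time, feeding the previous summary into
# the prompt for the next chunk to keep the final summary consistent.
sum = akasha.Summary(chunk_size=1000, chunk_overlap=100)

summary_text = sum.summarize_file(
    file_path="doc/mic/example.pdf",   # placeholder path, replace with your own document
    summary_type="refine",
    summary_len=500,
)
print(summary_text)  # assumption: the generated summary is returned as a string
```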

### Arguments of Summary class ###
Args:
**chunk_size (int, optional)**: chunk size of texts from documents. Defaults to 1000.
**chunk_overlap (int, optional)**: chunk overlap of texts from documents. Defaults to 40.
**model (str, optional)**: llm model to use. Defaults to "gpt-3.5-turbo".
**verbose (bool, optional)**: show log texts or not. Defaults to False.
**language (str, optional)**: the language of the documents and prompt, used to make sure the docs won't exceed the max token size of the llm input.
**record_exp (str, optional)**: if record_exp is not empty, use aiido to save running params and metrics to the remote mlflow server, with record_exp as the experiment name. Defaults to "".
**system_prompt (str, optional)**: the system prompt that gives special instructions to the llm model; it is not used when searching relevant documents. Defaults to "".
**max_input_tokens (int, optional)**: max token length of llm input. Defaults to 3000.
**temperature (float, optional)**: temperature of the llm model, from 0.0 to 1.0. Defaults to 0.0.
**auto_translate (bool, optional)**: whether to translate the summary into the target language (see the language argument).
**max_output_tokens (int, optional)**: max output tokens of llm model. Defaults to 1024.
**env_file (str, optional)**: the path of the env file. Defaults to "".
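
As an illustration of these arguments, a minimal sketch that constructs a Summary with several of the options above; the chosen values (verbose flag, system prompt, language code) are illustrative assumptions, not required settings:

```python
import akasha

# Every keyword below is listed in the argument list above; the values are
# only examples and most of them simply restate the documented defaults.
sum = akasha.Summary(
    chunk_size=1000,
    chunk_overlap=40,
    model="gpt-3.5-turbo",        # documented default model
    verbose=True,                 # print log texts while summarizing
    language="ch",                # assumption: language code for Chinese documents
    system_prompt="Summarize in bullet points.",  # extra instruction for the llm
    temperature=0.0,
    max_input_tokens=3000,
    max_output_tokens=1024,
)
```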