Language Models

Published 2024-12-26, updated 2025-01-09, category: Advanced

Choosing a Language Model

Use the model parameter to choose a language model. The default is openai:gpt-3.5-turbo.
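Every model string in this document has the shape prefix:name, for example openai:gpt-3.5-turbo. The sketch below only illustrates that shape. The helper name split_model_spec is hypothetical and is not part of akasha, whose internal parsing is not shown here.

```python
def split_model_spec(spec: str) -> tuple[str, str]:
    """Split "prefix:name" at the first colon (hypothetical helper, not an akasha API)."""
    prefix, sep, name = spec.partition(":")
    if not sep or not prefix or not name:
        raise ValueError(f"expected 'prefix:name', got {spec!r}")
    return prefix, name


print(split_model_spec("openai:gpt-3.5-turbo"))  # ('openai', 'gpt-3.5-turbo')
print(split_model_spec("hf:meta-llama/Llama-2-13b-chat-hf"))
```

Splitting at the first colon only means that names containing further colons, such as a remote URL, stay intact in the second element.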

Examples

1. openai

import akasha

ak = akasha.Doc_QA()
ak.get_response(dir_path, 
                prompt, 
                embeddings="openai:text-embedding-ada-002",
                model="openai:gpt-3.5-turbo")

2. huggingface

import akasha

ak = akasha.Doc_QA()
ak.get_response(dir_path, 
                prompt, 
                embeddings="huggingface:all-MiniLM-L6-v2",
                model="hf:meta-llama/Llama-2-13b-chat-hf")

3. llama-cpp

Install llama-cpp-python to run .gguf models on the CPU, or select the llama-cpp extra when installing akasha.

pip install llama-cpp-python
# pip install akasha-terminal[llama-cpp]

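llama-cpp can run quantized models on the CPU. You can download .gguf llama-cpp models from Hugging Face. As in the example, if your model was downloaded to the "model/" directory, you can load it as follows: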

import akasha

ak = akasha.Doc_QA()
ak.get_response(dir_path, 
                prompt, 
                embeddings="huggingface:all-MiniLM-L6-v2",
                model="llama-cpu:model/llama-2-13b-chat.Q5_K_S.gguf")

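llama-cpp can also run models on the GPU, but the package must then be built with CMake at install time. Make sure g++, gcc, and the NVIDIA driver and toolkit are installed. See llama-cpp-python for details.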

CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 python -m pip install --upgrade --force-reinstall "llama-cpp-python>=0.3.1" --no-cache-dir
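The CPU and GPU variants differ only in the prefix of the model string: llama-cpu: in the example above, llama-gpu: in the model list later in this document. As a minimal sketch, a hypothetical helper (llama_cpp_spec is not an akasha function) can build the string from a .gguf path and catch a wrong file extension early. It does not check that the file exists.

```python
from pathlib import Path


def llama_cpp_spec(gguf_path: str, use_gpu: bool = False) -> str:
    """Build a "llama-cpu:" or "llama-gpu:" model string (hypothetical helper)."""
    path = Path(gguf_path)
    if path.suffix != ".gguf":
        raise ValueError(f"expected a .gguf file, got {gguf_path!r}")
    prefix = "llama-gpu" if use_gpu else "llama-cpu"
    # as_posix() keeps forward slashes, matching the paths used in this document
    return f"{prefix}:{path.as_posix()}"


print(llama_cpp_spec("model/llama-2-13b-chat.Q5_K_S.gguf"))
print(llama_cpp_spec("model/llama-2-13b-chat.Q5_K_S.gguf", use_gpu=True))
```

The returned string would then be passed as the model argument, in the same way as the literal strings in the examples.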

4. remote API

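If you use someone else's API, or serve your own model through a framework that supports the OpenAI API format (for example vllm, TGI, litellm…), you can load the model with remote:{your LLM api url}. To specify a model name, use remote:{your LLM api url}@{your model name}.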

If the remote API requires an API key, first set the environment variable REMOTE_API_KEY. See Setting the API Key.

import akasha

ak = akasha.Doc_QA()
ak.get_response(dir_path, 
                prompt, 
                model="remote:http://140.92.60.189:8081@llama-3.2-11B")
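The two remote forms, with and without a model name, can be composed programmatically. The following is a minimal sketch. remote_spec is a hypothetical helper, not part of akasha, and the check on REMOTE_API_KEY only reports whether the variable is set, since whether a key is required depends on the server.

```python
import os


def remote_spec(url: str, model_name: str = "") -> str:
    """Compose "remote:{url}" or "remote:{url}@{model name}" (hypothetical helper)."""
    spec = f"remote:{url}"
    return f"{spec}@{model_name}" if model_name else spec


# Only needed when the remote server requires a key.
if not os.environ.get("REMOTE_API_KEY"):
    print("REMOTE_API_KEY is not set")

print(remote_spec("http://140.92.60.189:8081", "llama-3.2-11B"))
print(remote_spec("http://140.92.60.189:8081"))
```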

5. gemini

(Complete Setting the API Key first)

import akasha

ak = akasha.Doc_QA()
ak.get_response(dir_path, 
                prompt, 
                embeddings="gemini:models/text-embedding-004",
                model="gemini:gemini-1.5-flash")

6. anthropic

(Complete Setting the API Key first)

import akasha

ak = akasha.Doc_QA()
ak.get_response(dir_path, 
                prompt, 
                model="anthropic:claude-3-5-sonnet-20241022")

Available Models

openai_model = "openai:gpt-3.5-turbo"  # need environment variable "OPENAI_API_KEY"
gemini_model="gemini:gemini-1.5-flash" # need environment variable "GEMINI_API_KEY"
anthropic_model = "anthropic:claude-3-5-sonnet-20241022" # need environment variable "ANTHROPIC_API_KEY"
huggingface_model = "hf:meta-llama/Llama-2-7b-chat-hf" #need environment variable "HUGGINGFACEHUB_API_TOKEN" to download meta-llama model
quantized_ch_llama_model = "hf:FlagAlpha/Llama2-Chinese-13b-Chat-4bit"
taiwan_llama_gptq = "hf:weiren119/Taiwan-LLaMa-v1.0-4bits-GPTQ"
mistral = "hf:Mistral-7B-Instruct-v0.2" 
mediatek_Breeze = "hf:MediaTek-Research/Breeze-7B-Instruct-64k-v0.1"
### If you want to use llama-cpp to run a model on cpu, you can download the gguf version of the model, for example
### from https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF
### or from https://huggingface.co/TheBloke/CodeUp-Llama-2-13B-Chat-HF-GGUF
### The name after "llama-gpu:" or "llama-cpu:" is the path of the downloaded .gguf file
llama_cpp_model = "llama-gpu:model/llama-2-13b-chat-hf.Q5_K_S.gguf"  
llama_cpp_model = "llama-cpu:model/llama-2-7b-chat.Q5_K_S.gguf"
llama_cpp_chinese_alpaca = "llama-gpu:model/chinese-alpaca-2-7b.Q5_K_S.gguf"
llama_cpp_chinese_alpaca = "llama-cpu:model/chinese-alpaca-2-13b.Q5_K_M.gguf"
chatglm_model = "chatglm:THUDM/chatglm2-6b"
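The comments in the list above name the environment variable each hosted provider needs. As a minimal sketch, a hypothetical pre-flight check (missing_env_var is not an akasha function) can report which variable is missing before a model is created. HUGGINGFACEHUB_API_TOKEN is left out of the mapping because the list only requires it for downloading meta-llama models.

```python
import os
from typing import Mapping, Optional

# Prefix -> environment variable, taken from the comments in the model list.
REQUIRED_ENV = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def missing_env_var(model_spec: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the unset variable this model needs, or None (hypothetical helper)."""
    prefix = model_spec.partition(":")[0]
    name = REQUIRED_ENV.get(prefix)
    if name is not None and not env.get(name):
        return name
    return None


# An empty mapping stands in for an environment with no keys set.
print(missing_env_var("gemini:gemini-1.5-flash", env={}))
print(missing_env_var("llama-cpu:model/llama-2-7b-chat.Q5_K_S.gguf", env={}))
```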

Custom Language Model

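To use a different model, write a function that takes the prompt as input and returns the language model's answer, then pass that function as the model parameter.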

Example

We define a test_model function and pass it as the model parameter, so that get_response uses it to answer the question.

import akasha

def test_model(prompt:str):
    
    import openai
    from langchain.chat_models import ChatOpenAI
    openai.api_type = "open_ai"
    model = ChatOpenAI(model="gpt-3.5-turbo", temperature = 0)
    ret = model.predict(prompt)
    
    return ret

doc_path = "./mic/"
prompt = "五軸是什麼?"  # "What is five-axis?"

qa = akasha.Doc_QA(verbose=True, search_type = "svm", model = test_model)
qa.get_response(doc_path= doc_path, prompt = prompt)
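The contract for a custom model is small: one string in, one string out. A minimal sketch of that contract, using a stand-in function that calls no external service (echo_model is a hypothetical name, useful mainly for checking the wiring offline):

```python
def echo_model(prompt: str) -> str:
    """Stand-in "model": takes the prompt string and returns a string answer."""
    return f"[echo] {prompt.strip()}"


print(echo_model("  What is five-axis machining?  "))

# It would be passed to akasha in the same way as test_model:
# qa = akasha.Doc_QA(model=echo_model)
```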

Creating an LLM Object

Once a model is chosen with the model parameter as shown above, the Doc_QA object creates the model object model_obj (an LLM) internally.

import akasha

AK = akasha.Doc_QA(model="openai:gpt-3.5-turbo")

print(type(AK.model_obj))

You can also create the LLM object with a helper function.

import akasha

model_obj = akasha.handle_model("openai:gpt-3.5-turbo",verbose=False,temperature=0.0)

print(type(model_obj))

This LLM object can also be passed directly to Doc_QA, which avoids creating the model twice.

import akasha

model_obj = akasha.handle_model("openai:gpt-3.5-turbo",verbose=False,temperature=0.0)

AK = akasha.Doc_QA(model=model_obj)

Using the LLM Object Directly

Getting the Model Type

Call _llm_type() to get the type of the language model.

import akasha

model_obj = akasha.handle_model("gemini:gemini-1.5-flash",verbose=False,temperature=0.0)

print(model_obj._llm_type()) ## "gemini:gemini-1.5-flash"

Model Inference

To run inference with the language model, use the call_model function.

import akasha
system_prompt = "用中文回答"  # "Answer in Chinese"
prompt = "五軸是什麼?"  # "What is five-axis?"
model_obj = akasha.handle_model("openai:gpt-3.5-turbo", False, 0.0)
input_text = akasha.prompts.format_sys_prompt(system_prompt, prompt, "gpt")

response = akasha.call_model(model_obj, input_text)

Streaming Output

To receive the language model's answer as it is generated, use the call_stream_model function.

import akasha

system_prompt = "用中文回答"  # "Answer in Chinese"
prompt = "五軸是什麼?"  # "What is five-axis?"
model_obj = akasha.handle_model("openai:gpt-3.5-turbo", False, 0.0)
input_text = akasha.prompts.format_sys_prompt(system_prompt, prompt, "gpt")

streaming = akasha.call_stream_model(model_obj, input_text)

for s in streaming:
    print(s)
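Printing each chunk with a plain print puts every chunk on its own line. If you also want the complete answer afterwards, you can accumulate the chunks while printing them. The sketch below assumes the stream yields plain strings. fake_stream is a hypothetical stand-in for the iterator returned by a streaming call, so the example runs without a model.

```python
from typing import Iterable, Iterator


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for the chunks yielded by a streaming call."""
    yield from ["Five-axis ", "machining ", "uses five axes."]


def collect_stream(chunks: Iterable[str]) -> str:
    """Print each chunk as it arrives and return the full text."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)  # no newline between chunks
        parts.append(chunk)
    print()  # finish the line once the stream ends
    return "".join(parts)


full_text = collect_stream(fake_stream())
```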

Batch Inference

If you have many inference requests that are independent of one another, use akasha.helper.call_batch_model to run them as a batch and speed things up.

def call_batch_model(model: LLM, prompt: List[str], system_prompt: Union[List[str], str] = "") -> List[str]:

import akasha

model_obj = akasha.helper.handle_model("openai:gpt-3.5-turbo", False, 0.0)
# This prompt asks the LLM to respond 'yes' or 'no' depending on whether the document segment is relevant to the user question.
SYSTEM_PROMPT = akasha.prompts.default_doc_grader_prompt() 
documents = ["Doc1...", "Doc2...", "Doc3...", "Doc4..."]
question = "五軸是什麼?"  # "What is five-axis?"

prompts = ["document: " + doc +"\n\n" + "User Question: "+ question for doc in documents]

response_list = akasha.helper.call_batch_model(model_obj, prompts, SYSTEM_PROMPT)

## ["yes", "no", "yes", "yes"]
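A typical next step is to keep only the documents graded as relevant. The sketch below assumes the responses come back in the same order as the prompts, which the source does not state explicitly. keep_relevant is a hypothetical helper, and the response list is the sample output shown above rather than a real model result.

```python
from typing import List


def keep_relevant(documents: List[str], responses: List[str]) -> List[str]:
    """Keep documents whose response starts with "yes" (hypothetical helper)."""
    if len(documents) != len(responses):
        raise ValueError("documents and responses must have the same length")
    return [
        doc
        for doc, resp in zip(documents, responses)
        if resp.strip().lower().startswith("yes")
    ]


documents = ["Doc1...", "Doc2...", "Doc3...", "Doc4..."]
responses = ["yes", "no", "yes", "yes"]  # sample values, not a real model result
print(keep_relevant(documents, responses))  # ['Doc1...', 'Doc3...', 'Doc4...']
```

Normalizing with strip() and lower() tolerates answers such as "Yes" or " yes\n", which language models commonly produce.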