The Complete Guide to Implementing RAG Locally: No Cloud or Frameworks Are Required

Author(s): BeastBoyJay

Originally published on Towards AI.

Contents:

Understanding the fundamentals of RAG: the building blocks of a local RAG
Building a local RAG from scratch without any framework such as LangChain or LlamaIndex.

What is RAG?

Retrieval-Augmented Generation (RAG) is an advanced natural language processing technique that produces accurate, contextually relevant answers by combining the power of large language models (LLMs) with an external knowledge retrieval system. Unlike standard generative models, which rely only on what they learned during pre-training, RAG dynamically fetches data from a connected database or document store at inference time. This helps keep the output grounded in real-world data, current, and coherent. Because of this hybrid mechanism, RAG is especially useful for applications where accuracy and relevance are critical, such as customer service, document summarization, and question answering.

Components of RAG:

As the name suggests, Retrieval-Augmented Generation consists of three main components: retrieval, augmentation, and generation. Breaking the term down makes its structure and purpose easy to understand.

Retrieval:

In this step, when a user sends a query to the RAG pipeline, the pipeline first retrieves the relevant resources based on the user's input.

Augmentation:

Once the relevant resources have been retrieved, they are combined with the user's query using a predefined prompt template.

Generation:

After augmentation, the input is ready for the LLM. It is then passed through the LLM, which generates the desired result.
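
To make the three stages concrete before the real implementation, here is a minimal toy sketch. It is not the pipeline built in this guide: the corpus, the word-overlap scoring in `retrieve`, and the placeholder `generate` function are all assumptions chosen so the example runs with the standard library only.

```python
from collections import Counter

# Toy corpus standing in for the chunks of a real document (hypothetical data).
DOCS = [
    "Vitamin D helps the body absorb calcium.",
    "The pancreas produces insulin.",
    "Water regulates body temperature.",
]


def tokenize(text: str) -> list[str]:
    # Lowercase and strip basic punctuation from each whitespace-separated word.
    return [word.strip(".,?!").lower() for word in text.split()]


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Retrieval: rank documents by the number of words shared with the query."""
    query_words = Counter(tokenize(query))

    def overlap(doc: str) -> int:
        return sum((query_words & Counter(tokenize(doc))).values())

    return sorted(docs, key=overlap, reverse=True)[:k]


def augment(query: str, context: list[str]) -> str:
    """Augmentation: merge the retrieved context and the query into one prompt."""
    bullet_list = "\n".join(f"- {item}" for item in context)
    return f"Context:\n{bullet_list}\n\nUser query: {query}\nAnswer:"


def generate(prompt: str) -> str:
    """Generation placeholder: a real pipeline would send the prompt to an LLM."""
    return f"[LLM output for a prompt of {len(prompt)} characters]"


query = "What does the pancreas produce?"
context = retrieve(query, DOCS)
prompt = augment(query, context)
answer = generate(prompt)
print(prompt)
```

The rest of the guide replaces each placeholder with a real component: embeddings and a dot-product search for retrieval, a longer prompt template for augmentation, and a Hugging Face model for generation.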

Implementation:

Now that you know what RAG is, it's time to write some code.

Step 1: Setting up the environment

pip install torch PyMuPDF tqdm transformers sentence-transformers spacy pandas numpy

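This project uses torch (for tensor computations), PyMuPDF (for reading PDFs), tqdm (for progress bars), transformers (for the LLM), and sentence-transformers (for the embedding model). The code below also imports spaCy (for sentence splitting), pandas (for storing the embeddings as CSV), and NumPy (for parsing them back).

Step 2: Processing the document/text

Sub-steps:

Importing the PDF document
Processing the text for embedding (e.g. splitting it into chunks of sentences)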

import re

import fitz  # PyMuPDF
from spacy.lang.en import English
from tqdm import tqdm


class PDF_Processor:
    def __init__(self, pdf_path):
        self.pdf_path = pdf_path

    @staticmethod
    def text_formatter(text: str) -> str:
        # Replace newlines with spaces and trim surrounding whitespace.
        cleaned_text = text.replace("\n", " ").strip()
        return cleaned_text

    @staticmethod
    def split_list(input_list: list, slice_size: int) -> list[list[str]]:
        # Split a list into consecutive slices of at most `slice_size` items.
        return [
            input_list[i : i + slice_size]
            for i in range(0, len(input_list), slice_size)
        ]

    def _read_PDF(self) -> list[dict]:
        # Open the PDF; re-raise on failure so we never continue without a document.
        try:
            pdf_document = fitz.open(self.pdf_path)
        except fitz.FileDataError:
            print(f"Error: Unable to open PDF file '{self.pdf_path}'.")
            raise
        pages_and_texts = []
        for page_number, page in tqdm(
            enumerate(pdf_document), total=len(pdf_document), desc="Reading PDF"
        ):
            # Extract the page text and collect simple per-page statistics.
            text = page.get_text()
            text = self.text_formatter(text)
            pages_and_texts.append(
                {
                    "page_number": page_number,
                    "page_char_count": len(text),
                    "page_word_count": len(text.split(" ")),
                    "page_sentence_count_raw": len(text.split(". ")),
                    # Rough estimate: about 4 characters per token.
                    "page_token_count": len(text) / 4,
                    "text": text,
                }
            )
        return pages_and_texts

    def _split_sentence(self, pages_and_texts: list):
        # Split each page's text into sentences with spaCy's rule-based sentencizer.
        nlp = English()
        nlp.add_pipe("sentencizer")
        for item in tqdm(pages_and_texts, desc="Text to sentence"):
            item["sentences"] = list(nlp(item["text"]).sents)
            item["sentences"] = [str(sentence) for sentence in item["sentences"]]
            item["page_sentence_count_spacy"] = len(item["sentences"])
        return pages_and_texts

    def _chunk_sentence(self, pages_and_texts: list, chunk_size: int = 10):
        # Group each page's sentences into chunks of `chunk_size` sentences.
        for item in tqdm(pages_and_texts, desc="Sentence to chunk"):
            item["sentence_chunks"] = self.split_list(item["sentences"], chunk_size)
            item["page_chunk_count"] = len(item["sentence_chunks"])
        return pages_and_texts

    def _pages_and_chunks(self, pages_and_texts: list):
        # Flatten the pages into one record per chunk, each with its own metadata.
        pages_and_chunks = []
        for item in tqdm(pages_and_texts, desc="Splitting each chunk into its own"):
            for sentence_chunk in item["sentence_chunks"]:
                chunk_dict = {}
                chunk_dict["page_number"] = item["page_number"]
                # Join the sentences into a single string and collapse double spaces.
                joined_sentence_chunk = (
                    "".join(sentence_chunk).replace("  ", " ").strip()
                )
                # Fix any missing spaces after periods (e.g., "Hello.World" becomes "Hello. World").
                joined_sentence_chunk = re.sub(
                    r"\.([A-Z])", r". \1", joined_sentence_chunk
                )
                chunk_dict["sentence_chunk"] = joined_sentence_chunk
                chunk_dict["chunk_char_count"] = len(joined_sentence_chunk)
                chunk_dict["chunk_word_count"] = len(joined_sentence_chunk.split(" "))
                chunk_dict["chunk_token_count"] = len(joined_sentence_chunk) / 4
                pages_and_chunks.append(chunk_dict)
        return pages_and_chunks

    def _remove_irrelevant_chunks(self, pages_and_chunks: list):
        # Keep only chunks with an estimated token count above 30; shorter
        # fragments are mostly headers, footers, or stray lines.
        relevant_pages_and_chunks = [
            item for item in pages_and_chunks if item["chunk_token_count"] > 30
        ]
        return relevant_pages_and_chunks

    def run(self):
        pages_and_texts = self._read_PDF()  # Read the PDF and extract text.
        self._split_sentence(pages_and_texts)  # Split text into sentences.
        self._chunk_sentence(pages_and_texts)  # Group sentences into chunks.
        pages_and_chunks = self._pages_and_chunks(
            pages_and_texts
        )  # Create chunks with metadata.
        relevant_pages_and_chunks = self._remove_irrelevant_chunks(
            pages_and_chunks
        )  # Filter out small chunks.
        return relevant_pages_and_chunks

The code block above processes the PDF and returns its pages as chunks of sentences, together with the associated metadata.
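
The chunking and filtering logic can be tried without a PDF. The sketch below reuses the same ideas (fixed-size slices, a 4-characters-per-token estimate, a 30-token cutoff) on made-up sentences. The sample data and the `estimate_tokens` helper are assumptions for illustration, and the sentences are joined with a space here for readability.

```python
def split_list(items: list[str], slice_size: int) -> list[list[str]]:
    # Consecutive slices of at most `slice_size` items.
    return [items[i : i + slice_size] for i in range(0, len(items), slice_size)]


def estimate_tokens(text: str) -> float:
    # Rough heuristic used in this guide: about 4 characters per token.
    return len(text) / 4


# 23 made-up sentences, so the last chunk is shorter than the others.
sentences = [f"Sentence number {i}." for i in range(1, 24)]

chunks = split_list(sentences, 10)
joined_chunks = [" ".join(chunk) for chunk in chunks]

# Drop chunks whose estimated token count is 30 or fewer.
kept = [text for text in joined_chunks if estimate_tokens(text) > 30]

print([len(chunk) for chunk in chunks])
print(len(kept), "of", len(joined_chunks), "chunks kept")
```

The short trailing chunk falls below the cutoff and is discarded, which is the behaviour `_remove_irrelevant_chunks` relies on to skip tiny leftovers.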

Step 3: Creating the embeddings and saving them

import pandas as pd
import torch
from sentence_transformers import SentenceTransformer
from tqdm import tqdm


class SaveEmbeddings:
    def __init__(self, pdf_path, embedding_model="all-mpnet-base-v2"):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        # Initialize the PDF processor to extract text chunks from the PDF.
        self.pdf_processor = PDF_Processor(pdf_path=pdf_path)
        # Process the PDF and extract page-wise text chunks.
        self.pages_and_chunks = self.pdf_processor.run()
        # Load the sentence transformer model.
        self.embedding_model = SentenceTransformer(
            model_name_or_path=embedding_model, device=self.device
        )

    def _generate_embeddings(self):
        # Generate an embedding for every chunk and store it on the chunk record.
        for item in tqdm(self.pages_and_chunks, desc="Generating embeddings"):
            item["embedding"] = self.embedding_model.encode(item["sentence_chunk"])

    def _save_embeddings(self):
        # Convert the list of dictionaries to a DataFrame and save it as CSV.
        # Each embedding is written as the string form of its NumPy array.
        data_frame = pd.DataFrame(self.pages_and_chunks)
        data_frame.to_csv("embeddings.csv", index=False)

    def run(self):
        self._generate_embeddings()  # Generate embeddings for text chunks.
        self._save_embeddings()  # Save the embeddings to a CSV file.

The code block above generates an embedding for each chunk. In this project I use the all-mpnet-base-v2 embedding model, a strong model that produces 768-dimensional vectors. All embeddings are then saved to a CSV file together with their corresponding chunks.
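
Because a CSV cell can only hold text, each vector has to be serialized when it is written and parsed when it is read. The code above relies on the string form of a NumPy array; the sketch below shows an alternative, not the author's approach, that stores each vector as a JSON list. The sample rows and the 3-dimensional vectors are made up, and an in-memory buffer stands in for the file.

```python
import csv
import io
import json

# Made-up records; real embeddings from all-mpnet-base-v2 would have 768 values.
rows = [
    {"sentence_chunk": "first chunk", "embedding": [0.1, -0.2, 0.3]},
    {"sentence_chunk": "second chunk", "embedding": [0.0, 0.5, -0.5]},
]

# Write: serialize each vector to a JSON string so it fits in one CSV cell.
buffer = io.StringIO()  # stands in for open("embeddings.csv", "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=["sentence_chunk", "embedding"])
writer.writeheader()
for row in rows:
    writer.writerow(
        {
            "sentence_chunk": row["sentence_chunk"],
            "embedding": json.dumps(row["embedding"]),
        }
    )

# Read: parse the JSON string back into a list of floats.
buffer.seek(0)
loaded = [
    {
        "sentence_chunk": record["sentence_chunk"],
        "embedding": json.loads(record["embedding"]),
    }
    for record in csv.DictReader(buffer)
]

print(loaded[0]["embedding"])
```

JSON makes the round trip explicit and independent of NumPy's print settings, at the cost of a slightly larger file.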

Step 4: Retrieval

import numpy as np
import pandas as pd
import torch
from sentence_transformers import SentenceTransformer, util


class Semantic_search:
    def __init__(self, embeddings_csv: str = "embeddings.csv"):
        self.device = (
            "cuda" if torch.cuda.is_available() else "cpu"
        )  # Set device based on availability
        self.embeddings_csv = embeddings_csv  # Path to embeddings CSV
        self.embeddings_df = pd.read_csv(
            self.embeddings_csv
        )  # Load embeddings into DataFrame
        # Load pre-trained SentenceTransformer model
        self.embedding_model = SentenceTransformer(
            model_name_or_path="all-mpnet-base-v2", device=self.device
        )

    def _process_embeddings(self):
        # Convert each stored string back to a NumPy array. Values that are
        # already arrays are left alone, so run() can be called more than once.
        self.embeddings_df["embedding"] = self.embeddings_df["embedding"].apply(
            lambda x: np.fromstring(x.strip("[]"), sep=" ")
            if isinstance(x, str)
            else x
        )

    def _get_pages_and_chunks_dict(self):
        pages_and_chunks = self.embeddings_df.to_dict(orient="records")
        return pages_and_chunks

    def _convert_embeddings_to_tensor(self):
        return torch.tensor(
            np.array(self.embeddings_df["embedding"].tolist()), dtype=torch.float32
        ).to(self.device)

    def _retrieve_relevant_resources(
        self, query: str, embeddings: torch.Tensor, n_resources_to_return: int = 5
    ):
        query_embedding = self.embedding_model.encode(
            query, convert_to_tensor=True
        )  # Encode the query
        dot_scores = util.dot_score(query_embedding, embeddings)[
            0
        ]  # Calculate dot product scores
        # Never ask for more results than there are chunks.
        k = min(n_resources_to_return, embeddings.shape[0])
        scores, indices = torch.topk(input=dot_scores, k=k)  # Get top results
        return scores, indices

    def _get_top_results(
        self,
        query: str,
        embeddings: torch.Tensor,
        pages_and_chunks: list[dict],
        n_resources_to_return: int = 5,
    ):
        relevant_chunks = []  # List to store relevant chunks
        scores, indices = self._retrieve_relevant_resources(
            query=query,
            embeddings=embeddings,
            n_resources_to_return=n_resources_to_return,
        )
        # Retrieve the relevant sentence chunks based on the indices
        for index in indices.tolist():
            sentence_chunk = pages_and_chunks[index]["sentence_chunk"]
            relevant_chunks.append(sentence_chunk)
        return relevant_chunks

    def run(self, query: str):
        self._process_embeddings()  # Convert stored strings to NumPy arrays
        pages_and_chunks = (
            self._get_pages_and_chunks_dict()
        )  # Convert embeddings DataFrame to list of dictionaries
        embeddings_tensor = (
            self._convert_embeddings_to_tensor()
        )  # Convert embeddings to tensor
        relevant_chunks = self._get_top_results(
            query=query,
            embeddings=embeddings_tensor,
            pages_and_chunks=pages_and_chunks,
        )  # Retrieve the top relevant sentence chunks
        return relevant_chunks  # Return the relevant chunks

In the code block above, we run a semantic search over the embedded chunks to retrieve the resources relevant to the query. We use the same embedding model that was used to embed the chunks. For the search, we compute the dot product between each chunk embedding and the query vector to identify the most similar chunks.

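Note: I use the dot product for semantic search because the embeddings here are normalized, and for unit-length vectors the dot product equals cosine similarity. If your embeddings are not normalized, use cosine similarity instead.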
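
The difference between the two scores is easiest to see on small vectors. The sketch below uses made-up 2-dimensional vectors and plain Python helpers (`dot`, `norm`, `cosine`, `normalize` are illustrative names, not library functions) to show that the raw dot product can favour a long vector pointing the wrong way, while cosine similarity, or the dot product of normalized vectors, does not.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: list[float]) -> float:
    return math.sqrt(dot(a, a))


def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a: list[float]) -> list[float]:
    length = norm(a)
    return [x / length for x in a]


query = [3.0, 4.0]
aligned_chunk = [0.6, 0.8]  # same direction as the query, small magnitude
long_chunk = [10.0, 0.0]    # different direction, large magnitude

# Raw dot product rewards magnitude, so the long vector wins.
print(dot(query, aligned_chunk), dot(query, long_chunk))

# Cosine similarity only looks at direction, so the aligned vector wins.
print(cosine(query, aligned_chunk), cosine(query, long_chunk))

# After normalization the dot product and cosine similarity agree.
print(dot(normalize(query), normalize(long_chunk)))
```

With normalized embeddings the cheaper dot product is therefore a safe substitute for cosine similarity.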

Step 5: Augmentation

class Create_prompt:
    def __init__(self):
        self.semantic_search = (
            Semantic_search()
        )  # Initialize the Semantic_search instance
        # Prompt template; {context} and {query} are filled in by run().
        self.base_prompt = """Based on the following context items, please answer the query.
Give yourself room to think by extracting relevant passages from the context before answering the query.
Don't return the thinking, only return the answer.
Make sure your answers are as explanatory as possible.
Use the following examples as reference for the ideal answer style.
\nExample 1:
Query: What are the fat-soluble vitamins?
Answer: The fat-soluble vitamins include Vitamin A, Vitamin D, Vitamin E, and Vitamin K. These vitamins are absorbed along with fats in the diet and can be stored in the body's fatty tissue and liver for later use. Vitamin A is important for vision, immune function, and skin health. Vitamin D plays a critical role in calcium absorption and bone health. Vitamin E acts as an antioxidant, protecting cells from damage. Vitamin K is essential for blood clotting and bone metabolism.
\nExample 2:
Query: What are the causes of type 2 diabetes?
Answer: Type 2 diabetes is often associated with overnutrition, particularly the overconsumption of calories leading to obesity. Factors include a diet high in refined sugars and saturated fats, which can lead to insulin resistance, a condition where the body's cells do not respond effectively to insulin. Over time, the pancreas cannot produce enough insulin to manage blood sugar levels, resulting in type 2 diabetes. Additionally, excessive caloric intake without sufficient physical activity exacerbates the risk by promoting weight gain and fat accumulation, particularly around the abdomen, further contributing to insulin resistance.
\nExample 3:
Query: What is the importance of hydration for physical performance?
Answer: Hydration is crucial for physical performance because water plays key roles in maintaining blood volume, regulating body temperature, and ensuring the transport of nutrients and oxygen to cells. Adequate hydration is essential for optimal muscle function, endurance, and recovery. Dehydration can lead to decreased performance, fatigue, and increased risk of heat-related illnesses, such as heat stroke. Drinking sufficient water before, during, and after exercise helps ensure peak physical performance and recovery.
\nNow use the following context items to answer the user query:
{context}\n
User query: {query}
Answer:"""

    def _get_releveant_chunks(self, query: str):
        relevant_chunks = self.semantic_search.run(
            query=query
        )  # Run semantic search to find relevant context
        return relevant_chunks

    def _join_chunks(self, relevant_chunks: list):
        context = "- " + "\n- ".join(
            item for item in relevant_chunks
        )  # Join chunks with list item format
        return context

    def run(self, query: str):
        relevant_chunks = self._get_releveant_chunks(
            query=query
        )  # Get relevant context for the query
        context = self._join_chunks(relevant_chunks)  # Format the context into a string
        prompt = self.base_prompt.format(
            context=context, query=query
        )  # Format the base prompt with context and query
        return prompt

In the code block above, I simply insert the retrieved text and the query into the predefined prompt template, which is then sent to the LLM.
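
The formatting step can be checked in isolation, without running the semantic search. The sketch below uses a shortened stand-in template and made-up chunks; `TEMPLATE` and `build_prompt` are hypothetical names for illustration. It also shows that braces inside a retrieved chunk are safe, because `str.format` only interprets placeholders in the template itself, not in the substituted values.

```python
# Shortened stand-in for the full prompt template used in this guide.
TEMPLATE = """Use the following context items to answer the user query:
{context}

User query: {query}
Answer:"""


def build_prompt(query: str, chunks: list[str]) -> str:
    # Render the chunks as a bullet list, then fill in both placeholders.
    context = "- " + "\n- ".join(chunks)
    return TEMPLATE.format(context=context, query=query)


prompt = build_prompt(
    "What is RAG?",
    [
        "RAG combines retrieval with generation.",
        "Chunks with {braces} are inserted unchanged.",
    ],
)
print(prompt)
```

If the template itself ever needs a literal brace, it has to be doubled (`{{` or `}}`) so that `format` does not treat it as a placeholder.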

Step 6: Setting up the LLM

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


class LLM_Model:
    def __init__(self, model_id: str = "tiiuae/Falcon3-3B-Instruct"):
        # Set the device based on the availability of a GPU.
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model_id = model_id
        # Load the tokenizer for the specified model.
        self.tokenizer = AutoTokenizer.from_pretrained(
            pretrained_model_name_or_path=model_id
        )
        # Use half precision on GPU and full precision on CPU, where float16
        # is slow or unsupported for some operations.
        dtype = torch.float16 if self.device == "cuda" else torch.float32
        # Load the language model with the specified configuration.
        self.llm_model = AutoModelForCausalLM.from_pretrained(
            pretrained_model_name_or_path=model_id,
            torch_dtype=dtype,
            low_cpu_mem_usage=False,
        ).to(self.device)
        # Set the pad token ID to the end-of-sequence (EOS) token if it is not already set.
        if self.tokenizer.pad_token_id is None:
            self.tokenizer.pad_token_id = self.tokenizer.eos_token_id

    def _get_model_inputs(self, base_prompt):
        # Define a dialogue template with the user's role and content.
        dialogue_template = [{"role": "user", "content": base_prompt}]
        # Use the tokenizer to apply the chat template to the input prompt.
        input_data = self.tokenizer.apply_chat_template(
            conversation=dialogue_template, tokenize=False, add_generation_prompt=True
        )
        # Convert the dialogue into input tensors suitable for the model.
        input_data = self.tokenizer(input_data, return_tensors="pt").to(self.device)
        return input_data

    def run(self, base_prompt):
        # Get the model inputs from the base prompt.
        input_data = self._get_model_inputs(base_prompt=base_prompt)
        # Generate the text output from the model. max_new_tokens limits only the
        # answer; max_length would also count the (long) prompt.
        output_ids = self.llm_model.generate(
            input_ids=input_data["input_ids"],
            attention_mask=input_data["attention_mask"],
            max_new_tokens=256,
            do_sample=True,
            pad_token_id=self.tokenizer.pad_token_id,
        )
        # The output starts with the prompt tokens; keep only the newly generated ones.
        prompt_length = input_data["input_ids"].shape[1]
        new_token_ids = output_ids[0][prompt_length:]
        # Decode the generated tokens to get the text response.
        response = self.tokenizer.decode(new_token_ids, skip_special_tokens=True)
        return response.strip()

The code block above loads an LLM from Hugging Face and generates text based on the provided input.
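
For decoder-only models, `generate` returns the prompt tokens followed by the newly generated tokens, so the answer has to be separated from the echoed prompt. One way to do that, which does not depend on any model-specific marker string, is to slice the output by the prompt length. The sketch below illustrates the idea with plain lists; `fake_generate` and the token ids are made up and stand in for a real model call.

```python
def fake_generate(input_ids: list[int], max_new_tokens: int) -> list[int]:
    """Stand-in for a causal LM: returns the prompt ids followed by new ids."""
    new_ids = [900 + i for i in range(max_new_tokens)]
    return input_ids + new_ids


prompt_ids = [101, 7, 42, 13]  # made-up token ids for the prompt
output_ids = fake_generate(prompt_ids, max_new_tokens=3)

# Everything after the prompt length is the model's answer.
answer_ids = output_ids[len(prompt_ids):]

print(output_ids)
print(answer_ids)
```

With real tensors the same slice is applied along the sequence dimension before decoding.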

Step 7: Completing the whole pipeline

import os


class Local_RAG:
    def __init__(self, pdf_path):
        self.pdf_path = pdf_path  # Path to the PDF file
        # Check if the embeddings CSV file already exists. If not, generate and save embeddings.
        if not os.path.exists("embeddings.csv"):
            self.save_embeddings = SaveEmbeddings(
                pdf_path=self.pdf_path
            )  # Initialize SaveEmbeddings
            self.save_embeddings.run()  # Run the embedding saving process
        self.create_prompt = (
            Create_prompt()
        )  # Initialize Create_prompt for prompt generation
        self.llm_model = (
            LLM_Model()
        )  # Initialize LLM_Model for language model response generation

    def run(self, query):
        print("Creating Prompt....")
        base_prompt = self.create_prompt.run(
            query=query
        )  # Generate the base prompt using the query
        print("Generating Results....")
        response = self.llm_model.run(
            base_prompt=base_prompt
        )  # Generate a response from the language model
        return response  # Return the generated response

The code block above ties all the components of the RAG pipeline together and makes it ready to use.

Example usage:

pdf_path = r"your/pdf/path"
local_rag = Local_RAG(pdf_path=pdf_path)
query = "What is the purpose of the paper?"
response = local_rag.run(query=query)
print(response)

Running the code above is all it takes to use your Local_RAG.

Improvements:

Although the current implementation works well, there are several areas that could be improved for better scalability and performance:

Model optimization:

To improve the relevance and accuracy of the answers, try more sophisticated embedding models or alternative LLM architectures.
Use a more efficient index structure, such as FAISS (Facebook AI Similarity Search), to store the embeddings and speed up retrieval.

Chunk management:

To improve the coherence of the generated answers and reduce the loss of context between chunks, use more sophisticated chunking techniques.
To capture richer context, include additional metadata (such as semantic tags or document section headers) in the chunking process.
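
One common way to reduce context loss at chunk boundaries is to let consecutive chunks share a few sentences. The sketch below is one possible approach, not part of the implementation above; the function name `overlapping_chunks` and its parameters are assumptions.

```python
def overlapping_chunks(
    sentences: list[str], chunk_size: int = 4, overlap: int = 1
) -> list[list[str]]:
    """Slide a window of `chunk_size` sentences, sharing `overlap` sentences
    between consecutive chunks."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(sentences[start : start + chunk_size])
        # Stop once the window has reached the end of the list.
        if start + chunk_size >= len(sentences):
            break
    return chunks


sentences = [f"S{i}" for i in range(1, 8)]  # S1 .. S7
chunks = overlapping_chunks(sentences, chunk_size=4, overlap=1)
print(chunks)
```

Here the sentence `S4` appears in both chunks, so a question whose answer spans the boundary can still be matched by a single chunk. The trade-off is more chunks to embed and some duplicated text in the retrieved context.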

Dynamic document updates:

By letting the system update the document embeddings dynamically as new data arrives, you can keep the RAG system in step with new content without retraining the whole model.
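
A simple way to approach this is to key every chunk by a hash of its text and embed only the chunks that have not been seen before. The sketch below is a hypothetical design, not part of the implementation above: `update_index`, `chunk_id`, and the placeholder `fake_embed` function are assumptions, and a real system would call the embedding model instead.

```python
import hashlib
from typing import Callable


def chunk_id(text: str) -> str:
    # Stable identifier derived from the chunk content.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def fake_embed(text: str) -> list[float]:
    # Placeholder for a real embedding model call.
    return [float(len(text))]


def update_index(
    index: dict, chunks: list[str], embed: Callable[[str], list[float]]
) -> int:
    """Embed only unseen chunks and drop chunks that no longer exist.
    Returns the number of chunks that were newly embedded."""
    current = {chunk_id(text): text for text in chunks}
    # Remove entries whose chunk disappeared from the document.
    for stale_id in set(index) - set(current):
        del index[stale_id]
    # Embed chunks that are not in the index yet.
    added = 0
    for cid, text in current.items():
        if cid not in index:
            index[cid] = {"text": text, "embedding": embed(text)}
            added += 1
    return added


index: dict = {}
first = update_index(index, ["alpha", "beta"], fake_embed)
second = update_index(index, ["beta", "gamma"], fake_embed)
print(first, second, sorted(entry["text"] for entry in index.values()))
```

On the second call only the new chunk is embedded, while the unchanged chunk is reused and the removed one is dropped.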

Conclusion:

Implementing Retrieval-Augmented Generation (RAG) locally is an effective, privacy-conscious way to improve AI systems, especially in settings that require offline operation. RAG suits applications such as customer service, document summarization, and question answering because it combines retrieval, augmentation, and generation into a dynamic system that can deliver highly relevant and accurate answers. This tutorial showed how to build a RAG pipeline from scratch without third-party frameworks, which gives you full control over the process and its privacy characteristics.

The step-by-step process consists of:

Installing the dependencies and configuring the environment.
Splitting documents (such as PDFs) into chunks and preparing them for embedding.
Creating the embeddings and saving them to a file for later use.
Implementing semantic search to fetch the document sections relevant to a user's query.
Augmenting the query with the relevant context and formatting it for the language model.
Running the LLM on the augmented query to generate a text answer.

Check out the full implementation on GitHub: https://github.com/BEASTBOYJAY/Local_RAG


Source: https://towardsai.net/p/l/the-complete-guide-to-implementing-rag-locally-no-cloud-or-frameworks-are-required