krishnadhulipalla committed on
Commit
4d88a84
·
1 Parent(s): aedbc59

updated profile data

Browse files
Files changed (5)
  1. README.md +127 -11
  2. Vector_storing.py +270 -0
  3. all_chunks.json +0 -0
  4. app.py +60 -75
  5. faiss_store/v61_600_150/index.faiss +0 -0
README.md CHANGED
@@ -1,14 +1,130 @@
- ---
- title: Personal ChatBot
- emoji: 💬
- colorFrom: yellow
- colorTo: purple
- sdk: gradio
- sdk_version: 5.0.1
- app_file: app.py
- pinned: false
- license: apache-2.0
- short_description: Krishna's Persona Chat Bot using Multi RAG network
- ---
-
- An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).
+ # 🧠 Krishna's Personal AI Chatbot
+
+ A memory-grounded, retrieval-augmented AI assistant built with LangChain, FAISS, BM25, and Llama3 — personalized to Krishna Vamsi Dhulipalla’s career, projects, and technical profile.
+
+ > ⚡️ Ask me anything about Krishna — skills, experience, goals, or even what tools he used at Virginia Tech.
+
+ ---
+
+ ## 📌 Features
+
+ - ✅ **Hybrid Retrieval**: Combines dense vector search (FAISS) with keyword search (BM25) for precise, high-recall chunk selection
+ - 🤖 **LLM-Powered Pipelines**: Uses OpenAI GPT-4o and NVIDIA NIMs (e.g., LLaMA-3, Mixtral) for rewriting, validation, and final answer generation
+ - 🧠 **Memory Module**: Stores user preferences, recent topics, and inferred tone using a structured `KnowledgeBase` schema (a minimal sketch follows this list)
+ - 🛠️ **Custom Architecture**:
+   - Query → Rewriting → Hybrid Retriever → Scope Validator → LLM Answer
+   - Fallback humor model (Mixtral) for out-of-scope queries
+ - 🧩 **Document Grounding**: Powered by Krishna’s actual markdown files, such as `profile.md`, `goals.md`, and `chatbot_architecture.md`
+ - 📊 **Enriched Vector Store**: Chunks include LLM-generated summaries and synthetic queries for better search performance
+ - 🎛️ **Gradio Frontend**: Responsive, markdown-formatted interface for natural, real-time interaction
+
+ ---
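
A minimal sketch of what the `KnowledgeBase` schema could look like (`app.py` imports `PydanticOutputParser`, which suggests a Pydantic model; the field names below are illustrative assumptions, not the repo's actual schema):

```python
from typing import List
from pydantic import BaseModel, Field

class KnowledgeBase(BaseModel):
    # Illustrative fields only; the actual schema in app.py may differ
    user_preferences: List[str] = Field(default_factory=list, description="Preferences inferred from the chat")
    recent_topics: List[str] = Field(default_factory=list, description="Topics raised in recent turns")
    tone: str = Field(default="neutral", description="Inferred conversational tone")
```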
+
+ ## 🏗️ Architecture
+
+ ```text
+ User Query
+     ↓
+ [LLM1] → Rephrase into 3 diverse subqueries
+     ↓
+ Hybrid Retrieval (BM25 + FAISS)
+     ↓
+ [LLM2] → Classify: In-scope or Out-of-scope
+     ↓
+ ├─ In-scope → Top-k Chunks → GPT-4o
+ └─ Out-of-scope → Mixtral (funny fallback)
+     ↓
+ Final Answer + Async Memory Update
+ ```
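
A hedged sketch of how the in-scope/out-of-scope branch above could be expressed with LangChain runnables; the actual control flow in `app.py` may differ, and `is_in_scope` below is a hypothetical stand-in for the LLM2 classifier:

```python
from langchain_core.runnables import RunnableBranch, RunnableLambda

# Hypothetical stand-ins for the stages in the diagram above
def is_in_scope(state: dict) -> bool:
    # LLM2 would classify the retrieved chunks here
    return state.get("in_scope", False)

answer_chain = RunnableLambda(lambda s: f"GPT-4o answer grounded in {len(s['chunks'])} chunks")
fallback_chain = RunnableLambda(lambda s: "Mixtral: witty deflection back to Krishna topics")

route = RunnableBranch(
    (is_in_scope, answer_chain),  # in-scope → grounded answer
    fallback_chain,               # default → humorous fallback
)

print(route.invoke({"in_scope": True, "chunks": ["c1", "c2"]}))
```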
+
+ ---
+
+ ## 📂 Project Structure
+
+ ```
+ .
+ ├── app.py              # Main Gradio app and pipeline logic
+ ├── Vector_storing.py   # Chunking, LLM-based enrichment, and FAISS store creation
+ ├── requirements.txt    # Python package dependencies
+ ├── faiss_store/        # Saved FAISS vector index
+ ├── all_chunks.json     # JSON of enriched document chunks
+ ├── personal_data/      # Source markdown files (currently excluded from the repo)
+ └── README.md
+ ```
+
+ ---
+
+ ## 🧠 Knowledge Sources
+
+ All answers are grounded in curated markdown files:
+
+ | File Name                 | Description                                    |
+ | ------------------------- | ---------------------------------------------- |
+ | `profile.md`              | Krishna’s full technical profile and education |
+ | `goals.md`                | Short- and long-term personal goals            |
+ | `chatbot_architecture.md` | System-level breakdown of this AI assistant    |
+ | `personal_interests.md`   | Hobbies, cultural identity, food preferences   |
+ | `conversations.md`        | Sample queries and expected response tone      |
+
+ ---
+
+ ## 🧪 How It Works
+
+ 1. **User input** is rewritten into diverse subqueries (LLM1)
+ 2. **Retriever** fetches relevant chunks using BM25 and FAISS (see the fusion sketch below)
+ 3. **Classifier LLM** decides whether the results are relevant to Krishna
+ 4. **GPT-4o** generates the final answer from the top-k chunks
+ 5. **Memory is updated** asynchronously after every turn
+
+ ---
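
To make step 2 concrete, here is a minimal sketch of the score fusion used by the hybrid retriever, assuming min-max-normalized dense scores and rank-based BM25 scores as in `hybrid_retrieve` in `app.py` (variable names simplified):

```python
import numpy as np

def fuse_scores(vector_scores, bm25_score, n_queries, alpha=0.5,
                min_vec=0.0, max_vec=1.0):
    """Blend a chunk's dense (FAISS) and keyword (BM25) signals into one score.

    alpha weights the dense signal; (1 - alpha) weights the BM25 signal.
    """
    vec_score = np.mean(vector_scores) if vector_scores else 0.0
    # Min-max normalize the dense score across all candidate chunks
    norm_vec = 0.5 if max_vec == min_vec else (vec_score - min_vec) / (max_vec - min_vec)
    # BM25 contributions are averaged over the rewritten subqueries
    bm25 = bm25_score / n_queries
    return alpha * norm_vec + (1 - alpha) * bm25

# A chunk seen by two subqueries (dense scores 0.62 and 0.70) plus a
# rank-1 BM25 hit (reciprocal-rank score 1.0), fused with alpha = 0.5:
print(round(fuse_scores([0.62, 0.70], 1.0, n_queries=2,
                        min_vec=0.3, max_vec=0.9), 4))  # 0.55
```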
+
+ ## 💬 Example Queries
+
+ - What programming languages does Krishna know?
+ - Tell me about Krishna’s chatbot architecture
+ - Can this chatbot explain Krishna's work at Virginia Tech?
+ - What tools has Krishna used for data engineering?
+
+ ---
+
+ ## 🚀 Setup & Usage
+
+ ```bash
+ # 1. Clone the repo
+ git clone https://github.com/krishna-creator/krishna-personal-chatbot.git
+ cd krishna-personal-chatbot
+
+ # 2. Install dependencies
+ pip install -r requirements.txt
+
+ # 3. Set your API keys (OpenAI, NVIDIA)
+ export OPENAI_API_KEY=...
+ export NVIDIA_API_KEY=...
+
+ # 4. Launch the chatbot
+ python app.py
+ ```
+
+ ---
+
+ ## 🔮 Model Stack
+
+ | Purpose            | Model Name               | Provider |
+ | ------------------ | ------------------------ | -------- |
+ | Query Rewriting    | `phi-3-mini-4k-instruct` | NVIDIA   |
+ | Scope Classifier   | `llama-3-70b-instruct`   | NVIDIA   |
+ | Answer Generator   | `gpt-4o`                 | OpenAI   |
+ | Fallback Humor LLM | `mixtral-8x22b-instruct` | NVIDIA   |
+
+ ---
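
As a companion to the table, a hedged sketch of how these four models could be instantiated with the LangChain clients this repo already uses; the catalog model IDs below are assumptions inferred from the table, and `app.py` may use different identifiers or parameters:

```python
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_openai import ChatOpenAI

rewrite_llm  = ChatNVIDIA(model="microsoft/phi-3-mini-4k-instruct")       # query rewriting
scope_llm    = ChatNVIDIA(model="meta/llama3-70b-instruct")               # in/out-of-scope check
answer_llm   = ChatOpenAI(model="gpt-4o", streaming=True)                 # final answer generation
fallback_llm = ChatNVIDIA(model="mistralai/mixtral-8x22b-instruct-v0.1")  # humorous fallback
```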
+
+ ## 📌 Acknowledgments
+
+ - Built as part of Krishna's exploration into **LLM orchestration and agentic RAG**
+ - Inspired by LangChain, SentenceTransformers, and the NVIDIA RAG Agents course
+
+ ---
+
+ ## 📜 License
+
+ MIT License © Krishna Vamsi Dhulipalla
Vector_storing.py ADDED
@@ -0,0 +1,270 @@
+ import os
+ import re
+ import json
+ import hashlib
+ from pathlib import Path
+ from dotenv import load_dotenv
+ from langchain.text_splitter import RecursiveCharacterTextSplitter
+ from langchain_community.vectorstores import FAISS
+ from langchain_community.embeddings import HuggingFaceEmbeddings
+ from langchain_nvidia_ai_endpoints import ChatNVIDIA
+
+ # === UTILS ===
+ def hash_text(text):
+     return hashlib.md5(text.encode()).hexdigest()[:8]
+
+ def fix_json_text(text):
+     # Normalize curly quotes and extract the outermost JSON object
+     text = text.replace("“", '"').replace("”", '"').replace("‘", "'").replace("’", "'")
+     match = re.search(r'\{.*\}', text, re.DOTALL)
+     return match.group(0) if match else text
+
+ def enrich_chunk_with_llm(text, llm):
+     prompt = f"""You're a helpful assistant optimizing document retrieval.
+
+ Every document you see is about Krishna Vamsi Dhulipalla.
+
+ Here’s a document chunk:
+ {text}
+
+ 1. Summarize the key content of this chunk in 1–2 sentences, assuming the overall context is about Krishna.
+ 2. Generate 3 natural-language questions that a user might ask to which this chunk would be a relevant answer, focusing on Krishna-related topics.
+
+ Respond in JSON:
+ {{
+   "summary": "...",
+   "synthetic_queries": ["...", "...", "..."]
+ }}"""
+
+     response = llm.invoke(prompt)
+     content = getattr(response, "content", "").strip()
+
+     if not content:
+         raise ValueError("⚠️ LLM returned empty response")
+
+     fixed = fix_json_text(content)
+     try:
+         return json.loads(fixed)
+     except Exception as e:
+         raise ValueError(f"Invalid JSON from LLM: {e}\n--- Raw Output ---\n{content}")
+
+ # === MAIN FUNCTION ===
+ def create_faiss_store(
+     md_dir="./personal_data",
+     chunk_size=600,
+     chunk_overlap=150,
+     persist_dir="./faiss_store",
+     chunk_save_path="all_chunks.json",
+     llm=None
+ ):
+     splitter = RecursiveCharacterTextSplitter(
+         chunk_size=chunk_size,
+         chunk_overlap=chunk_overlap,
+         separators=["\n# ", "\n## ", "\n### ", "\n#### ", "\n\n", "\n- ", "\n", ". ", " "],
+         keep_separator=True,
+         length_function=len,  # Consider switching to a tokenizer-based length later
+         is_separator_regex=False
+     )
+
+     docs, all_chunks, failed_chunks = [], [], []
+
+     for md_file in Path(md_dir).glob("*.md"):
+         with open(md_file, "r", encoding="utf-8") as f:
+             content = f.read().strip()
+         if not content:
+             continue
+         # Normalize headers so a space always follows the leading '#'
+         content = re.sub(r'\n#+(\w)', r'\n# \1', content)
+         docs.append({
+             "content": content,
+             "metadata": {
+                 "source": md_file.name,
+                 "header": content.split('\n')[0]
+             }
+         })
+
+     for doc in docs:
+         try:
+             chunks = splitter.split_text(doc["content"])
+         except Exception as e:
+             print(f"❌ Error splitting {doc['metadata']['source']}: {e}")
+             continue
+
+         for i, chunk in enumerate(chunks):
+             chunk = chunk.strip()
+             if len(chunk) < 50:
+                 continue
+
+             chunk_id = f"{doc['metadata']['source']}_#{i}_{hash_text(chunk)}"
+             metadata = {
+                 **doc["metadata"],
+                 "chunk_id": chunk_id,
+                 "has_header": chunk.startswith("#"),
+                 "word_count": len(chunk.split())
+             }
+
+             try:
+                 print("🔍 Processing chunk:", chunk_id)
+                 enriched = enrich_chunk_with_llm(chunk, llm)
+                 summary = enriched.get("summary", "")
+                 questions = enriched.get("synthetic_queries", [])
+
+                 metadata.update({
+                     "summary": summary,
+                     "synthetic_queries": questions
+                 })
+
+                 enriched_text = (
+                     f"{chunk}\n\n"
+                     f"---\n"
+                     f"🔹 Summary:\n{summary}\n\n"
+                     f"🔸 Related Questions:\n" + "\n".join(f"- {q}" for q in questions)
+                 )
+
+                 all_chunks.append({
+                     "text": enriched_text,
+                     "metadata": metadata
+                 })
+             except Exception as e:
+                 print(f"⚠️ LLM failed for {chunk_id}: {e}")
+                 failed_chunks.append(f"{chunk_id} → {str(e)}")
+
+     print(f"✅ Markdown files processed: {len(docs)}")
+     print(f"✅ Chunks created: {len(all_chunks)} | ⚠️ Failed: {len(failed_chunks)}")
+
+     # Save enriched chunks
+     with open(chunk_save_path, "w", encoding="utf-8") as f:
+         json.dump(all_chunks, f, indent=2, ensure_ascii=False)
+     print(f"📁 Saved enriched chunks → {chunk_save_path}")
+
+     os.makedirs(persist_dir, exist_ok=True)
+     version_tag = f"v{len(all_chunks)}_{chunk_size}_{chunk_overlap}"
+     save_path = os.path.join(persist_dir, version_tag)
+     os.makedirs(save_path, exist_ok=True)
+
+     embeddings = HuggingFaceEmbeddings(
+         model_name="sentence-transformers/all-MiniLM-L6-v2",
+         model_kwargs={"device": "cpu"},
+         encode_kwargs={"normalize_embeddings": True}
+     )
+
+     vector_store = FAISS.from_texts(
+         texts=[chunk["text"] for chunk in all_chunks],
+         embedding=embeddings,
+         metadatas=[chunk["metadata"] for chunk in all_chunks]
+     )
+     vector_store.save_local(save_path)
+
+     print(f"✅ FAISS index saved at: {save_path}")
+     avg_len = sum(len(c['text']) for c in all_chunks) / len(all_chunks) if all_chunks else 0
+     print(f"📊 Stats → Chunks: {len(all_chunks)} | Avg length: {avg_len:.1f} characters")
+
+     if failed_chunks:
+         with open("failed_chunks.txt", "w") as f:
+             for line in failed_chunks:
+                 f.write(line + "\n")
+         print("📝 Failed chunk IDs saved to failed_chunks.txt")
+
+ dotenv_path = os.path.join(os.getcwd(), ".env")
+ load_dotenv(dotenv_path)
+ api_key = os.getenv("NVIDIA_API_KEY")
+ if api_key:  # Guard: assigning None to os.environ would raise a TypeError
+     os.environ["NVIDIA_API_KEY"] = api_key
+ # Initialize the enrichment model
+ llm = ChatNVIDIA(model="nvidia/llama-3.1-nemotron-70b-instruct")
+
+ create_faiss_store(
+     md_dir="./personal_data",
+     chunk_size=600,
+     chunk_overlap=150,
+     persist_dir="./faiss_store",
+     llm=llm
+ )
+
+ # Legacy implementation kept for reference:
+ # from langchain.text_splitter import (
+ #     RecursiveCharacterTextSplitter,
+ #     MarkdownHeaderTextSplitter
+ # )
+ # from langchain.embeddings import HuggingFaceEmbeddings
+ # from langchain.vectorstores import FAISS
+ # from langchain.docstore.document import Document
+ # from transformers import AutoTokenizer
+ # from pathlib import Path
+ # import os
+ # from typing import List
+
+ # def prepare_vectorstore(
+ #     base_path: str,
+ #     faiss_path: str,
+ #     use_markdown_headers: bool = True,
+ #     chunk_size: int = 600,
+ #     chunk_overlap: int = 150,
+ #     model_name: str = "sentence-transformers/all-MiniLM-L6-v2",
+ #     verbose: bool = True
+ # ) -> FAISS:
+ #     docs = []
+ #     for md_file in Path(base_path).glob("*.md"):
+ #         with open(md_file, "r", encoding="utf-8") as f:
+ #             content = f.read()
+ #         metadata = {
+ #             "source": md_file.name,
+ #             "file_type": "markdown",
+ #             "created_at": md_file.stat().st_ctime
+ #         }
+ #         docs.append(Document(page_content=content, metadata=metadata))
+
+ #     # Optional Markdown-aware splitting
+ #     if use_markdown_headers:
+ #         header_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=[
+ #             ("#", "h1"), ("##", "h2"), ("###", "h3")
+ #         ])
+ #         structured_chunks = []
+ #         for doc in docs:
+ #             splits = header_splitter.split_text(doc.page_content)
+ #             for chunk in splits:
+ #                 chunk.metadata.update(doc.metadata)
+ #                 structured_chunks.append(chunk)
+ #     else:
+ #         structured_chunks = docs
+
+ #     # Tokenizer-based recursive splitting
+ #     tokenizer = AutoTokenizer.from_pretrained(model_name)
+ #     recursive_splitter = RecursiveCharacterTextSplitter(
+ #         chunk_size=chunk_size,
+ #         chunk_overlap=chunk_overlap,
+ #         length_function=lambda text: len(tokenizer.encode(text)),
+ #         separators=["\n## ", "\n### ", "\n\n", "\n", ". "]
+ #     )
+
+ #     final_chunks: List[Document] = []
+ #     for chunk in structured_chunks:
+ #         sub_chunks = recursive_splitter.split_text(chunk.page_content)
+ #         for i, sub in enumerate(sub_chunks):
+ #             final_chunks.append(Document(
+ #                 page_content=sub,
+ #                 metadata={**chunk.metadata, "sub_chunk": i}
+ #             ))
+
+ #     if verbose:
+ #         print(f"✅ Total chunks after splitting: {len(final_chunks)}")
+ #         print(f"📁 Storing to: {faiss_path}")
+
+ #     embedding_model = HuggingFaceEmbeddings(model_name=model_name)
+ #     vectorstore = FAISS.from_documents(final_chunks, embedding_model)
+ #     vectorstore.save_local(faiss_path)
+
+ #     if verbose:
+ #         print(f"✅ FAISS vectorstore saved at: {os.path.abspath(faiss_path)}")
+
+ #     return vectorstore
+
+ # vectorstore = prepare_vectorstore(
+ #     base_path="./personal_data",
+ #     faiss_path="krishna_vectorstore_hybrid",
+ #     use_markdown_headers=True,
+ #     chunk_size=600,
+ #     chunk_overlap=150,
+ #     verbose=True
+ # )
all_chunks.json CHANGED
The diff for this file is too large to render. See raw diff
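
Although the diff is too large to render, each record in `all_chunks.json` follows the shape produced by `create_faiss_store` above; the field names below match Vector_storing.py, while the values are invented for illustration:

```python
# One illustrative record (values are made up; structure matches Vector_storing.py)
example_chunk = {
    "text": "## Skills\nKrishna works with Python, PyTorch, ...\n\n---\n🔹 Summary:\n...\n\n🔸 Related Questions:\n- ...",
    "metadata": {
        "source": "profile.md",
        "header": "# Krishna Vamsi Dhulipalla",
        "chunk_id": "profile.md_#0_ab12cd34",
        "has_header": True,
        "word_count": 96,
        "summary": "Krishna's core programming and ML tooling skills.",
        "synthetic_queries": ["What languages does Krishna know?", "...", "..."],
    },
}
```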
 
app.py CHANGED
@@ -3,9 +3,8 @@ import json
  import re
  import hashlib
  import gradio as gr
- import threading
  from functools import partial
- import concurrent.futures
+ import threading
  from collections import defaultdict
  from pathlib import Path
  from typing import List, Dict, Any, Optional, List, Literal, Type
@@ -19,13 +18,10 @@ from langchain_nvidia_ai_endpoints import ChatNVIDIA
  from langchain_core.output_parsers import StrOutputParser
  from langchain_core.prompts import ChatPromptTemplate
  from langchain.schema.runnable.passthrough import RunnableAssign
- from langchain.text_splitter import RecursiveCharacterTextSplitter
  from langchain_huggingface import HuggingFaceEmbeddings
  from langchain.vectorstores import FAISS
- from langchain.docstore.document import Document
  from langchain.retrievers import BM25Retriever
  from langchain_openai import ChatOpenAI
- from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
  from langchain.output_parsers import PydanticOutputParser

  #dotenv_path = os.path.join(os.getcwd(), ".env")
@@ -38,7 +34,7 @@ if not api_key:
      raise RuntimeError("🚨 NVIDIA_API_KEY not found in environment! Please add it in Hugging Face Secrets.")

  # Constants
- FAISS_PATH = "faiss_store/v30_600_150"
+ FAISS_PATH = "faiss_store/v61_600_150"
  CHUNKS_PATH = "all_chunks.json"

  if not Path(FAISS_PATH).exists():
@@ -47,13 +43,16 @@ if not Path(FAISS_PATH).exists():
  if not Path(CHUNKS_PATH).exists():
      raise FileNotFoundError(f"Chunks file not found at {CHUNKS_PATH}")

- KRISHNA_BIO = """Krishna Vamsi Dhulipalla completed M.Eng program in Computer Science at Virginia Tech, awarded degree in december 2024, with over 3 years of experience across data engineering, machine learning research, and real-time analytics. He specializes in building scalable data systems and intelligent LLM-powered applications, with strong expertise in Python, PyTorch, Hugging Face Transformers, and end-to-end ML pipelines.
+ KRISHNA_BIO = """Krishna Vamsi Dhulipalla completed his master's in Computer Science at Virginia Tech, with the degree awarded in December 2024, and has over 3 years of experience across data engineering, machine learning research, and real-time analytics. He specializes in building scalable data systems and intelligent LLM-powered applications, with strong expertise in Python, PyTorch, Hugging Face Transformers, and end-to-end ML pipelines.

  He has led projects involving retrieval-augmented generation (RAG), feature selection for genomic classification, fine-tuning domain-specific LLMs (e.g., DNABERT, HyenaDNA), and real-time forecasting systems using Kafka, Spark, and Airflow. His cloud proficiency spans AWS (S3, SageMaker, ECS, CloudWatch), GCP (BigQuery, Cloud Composer), and DevOps tools like Docker, Kubernetes, and MLflow.

  Krishna’s research has focused on genomic sequence modeling, transformer optimization, MLOps automation, and cross-domain generalization. He has published work in bioinformatics and machine learning applications for circadian transcription prediction and transcription factor binding.

- He holds certifications in NVIDIA’s RAG Agents with LLMs, Google Cloud Data Engineering, and AWS ML Specialization. Krishna is passionate about scalable LLM infrastructure, data-centric AI, and domain-adaptive ML solutions — combining deep technical expertise with real-world engineering impact."""
+ He holds certifications in NVIDIA’s RAG Agents with LLMs, Google Cloud Data Engineering, and AWS ML Specialization. Krishna is passionate about scalable LLM infrastructure, data-centric AI, and domain-adaptive ML solutions — combining deep technical expertise with real-world engineering impact.
+
+ Besides his career, Krishna loves hiking, cricket, and exploring new technologies. He is a big fan of Marvel movies and space exploration.
+ """

@@ -62,6 +61,12 @@ def initialize_console():

  pprint = initialize_console()

+ def PPrint(preface="State: "):
+     def print_and_return(x, preface=""):
+         pprint(preface, x)
+         return x
+     return RunnableLambda(partial(print_and_return, preface=preface))
+
  def load_chunks_from_json(path: str = CHUNKS_PATH) -> List[Dict]:
      with open(path, "r", encoding="utf-8") as f:
          return json.load(f)
@@ -111,12 +116,16 @@ answer_llm = ChatOpenAI(

  # Prompts
  repharser_prompt = ChatPromptTemplate.from_template(
-     "Rewrite the question below in 3 different ways to help retrieve related information. Vary tone, style, and phrasing, but keep the meaning the same."
-     "Question: {query}"
-     "\n\nRewrites:"
-     "1."
+     "You are a smart retrieval assistant. Rewrite the user's question into 2 different variants optimized for hybrid retrieval systems (BM25 + dense vectors).\n\n"
+     "Your rewrites should:\n"
+     "- Vary tone and phrasing\n"
+     "- Expand or clarify intent if implicit\n"
+     "- Include helpful keywords, synonyms, or topic-specific terms if possible\n"
+     "- Be semantically close but diverse enough to match different chunks in the knowledge base\n\n"
+     "Original Question:\n{query}\n\n"
+     "Rewrites:\n"
+     "1.\n"
      "2."
-     "3."
  )

  relevance_prompt = ChatPromptTemplate.from_template("""
@@ -201,19 +210,11 @@ answer_prompt_relevant = ChatPromptTemplate.from_template(
  "- You may use general knowledge to briefly explain tools (like PyTorch or Kafka), but **do not invent any new facts** about Krishna.\n"
  "- Avoid filler phrases, repetition, or generic praise (e.g., strengths) unless directly asked.\n"
  "- End with a friendly follow-up question (no subheading needed here).\n\n"
- "Example:\n"
- "**Q: What work experience does Krishna have?**\n"
- "**A:**\n"
- "**🔧 Work Experience Overview**\n"
- "**1. UJR Technologies** – Migrated batch ETL to real-time (Kafka/Spark), Dockerized services, and optimized Snowflake queries.\n"
- "**2. Virginia Tech** – Built real-time IoT forecasting pipeline (10K sensors, GPT-4), achieving 91% accuracy and 15% energy savings.\n\n"
- "_Would you like to dive into Krishna’s cloud deployment work using SageMaker and MLflow?_\n\n"
  "Now generate the answer for the following:\n\n"
  "User Question:\n{query}\n\n"
  "Answer:"
  )

-
  answer_prompt_fallback = ChatPromptTemplate.from_template(
  "You are Krishna’s personal AI assistant. The user asked a question unrelated to Krishna’s background.\n"
  "Respond with a touch of humor, then guide the conversation back to Krishna’s actual skills, experiences, or projects.\n\n"
@@ -239,17 +240,15 @@ parser_prompt = ChatPromptTemplate.from_template(
  # Helper Functions
  def parse_rewrites(raw_response: str) -> list[str]:
      lines = raw_response.strip().split("\n")
-     return [line.strip("0123456789. ").strip() for line in lines if line.strip()][:3]
+     return [line.strip("0123456789. ").strip() for line in lines if line.strip()][:2]

  def hybrid_retrieve(inputs, exclude_terms=None):
-     # if exclude_terms is None:
-     #     exclude_terms = ["cgpa", "university", "b.tech", "m.s.", "certification", "coursera", "edx", "goal", "aspiration", "linkedin", "publication", "ieee", "doi", "degree"]
      bm25_retriever = inputs["bm25_retriever"]
      all_queries = inputs["all_queries"]
      bm25_retriever.k = inputs["k_per_query"]
      vectorstore = inputs["vectorstore"]
      alpha = inputs["alpha"]
-     top_k = inputs.get("top_k", 15)
+     top_k = inputs.get("top_k", 30)
      k_per_query = inputs["k_per_query"]

      scored_chunks = defaultdict(lambda: {
@@ -258,45 +257,37 @@ def hybrid_retrieve(inputs, exclude_terms=None):
          "content": None,
          "metadata": None,
      })
-
-     # Function to process each subquery
-     def process_subquery(subquery, k_per_query=3):
-         # Vector retrieval
-         vec_hits = vectorstore.similarity_search_with_score(subquery, k=k_per_query)
-         vec_results = []
-         for doc, score in vec_hits:
-             key = hashlib.md5(doc.page_content.encode("utf-8")).hexdigest()
-             vec_results.append((key, doc, score))
-
-         # BM25 retrieval
-         bm_hits = bm25_retriever.invoke(subquery)
-         bm_results = []
-         for rank, doc in enumerate(bm_hits):
-             key = hashlib.md5(doc.page_content.encode("utf-8")).hexdigest()
-             bm_score = 1.0 - (rank / k_per_query)
-             bm_results.append((key, doc, bm_score))
-
-         return vec_results, bm_results
-
-     # Process subqueries in parallel
-     with concurrent.futures.ThreadPoolExecutor() as executor:
-         futures = [executor.submit(process_subquery, q) for q in all_queries]
-         for future in concurrent.futures.as_completed(futures):
-             vec_results, bm_results = future.result()
-
-             # Process vector results
-             for key, doc, score in vec_results:
-                 scored_chunks[key]["vector_scores"].append(score)
-                 scored_chunks[key]["content"] = doc.page_content
-                 scored_chunks[key]["metadata"] = doc.metadata
-
-             # Process BM25 results
-             for key, doc, bm_score in bm_results:
-                 scored_chunks[key]["bm25_score"] += bm_score
-                 scored_chunks[key]["content"] = doc.page_content
-                 scored_chunks[key]["metadata"] = doc.metadata
-
-     # Rest of the scoring and filtering logic remains the same
+
+     def process_subquery(subquery, k=k_per_query):
+         vec_hits = vectorstore.similarity_search_with_score(subquery, k=k)
+         bm_hits = bm25_retriever.invoke(subquery)
+
+         vec_results = [
+             (hashlib.md5(doc.page_content.encode("utf-8")).hexdigest(), doc, score)
+             for doc, score in vec_hits
+         ]
+
+         bm_results = [
+             (hashlib.md5(doc.page_content.encode("utf-8")).hexdigest(), doc, 1.0 / (rank + 1))
+             for rank, doc in enumerate(bm_hits)
+         ]
+
+         return vec_results, bm_results
+
+     # Process each subquery serially
+     for subquery in all_queries:
+         vec_results, bm_results = process_subquery(subquery)
+
+         for key, doc, vec_score in vec_results:
+             scored_chunks[key]["vector_scores"].append(vec_score)
+             scored_chunks[key]["content"] = doc.page_content
+             scored_chunks[key]["metadata"] = doc.metadata
+
+         for key, doc, bm_score in bm_results:
+             scored_chunks[key]["bm25_score"] += bm_score
+             scored_chunks[key]["content"] = doc.page_content
+             scored_chunks[key]["metadata"] = doc.metadata
+
      all_vec_means = [np.mean(v["vector_scores"]) for v in scored_chunks.values() if v["vector_scores"]]
      max_vec = max(all_vec_means) if all_vec_means else 1
      min_vec = min(all_vec_means) if all_vec_means else 0
@@ -304,23 +295,18 @@ def hybrid_retrieve(inputs, exclude_terms=None):
      final_results = []
      for chunk in scored_chunks.values():
          vec_score = np.mean(chunk["vector_scores"]) if chunk["vector_scores"] else 0.0
-         norm_vec = (vec_score - min_vec) / (max_vec - min_vec) if max_vec != min_vec else 1.0
+         norm_vec = 0.5 if max_vec == min_vec else (vec_score - min_vec) / (max_vec - min_vec)
          bm25_score = chunk["bm25_score"] / len(all_queries)
          final_score = alpha * norm_vec + (1 - alpha) * bm25_score

          content = chunk["content"].lower()
-         if final_score < 0.05 or len(content.strip()) < 100:
+         if final_score < 0.01 or len(content.strip()) < 40:
              continue

          final_results.append({
              "content": chunk["content"],
              "source": chunk["metadata"].get("source", ""),
-             "final_score": float(round(final_score, 4)),
-             "vector_score": float(round(vec_score, 4)),
-             "bm25_score": float(round(bm25_score, 4)),
-             "metadata": chunk["metadata"],
-             "summary": chunk["metadata"].get("summary", ""),
-             "synthetic_queries": chunk["metadata"].get("synthetic_queries", [])
+             "final_score": float(round(final_score, 4))
          })

      final_results = sorted(final_results, key=lambda x: x["final_score"], reverse=True)
@@ -477,8 +463,8 @@ def chat_interface(message, history):
      "query": message,
      "all_queries": [message],
      "all_texts": all_chunks,
-     "k_per_query": 3,
-     "alpha": 0.7,
+     "k_per_query": 10,
+     "alpha": 0.5,
      "vectorstore": vectorstore,
      "bm25_retriever": bm25_retriever,
  }
@@ -497,7 +483,6 @@ def chat_interface(message, history):

      # After streaming completes, update KB in background thread
      if full_response:
-         import threading
          update_thread = threading.Thread(
              target=update_knowledge_base,
              args=(message, full_response),
@@ -549,9 +534,9 @@ demo = gr.ChatInterface(
      title="💬 Ask Krishna's AI Assistant",
      description="💡 Ask anything about Krishna Vamsi Dhulipalla",
      examples=[
-         "What are Krishna's research interests?",
-         "What are Krishna's skills?",
-         "What did he study at Virginia Tech?"
+         "Give me an overview of Krishna Vamsi Dhulipalla’s work experience across different roles?",
+         "What programming languages and tools does Krishna use for data science?",
+         "Can this chatbot tell me what Krishna's chatbot architecture looks like and how it works?"
      ],
  )
faiss_store/v61_600_150/index.faiss ADDED
Binary file (93.7 kB).