RobbiePasquale committed on
Commit 8e79db6 · verified · 1 Parent(s): e1392d6

Update README.md

Files changed (1): README.md (+194 -1)
README.md CHANGED
@@ -3,7 +3,200 @@ license: apache-2.0
- # Model Card for World Model with MCTS and Transformer Components
 
## Overview of the Main Menu

The `main_menu.py` script is the primary entry point for choosing and executing one of three tasks:
1. **Training the LLM and World Model**: `train_llm_world`
2. **Training the Search Agent**: `train_agent`
3. **Testing the Tree of Thought Search Agent**: `test_agent`

Each task has its own functionality and configuration. The script uses command-line arguments to select the desired task and set additional options, so you can tailor execution to your needs.

### Running the Main Menu

To run the main menu, use the following command in the terminal:
```bash
python main_menu.py --task <task_name> [additional arguments]
```

Replace `<task_name>` with one of the following:
- `train_llm_world` - Train the LLM (Language Model) and World Model.
- `train_agent` - Train the Search Agent with an interactive Twisted-based process.
- `test_agent` - Test the Tree of Thought Search Agent, either in an interactive session or with a single query.

### General Arguments

The script supports a set of command-line arguments to customize each task. Here is an overview of all available arguments (a parsing sketch follows the table):

| Argument | Required | Description | Default |
|---|---|---|---|
| `--task` | Yes | Task to run: `train_llm_world`, `train_agent`, or `test_agent`. | None |
| `--model_name` | No | Pretrained model name for the LLM, e.g. `gpt2`, `bert`, or a path to a custom model. | `gpt2` |
| `--dataset_name` | No | Hugging Face dataset used to train the LLM and World Model (e.g., `wikitext`). | `wikitext` |
| `--dataset_config` | No | Dataset configuration name for selecting a specific version or variant of the dataset. | `wikitext-2-raw-v1` |
| `--batch_size` | No | Number of samples processed per forward/backward pass. Larger batches can speed up training but require more memory. | `4` |
| `--num_epochs` | No | Number of passes over the training dataset. More epochs generally improve learning but can lead to overfitting. | `3` |
| `--max_length` | No | Maximum sequence length for training/inference. Sequences are truncated or padded to this length. | `128` |
| `--mode` | No | Mode for the LLM and World Model: `train` for training, `inference` for generating responses. | `train` |
| `--query` | No | Query input for `test_agent` when running a single query instead of an interactive session. | `''` (empty) |

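#### Illustrative Argument Parsing

The exact wiring of these options lives in `main_menu.py`; as a rough, hypothetical sketch based only on the table above (the real script may name or group these options differently), the parser could look like this:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch of the CLI described in the table above.
    parser = argparse.ArgumentParser(description="Main menu for training and testing tasks")
    parser.add_argument("--task", required=True,
                        choices=["train_llm_world", "train_agent", "test_agent"],
                        help="Task to run")
    parser.add_argument("--model_name", default="gpt2", help="Pretrained model name or path")
    parser.add_argument("--dataset_name", default="wikitext", help="Hugging Face dataset name")
    parser.add_argument("--dataset_config", default="wikitext-2-raw-v1", help="Dataset configuration")
    parser.add_argument("--batch_size", type=int, default=4)
    parser.add_argument("--num_epochs", type=int, default=3)
    parser.add_argument("--max_length", type=int, default=128)
    parser.add_argument("--mode", default="train", choices=["train", "inference"])
    parser.add_argument("--query", default="", help="Single query for test_agent")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.task)
```
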
## Task Details

### 1. Training the LLM and World Model (`train_llm_world`)

This task trains the LLM and the World Model on a chosen dataset from Hugging Face. Training adjusts model weights over several epochs and produces a model capable of handling long sequences and complex reasoning tasks.

#### Example Usage
```bash
python main_menu.py --task train_llm_world --model_name gpt2 --dataset_name wikitext --num_epochs 5 --batch_size 8 --max_length 256
```

#### Arguments Specific to `train_llm_world`
- **`--model_name`**: Name of the pretrained model to use for language model training. You can specify a model name (such as `gpt2` or `bert`) or a path to a custom model. This argument affects the model architecture and tokenization style.

- **`--dataset_name`**: Dataset from Hugging Face's Datasets library to train on. Options include `wikitext`, `imdb`, `squad`, etc. You can also use a custom dataset by specifying its path.

- **`--dataset_config`**: Configuration of the dataset, which may correspond to different versions or variants. For example, `wikitext` includes configurations such as `wikitext-2-raw-v1`. The configuration affects the format and content of the data.

- **`--batch_size`**: Number of samples per batch. A larger batch size requires more memory but can speed up training. Reduce the batch size if memory is limited.

- **`--num_epochs`**: Number of complete passes through the training dataset. More epochs can improve learning but may lead to overfitting if set too high.

- **`--max_length`**: Maximum length of the input sequence. Longer sequences are truncated and shorter sequences are padded, for both training and inference (see the tokenization sketch after this list).

- **`--mode`**: Defines the operation to perform. Choose `train` to train the model; `inference` makes the model generate text from the input.

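To make the dataset and sequence-length options concrete, here is a minimal, illustrative sketch of loading and tokenizing data with the Hugging Face `datasets` and `transformers` libraries. It is not the repository's actual training code, only the standard pattern these options map onto:

```python
from datasets import load_dataset
from torch.utils.data import DataLoader
from transformers import AutoTokenizer

# Values mirror the documented defaults; the repository's loading code may differ.
model_name = "gpt2"
dataset_name, dataset_config = "wikitext", "wikitext-2-raw-v1"
max_length, batch_size = 128, 4

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

def tokenize(batch):
    # Longer sequences are truncated to max_length; shorter ones are padded.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=max_length)

dataset = load_dataset(dataset_name, dataset_config, split="train")
tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
tokenized.set_format("torch")

loader = DataLoader(tokenized, batch_size=batch_size, shuffle=True)
print(next(iter(loader))["input_ids"].shape)  # (batch_size, max_length)
```
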
### 2. Training the Search Agent (`train_agent`)
The following is a detailed breakdown of the search agent, covering training, inference, and the functionality of each component, including how the agent saves LLM training data, its modular structure, and the role of each module.

---

## Overview of the AutonomousWebAgent

The `AutonomousWebAgent` is a multi-component search and retrieval agent designed to navigate the web, gather relevant content, and perform summarization and generation based on user queries. It integrates reinforcement learning (RL), Monte Carlo Tree Search (MCTS), a Retrieval-Augmented Generation (RAG) Summarizer, and a Hierarchical Reinforcement Learning (HRL) architecture to select, execute, and optimize its actions based on past experiences.

### Key Components

1. **Prioritized Experience Replay**:
   - The agent uses a `PrioritizedReplayMemory` and a `SumTree` to prioritize and store experiences (transitions between states); a minimal sketch of the sum-tree idea appears after this list.
   - The `SumTree` maintains a binary tree in which each parent node's value is the sum of its children, which makes it efficient to store, update, and retrieve experiences by priority.
   - These experiences train both the high-level (Manager) and low-level (Worker) components through prioritized sampling during replay, allowing the model to focus on the most significant transitions.

2. **Hierarchical Reinforcement Learning (HRL)**:
   - A **Manager** (high-level) model selects options, which are then executed by a **Worker** (low-level) model. The `ManagerModel` selects tasks (such as searching, summarizing, or generating), while the `WorkerModel` determines the specific actions to take.
   - The Manager and Worker use LSTM networks with fully connected layers, and each has its own replay memory and optimization process.
   - The Manager focuses on broad decisions and options, while the Worker operates on specific actions, enabling a layered approach to decision-making.

3. **RAGSummarizer**:
   - The `RAGSummarizer` uses a pre-trained language model (e.g., GPT-2) for summarizing and a SentenceTransformer for embedding-based retrieval. It breaks the input content into chunks, retrieves the sections most relevant to the query by cosine similarity, and generates a coherent summary.
   - It also implements a Least Recently Used (LRU) cache to avoid redundant computation, with persistent storage for cache data.
   - Summarized results are stored, and this module contributes directly to the generation of LLM training data.

4. **WorldModel**:
   - This module wraps an LSTM architecture with linear layers and a `value_head` that estimates state values, allowing the agent to anticipate the long-term value of its actions.
   - It is used within the HRL architecture: by the Worker for evaluating actions and by the Manager for long-term decision-making.

5. **Knowledge Base**:
   - The knowledge base acts as a repository for collected data, maintaining embeddings for efficient search and retrieval.
   - It supports saving and loading document embeddings, so the agent can answer new queries using previously collected knowledge.
   - Adding to and retrieving from the knowledge base enriches the agent's context, letting information from past experiences inform current tasks.

6. **Monte Carlo Tree Search (MCTS)**:
   - The MCTS component guides the agent through complex decision trees to find the most promising paths for query refinement.
   - Nodes in the tree represent states (possible query refinements), and child nodes represent possible expansions (e.g., related query variations).
   - MCTS applies a `select`, `expand`, `simulate`, and `backpropagate` cycle to iteratively refine queries, scoring them on relevance and other metrics to converge on optimal searches (a generic skeleton of this cycle appears after this list).
   - It also integrates RL by backpropagating rewards based on the ranking score of retrieved results.

7. **Ranking Model**:
   - The ranking model, built with a neural network and the `SentenceTransformer`, ranks search results on features such as cosine similarity with the query, content length, keyword overlap, and domain authority.
   - The scores it assigns are used to guide the MCTS process by augmenting the combined reward with ranking scores.

8. **Tree of Thought (ToT) Search**:
   - This module lets the agent generate a series of interconnected thoughts, exploring different perspectives or angles on a given query.
   - The `ToTNode` and `ToTSearch` classes enable the agent to generate thoughts, evaluate them, and navigate them as a tree, considering various potential paths to best answer the query.
   - It combines MCTS and RAG to synthesize responses based on the generated thought paths.

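As referenced in the Prioritized Experience Replay component above, a sum tree keeps cumulative priorities in its internal nodes so that sampling proportional to priority costs O(log n). The repository's `SumTree` and `PrioritizedReplayMemory` classes will differ in detail; this is only a minimal sketch of the idea:

```python
import random

class SumTree:
    """Minimal sum tree: leaves hold priorities, parents hold the sum of their children."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)   # internal nodes + leaves
        self.data = [None] * capacity
        self.write = 0

    def add(self, priority: float, experience) -> None:
        idx = self.write + self.capacity - 1      # leaf index for the next slot
        self.data[self.write] = experience
        self.update(idx, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, idx: int, priority: float) -> None:
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx != 0:                            # propagate the change up to the root
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def sample(self):
        s = random.uniform(0, self.tree[0])        # tree[0] holds the total priority
        idx = 0
        while idx < self.capacity - 1:             # descend until a leaf is reached
            left, right = 2 * idx + 1, 2 * idx + 2
            idx = left if s <= self.tree[left] else right
            if idx == right:
                s -= self.tree[left]
        return idx, self.data[idx - self.capacity + 1]

# Toy usage: higher-priority experiences are sampled more often.
memory = SumTree(capacity=4)
for i, priority in enumerate([0.1, 0.5, 1.0, 2.0]):
    memory.add(priority, f"experience-{i}")
print(memory.sample())
```
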
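The select/expand/simulate/backpropagate cycle mentioned in the MCTS component can be summarized as below. This is a generic skeleton, not the repository's MCTS implementation; the node fields, the expansion function, and the reward function (standing in for the ranking score of retrieved results) are illustrative:

```python
import math
import random

class Node:
    def __init__(self, query: str, parent=None):
        self.query, self.parent = query, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(child, parent, c: float = 1.4) -> float:
    # Upper Confidence Bound: balances exploitation and exploration.
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)

def mcts(root: Node, expand_fn, reward_fn, iterations: int = 50) -> Node:
    for _ in range(iterations):
        node = root
        # 1. Select: walk down by UCB until a node with no children is reached.
        while node.children:
            node = max(node.children, key=lambda ch: ucb(ch, node))
        # 2. Expand: add query refinements produced by expand_fn.
        for refinement in expand_fn(node.query):
            node.children.append(Node(refinement, parent=node))
        leaf = random.choice(node.children) if node.children else node
        # 3. Simulate: score the refined query (e.g. a ranking score of retrieved results).
        reward = reward_fn(leaf.query)
        # 4. Backpropagate: update visit counts and values up to the root.
        while leaf is not None:
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return max(root.children, key=lambda ch: ch.visits) if root.children else root

# Toy usage with placeholder expansion and reward functions:
best = mcts(Node("renewable energy"),
            expand_fn=lambda q: [q + " benefits", q + " costs"],
            reward_fn=lambda q: random.random())
print(best.query)
```
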
### Training Process

The agent is trained episodically on queries drawn from a predefined list. Each query starts an episode, and the agent acts according to its learned policy:

1. **Search and Summarization**:
   - The agent performs search operations, gathering relevant content from online sources and using the MCTS and Ranking Model for prioritization.
   - The retrieved content is then summarized, and the relevant information is stored in the LLM training data.

2. **Knowledge Base and LLM Training Data Storage**:
   - Throughout training, the agent stores retrieved documents, query results, and summaries in its knowledge base and saves training data for future LLM fine-tuning.
   - The data is saved in JSONL format and includes metadata such as query terms, source links, and summaries, making it valuable for training language models (see the sketch after this list).

3. **Experience Replay**:
   - Both the Manager and Worker models use prioritized experience replay, sampling stored experiences from the SumTree according to TD-errors.
   - Replay reinforces successful transitions and updates the models' policies over time.

4. **Reward Calculation and Backpropagation**:
   - Rewards are calculated from ranking scores, cosine similarity with the query, and other custom factors (e.g., query complexity, state length).
   - These rewards are backpropagated through the MCTS and used to update the models' decision-making, ensuring continuous learning and adaptation.

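Item 2 above notes that LLM training data is saved in JSONL format with metadata. The exact file name and field names used by the agent are not shown in this README, so this sketch uses illustrative keys (`query`, `source`, `summary`) only to show the general pattern of appending one JSON record per line:

```python
import json
from pathlib import Path

def append_training_example(path: str, query: str, source: str, summary: str) -> None:
    # One JSON object per line (JSONL); field names here are illustrative.
    record = {"query": query, "source": source, "summary": summary}
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Hypothetical file name and contents, for illustration only.
append_training_example(
    "llm_training_data.jsonl",
    query="impacts of renewable energy",
    source="https://example.org/article",
    summary="Renewable energy reduces emissions and ...",
)
```
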
### Inference Process

During inference:
- The agent accepts a query, and the Manager model selects a high-level action based on its policy (e.g., search, summarize, or generate).
- Once an option is chosen, the Worker model executes the corresponding low-level actions. In a search operation, for example, it uses MCTS to refine the query, retrieves relevant web content, and processes it with the RAGSummarizer (see the retrieval sketch below).
- Each inference step is augmented by the agent's existing knowledge base, enabling more informed and contextually rich responses. If Tree of Thought (ToT) search is used, the agent synthesizes a coherent, comprehensive answer from the thought path.

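The retrieval step above (and in the `RAGSummarizer` and Ranking Model components) boils down to embedding content chunks and ranking them by cosine similarity with the query. A minimal sketch with `sentence-transformers`, assuming the commonly used `all-MiniLM-L6-v2` model rather than whatever the repository actually configures:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model; the agent may use a different one

query = "impacts of renewable energy on sustainability"
chunks = [
    "Solar and wind capacity grew rapidly over the last decade.",
    "The recipe calls for two cups of flour.",
    "Renewable energy reduces greenhouse gas emissions.",
]

query_emb = model.encode(query, convert_to_tensor=True)
chunk_embs = model.encode(chunks, convert_to_tensor=True)

# Cosine similarity between the query and every chunk; higher means more relevant.
scores = util.cos_sim(query_emb, chunk_embs)[0]
ranked = sorted(zip(chunks, scores.tolist()), key=lambda pair: pair[1], reverse=True)
for chunk, score in ranked:
    print(f"{score:.3f}  {chunk}")
```
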
### Model Saving

The agent provides several save functions to preserve its models:
- `save_worker_model` and `save_manager_model` save the Worker and Manager models independently.
- The `save` method preserves the overall state of the agent, including its knowledge base, replay memories, and models. This enables model reuse and persistent storage, so the agent can resume from saved states during training or deployment.

---

This modular setup keeps the agent flexible: its behavior adjusts dynamically based on RL rewards, improvements from experience replay, and efficient decision-making through MCTS. By saving LLM training data, it also produces a reusable corpus for further fine-tuning, offering the opportunity to build specialized, data-driven language models optimized for specific domains or tasks.

The `train_agent` task uses Twisted to train the AutonomousWebAgent by interacting with various queries in a simulated or real environment. The agent collects rewards based on how well it navigates and summarizes web content or performs other tasks.

#### Example Usage
```bash
python main_menu.py --task train_agent
```

#### Process Details
- **Training**: During training, the agent automatically samples a list of predefined queries, explores web pages, and uses reinforcement learning to maximize its reward. The training log provides insight into each episode's reward and the agent's progress.

- **Logging**: Logs are written to `agent_training.log` and record, for each episode, the query, the total reward, and the episode duration. Errors are logged, and if an episode times out, a negative reward is given.

### 3. Testing the Tree of Thought Search Agent (`test_agent`)

This task lets you test the Tree of Thought Search Agent either in an interactive mode or with a single query. In interactive mode, you can repeatedly enter queries, and the agent processes them sequentially, producing responses based on the Tree of Thought architecture.

#### Example Usage
Interactive Mode:
```bash
python main_menu.py --task test_agent
```

Single Query Mode:
```bash
python main_menu.py --task test_agent --query "What are the impacts of renewable energy on global sustainability?"
```

#### Arguments Specific to `test_agent`
- **`--query`**: If provided, the agent processes this specific query and returns a response, which is ideal for quick, one-off tests or evaluations. If omitted, the program starts an interactive session where you can repeatedly enter queries and view the agent's responses.

#### Interactive Mode Details
- **Input**: In interactive mode, enter a query and press Enter. The agent responds based on its training and the Tree of Thought methodology, traversing different thought paths to generate a response (see the sketch below).

- **Exiting**: To exit the interactive session, type `quit` and press Enter. The agent then saves any new knowledge it has gained and exits the program.

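The interactive session described above amounts to a simple read-and-respond loop. As a rough sketch (the actual prompt text in `main_menu.py` may differ, and `agent.process_query` is a hypothetical method name used only for illustration):

```python
def interactive_session(agent) -> None:
    # Keep asking for queries until the user types 'quit'.
    while True:
        query = input("Enter a query (or 'quit' to exit): ").strip()
        if query.lower() == "quit":
            break
        print(agent.process_query(query))  # hypothetical method name
```
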
### Additional Tips and Considerations
- **Adjusting Memory Constraints**: The batch size and model architecture (number of layers, dimensions, etc.) affect memory usage. If you encounter memory errors, reduce the batch size or sequence length.

- **Training Time**: Training the LLM and World Model may take considerable time, especially with large datasets or complex models. Use fewer epochs or a smaller dataset to speed up initial trials.

- **Model Save Path**: The `train_llm_world` task saves the model at the end of each epoch. Ensure you have enough storage space and specify a save directory if desired.

- **Logging**: Detailed logs for `train_agent` are saved to a file, which can help track progress, debug errors, and measure performance.

# World Model with MCTS and Transformer Components

## Model Overview