Hopsakee committed on
Commit 5fe3652 · verified · 1 Parent(s): 5b40ec9

Upload folder using huggingface_hub

docs/qdrant_lessons_learned.md ADDED
@@ -0,0 +1,299 @@
+ # Qdrant Integration: Lessons Learned
+
+ ## Introduction
+
+ This document summarizes our experience integrating the Qdrant vector database with FastEmbed for embedding generation. We encountered several challenges related to vector naming conventions, search query formats, and other aspects of working with Qdrant. It outlines the issues we faced and the solutions we implemented to create a robust vector search system.
+
+ ## Problem Statement
+
+ We were experiencing issues with vector name mismatches in our Qdrant integration. Specifically:
+
+ 1. Points were being skipped during processing with the error message "Skipping point as it has no valid vector"
+ 2. The vector names we specified in our configuration did not match the actual vector names used in the Qdrant collection
+ 3. We had implemented unnecessary sanitization of model names
+
+ ## Understanding Vector Names in Qdrant
+
+ ### How Qdrant Handles Vector Names
+
+ According to the [Qdrant documentation](https://qdrant.tech/documentation/concepts/collections/), when creating a collection with vectors, you specify vector names and their configurations. These names are used as keys when inserting and querying vectors.
+
+ However, when using FastEmbed with Qdrant, we discovered that the model names specified in the configuration are transformed before being used as vector names in the collection:
+
+ - Original model name: `"intfloat/multilingual-e5-large"`
+ - Actual vector name in Qdrant: `"fast-multilingual-e5-large"`
+
+ Similarly for sparse vectors:
+ - Original model name: `"prithivida/Splade_PP_en_v1"`
+ - Actual vector name in Qdrant: `"fast-sparse-splade_pp_en_v1"`
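+
+ You can confirm the transformed names directly from a live collection before relying on them; a minimal sketch (assuming a local Qdrant instance and our collection name, `fabric_patterns_hybrid`):
+
+ ```python
+ from qdrant_client import QdrantClient
+
+ client = QdrantClient(url="http://localhost:6333")  # assumed local instance
+ info = client.get_collection("fabric_patterns_hybrid")
+
+ # Named dense vectors live under config.params.vectors,
+ # named sparse vectors under config.params.sparse_vectors.
+ print(list(info.config.params.vectors.keys()))         # e.g. ['fast-multilingual-e5-large']
+ print(list(info.config.params.sparse_vectors.keys()))  # e.g. ['fast-sparse-splade_pp_en_v1']
+ ```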
+
+ ### Initial Approach (Problematic)
+
+ Our initial approach was to manually transform the model names using a `format_vector_name` function:
+
+ ```python
+ def format_vector_name(name: str) -> str:
+     """Format a model name into a valid vector name for Qdrant."""
+     return name.replace('/', '_')
+ ```
+
+ This led to inconsistencies because:
+ 1. We were using one transformation in our code (`replace('/', '_')`)
+ 2. FastEmbed was using a different transformation (prefixing with "fast-" and removing slashes)
+
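+ For example, for the dense model the two transformations disagree, and the second name is what actually exists in the collection:
+
+ ```python
+ # Our manual transformation:
+ format_vector_name("intfloat/multilingual-e5-large")  # -> 'intfloat_multilingual-e5-large'
+
+ # The name FastEmbed actually uses in the collection:
+ # 'fast-multilingual-e5-large'
+ ```
+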
+ ## Solution: Dynamic Vector Name Discovery
+
+ Instead of trying to predict how FastEmbed transforms model names, we implemented a solution that dynamically discovers the actual vector names from the Qdrant collection configuration.
+
+ ### Helper Functions
+
+ We added two helper functions to retrieve the actual vector names:
+
+ ```python
+ def get_dense_vector_name(client: QdrantClient, collection_name: str) -> str:
+     """
+     Get the name of the dense vector from the collection configuration.
+
+     Args:
+         client: Initialized Qdrant client
+         collection_name: Name of the collection
+
+     Returns:
+         Name of the dense vector as used in the collection
+     """
+     try:
+         return list(client.get_collection(collection_name).config.params.vectors.keys())[0]
+     except (IndexError, AttributeError) as e:
+         logger.warning(f"Could not get dense vector name: {e}")
+         # Fallback to a default name
+         return "fast-multilingual-e5-large"
+
+ def get_sparse_vector_name(client: QdrantClient, collection_name: str) -> str:
+     """
+     Get the name of the sparse vector from the collection configuration.
+
+     Args:
+         client: Initialized Qdrant client
+         collection_name: Name of the collection
+
+     Returns:
+         Name of the sparse vector as used in the collection
+     """
+     try:
+         return list(client.get_collection(collection_name).config.params.sparse_vectors.keys())[0]
+     except (IndexError, AttributeError) as e:
+         logger.warning(f"Could not get sparse vector name: {e}")
+         # Fallback to a default name
+         return "fast-sparse-splade_pp_en_v1"
+ ```
+
+ ### Implementation in Vector Creation
+
+ When creating new points or updating existing ones, we now use these helper functions to get the correct vector names:
+
+ ```python
+ # Get vector names from the collection configuration
+ dense_vector_name = get_dense_vector_name(client, collection_name)
+ sparse_vector_name = get_sparse_vector_name(client, collection_name)
+
+ # Create point with the correct vector names
+ point = PointStruct(
+     id=str(uuid.uuid4()),
+     vector={
+         dense_vector_name: get_embedding(payload_new['purpose'])[0],
+         sparse_vector_name: get_embedding(payload_new['purpose'])[1]
+     },
+     payload={
+         # payload fields...
+     }
+ )
+ ```
+
+ ### Implementation in Vector Querying
+
+ Similarly, when querying vectors, we use the same helper functions:
+
+ ```python
+ # Get the actual vector names from the collection configuration
+ dense_vector_name = get_dense_vector_name(client, collection_name)
+
+ # Skip points without vector or without the required vector type
+ if not point.vector or dense_vector_name not in point.vector:
+     logger.debug(f"Skipping point {point_id} as it has no valid vector")
+     continue
+
+ # Find semantically similar points using Qdrant's search
+ similar_points = client.search(
+     collection_name=collection_name,
+     query_vector=(dense_vector_name, point.vector.get(dense_vector_name)),
+     limit=100,
+     score_threshold=SIMILARITY_THRESHOLD
+ )
+ ```
+
+ ## Key Insights
+
+ 1. **Model Names vs. Vector Names**: There's a distinction between the model names you specify in your configuration and the actual vector names used in the Qdrant collection. FastEmbed transforms these names.
+
+ 2. **Dynamic Discovery**: Instead of hardcoding vector names or trying to predict the transformation, it's better to dynamically discover the actual vector names from the collection configuration.
+
+ 3. **Fallback Mechanism**: Always include fallback mechanisms in case the collection information can't be retrieved, making your code more robust.
+
+ 4. **Consistency**: Use the same vector names throughout your system to ensure consistency between vector creation, storage, and retrieval.
+
+ 5. **Correct Search Query Format**: When searching with named vectors in Qdrant, you must use the correct format. Instead of passing a dictionary with vector names as keys, pass a `(vector_name, vector_values)` tuple to the `query_vector` parameter.
+
+ ## Accessing Collection Configuration
+
+ The key to our solution was discovering how to access the collection configuration to get the actual vector names:
+
+ ```python
+ # Get dense vector name
+ dense_vector_name = list(client.get_collection(collection_name).config.params.vectors.keys())[0]
+
+ # Get sparse vector name
+ sparse_vector_name = list(client.get_collection(collection_name).config.params.sparse_vectors.keys())[0]
+ ```
+
+ This approach allows our code to adapt to however FastEmbed decides to name the vectors in the collection, rather than assuming a specific naming convention.
+
+ ## Correct Search Query Format for Named Vectors
+
+ When using named vectors in Qdrant, it's important to use the correct format for search queries. The format depends on the version of the Qdrant client you're using:
+
+ ### Incorrect Format (Causes Validation Error)
+
+ ```python
+ # This format causes a validation error
+ similar_points = client.search(
+     collection_name=collection_name,
+     query_vector={
+         dense_vector_name: point.vector.get(dense_vector_name)
+     },
+     limit=100
+ )
+ ```
+
+ ### Correct Format for Qdrant Client Version 1.12.2
+
+ ```python
+ # This is the correct format for Qdrant client version 1.12.2
+ similar_points = client.search(
+     collection_name=collection_name,
+     query_vector=(dense_vector_name, point.vector.get(dense_vector_name)),  # Tuple of (vector_name, vector_values)
+     limit=100,
+     score_threshold=0.8  # Optional similarity threshold
+ )
+ ```
+
+ In Qdrant client version 1.12.2, the correct way to specify which named vector to use is by providing a tuple to the `query_vector` parameter. The tuple should contain the vector name as the first element and the actual vector values as the second element.
+
+ Using the incorrect format will result in a Pydantic validation error with messages like:
+
+ ```
+ validation errors for SearchRequest
+ vector.list[float]
+   Input should be a valid list [type=list_type, input_value={'fast-multilingual-e5-la...}, input_type=dict]
+ vector.NamedVector.name
+   Field required [type=missing, input_value={'fast-multilingual-e5-la...}, input_type=dict]
+ ```
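+
+ The error above references the client's `NamedVector` model, which can be used instead of the bare tuple; a minimal equivalent sketch:
+
+ ```python
+ from qdrant_client.models import NamedVector
+
+ similar_points = client.search(
+     collection_name=collection_name,
+     query_vector=NamedVector(
+         name=dense_vector_name,                      # which named vector to search against
+         vector=point.vector.get(dense_vector_name),  # the query embedding itself
+     ),
+     limit=100,
+ )
+ ```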
+
+ ## Optimizing Search Parameters for Deduplication
+
+ When using Qdrant for deduplication of similar content, the search parameters play a crucial role in determining the effectiveness of the process. We've found the following parameters to be particularly important:
+
+ ### Similarity Threshold
+
+ The `score_threshold` parameter determines the minimum similarity score required for points to be considered similar:
+
+ ```python
+ similar_points = client.search(
+     collection_name=collection_name,
+     query_vector=(dense_vector_name, point.vector.get(dense_vector_name)),
+     limit=100,
+     score_threshold=0.9  # Only consider points with similarity > 90%
+ )
+ ```
+
+ For deduplication purposes, we found that a higher threshold (0.9) works better than a lower one (0.7) at avoiding false positives: only very similar items are then considered duplicates.
+
+ ### Text Difference Threshold
+
+ In addition to vector similarity, we also check the actual text difference between potential duplicates:
+
+ ```python
+ # Constants for duplicate detection
+ SIMILARITY_THRESHOLD = 0.9   # Minimum semantic similarity to consider as potential duplicate
+ DIFFERENCE_THRESHOLD = 0.05  # Maximum text difference (5%) to consider as duplicate
+ ```
+
+ The `DIFFERENCE_THRESHOLD` of 0.05 means that texts with less than 5% difference will be considered duplicates. This two-step verification (vector similarity + text difference) helps to ensure that only true duplicates are removed.
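+
+ A minimal sketch of that second check, using `difflib` from the standard library (this mirrors the `calculate_text_difference_percentage` helper in our deduplicator module):
+
+ ```python
+ import difflib
+
+ def text_difference(text1: str, text2: str) -> float:
+     """Return the difference between two strings as a fraction (0.0 identical, 1.0 completely different)."""
+     return 1.0 - difflib.SequenceMatcher(None, text1, text2).ratio()
+
+ # A candidate pair only counts as a duplicate if it passed the vector search
+ # (similarity above SIMILARITY_THRESHOLD) AND the texts are nearly identical.
+ is_duplicate = text_difference(point_content, similar_content) <= DIFFERENCE_THRESHOLD
+ ```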
+
+ ## Logging Considerations
+
+ When working with Qdrant, especially during development and debugging, it's helpful to adjust the logging level:
+
+ ```python
+ # Set the log level (pick one depending on the environment)
+ logger.setLevel(logging.DEBUG)  # For development/debugging
+ logger.setLevel(logging.INFO)   # For production
+ ```
+
+ Using `DEBUG` level during development provides detailed information about vector operations, including:
+ - Which points are being processed
+ - Why points are being skipped (e.g., missing vectors)
+ - Similarity scores between points
+ - Deduplication decisions
+
+ However, in production, it's better to use `INFO` level to reduce log volume, especially when processing large collections.
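+
+ For context, a condensed sketch of our `setup_logger` (file plus console handlers sharing one formatter; the handler levels are where DEBUG vs. INFO is chosen):
+
+ ```python
+ import logging
+
+ def setup_logger(log_file: str = 'fabric_to_espanso.log') -> logging.Logger:
+     logger = logging.getLogger('fabric_to_espanso')
+     logger.setLevel(logging.DEBUG)  # handlers filter from here
+
+     file_handler = logging.FileHandler(log_file)
+     console_handler = logging.StreamHandler()
+
+     # DEBUG during development; switch to INFO for production
+     file_handler.setLevel(logging.DEBUG)
+     console_handler.setLevel(logging.DEBUG)
+
+     formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
+     file_handler.setFormatter(formatter)
+     console_handler.setFormatter(formatter)
+
+     logger.addHandler(file_handler)
+     logger.addHandler(console_handler)
+     return logger
+ ```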
+
+ ## Performance Considerations
+
+ ### Batch Operations
+
+ When working with large numbers of points, it's more efficient to use batch operations:
+
+ ```python
+ # Batch upsert example
+ client.upsert(
+     collection_name=collection_name,
+     points=batch_of_points  # List of PointStruct objects
+ )
+ ```
+
+ This reduces network overhead compared to upserting points individually.
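+
+ For large collections, the points can be chunked first; a small sketch (the batch size of 100 is an illustrative choice, not a tuned value):
+
+ ```python
+ BATCH_SIZE = 100  # illustrative; tune to payload size and network conditions
+
+ # points_to_upsert: list of PointStruct objects built earlier
+ for start in range(0, len(points_to_upsert), BATCH_SIZE):
+     client.upsert(
+         collection_name=collection_name,
+         points=points_to_upsert[start:start + BATCH_SIZE],
+     )
+ ```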
+
+ ### Search Limit
+
+ The `limit` parameter in search operations should be set carefully:
+
+ ```python
+ similar_points = client.search(
+     collection_name=collection_name,
+     query_vector=(dense_vector_name, point.vector.get(dense_vector_name)),
+     limit=100,  # Maximum number of similar points to return
+     score_threshold=0.9
+ )
+ ```
+
+ A higher limit increases the chance of finding all duplicates but also increases search time. For deduplication purposes, we found that a limit of 100 provides a good balance between thoroughness and performance.
+
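+ One practical safeguard is to check whether the result list is saturated; a small sketch (illustrative, not taken from our codebase):
+
+ ```python
+ if len(similar_points) == 100:  # the limit we searched with
+     # There may be matches beyond the limit; flag it so the pass can be
+     # re-run or the limit raised.
+     logger.warning("Search limit reached; some similar points may have been missed")
+ ```
+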
+ ## Conclusion
+
+ Our experience with Qdrant has taught us several important lessons:
+
+ 1. **Dynamic Vector Name Discovery**: By retrieving the actual vector names from the Qdrant collection configuration, we've created a robust solution that adapts to the naming conventions used by FastEmbed and Qdrant.
+
+ 2. **Correct Query Format**: Using the proper format for search queries with named vectors is essential, specifically passing a `(vector_name, vector_values)` tuple to the `query_vector` parameter.
+
+ 3. **Optimized Search Parameters**: Fine-tuning similarity thresholds and text difference thresholds is crucial for effective deduplication, with stricter thresholds (0.9 for similarity, 0.05 for text difference) providing better results.
+
+ 4. **Appropriate Logging Levels**: Using DEBUG level during development and INFO in production balances having enough information for troubleshooting against log volume and performance.
+
+ 5. **Batch Operations**: Using batch operations for inserting and updating points significantly improves performance when working with large collections.
+
+ By implementing these lessons, we've created a more efficient and reliable vector search system that properly handles named vectors, effectively identifies duplicates, and maintains good performance even with large collections.
+
+ This solution should work regardless of changes to the naming conventions in future versions of Qdrant or FastEmbed, as it reads the actual names directly from the collection configuration.
logs/fabric_to_espanso.log.1 CHANGED
The diff for this file is too large to render. See raw diff
 
logs/fabric_to_espanso.log.2 ADDED
The diff for this file is too large to render. See raw diff
 
logs/fabric_to_espanso.log.3 ADDED
The diff for this file is too large to render. See raw diff
 
logs/fabric_to_espanso.log.4 ADDED
The diff for this file is too large to render. See raw diff
 
main.py CHANGED
@@ -8,9 +8,10 @@ from contextlib import contextmanager
  from src.fabrics_processor.database import initialize_qdrant_database
  from src.fabrics_processor.file_change_detector import detect_file_changes
  from src.fabrics_processor.database_updater import update_qdrant_database
- from src.fabrics_processor.yaml_file_generator import generate_yaml_file
+ from src.fabrics_processor.output_files_generator import generate_yaml_file
  from src.fabrics_processor.logger import setup_logger
  from src.fabrics_processor.config import config
+ from src.fabrics_processor.deduplicator import remove_duplicates
  from src.fabrics_processor.exceptions import (
      DatabaseConnectionError,
      DatabaseInitializationError
@@ -62,13 +63,27 @@ def process_changes(client) -> bool:
      if deleted_files:
          logger.info(f"Deleted files: {deleted_files}")

+     # Track changes for summary
+     duplicates_removed = 0
+
      # Update database if there are changes
      if any([new_files, modified_files, deleted_files]):
          logger.info("Changes detected. Updating database...")
-         update_qdrant_database(client, new_files, modified_files, deleted_files)
+         update_qdrant_database(client, config.embedding.collection_name, new_files, modified_files, deleted_files)

+         # Deduplicate entries after updating the database
+         logger.info("Checking for and removing duplicate entries...")
+         duplicates_removed = remove_duplicates(client, config.embedding.collection_name)
+         if duplicates_removed > 0:
+             logger.info(f"Removed {duplicates_removed} duplicate entries from the database")
+
      # Always generate output files to ensure consistency
      generate_yaml_file(client, config.yaml_output_folder)
+
+     # Generate summary message
+     total_entries = len(client.scroll(collection_name=config.embedding.collection_name, limit=10000)[0])
+     summary_message = f"Database update summary: {len(new_files)} added, {len(modified_files)} modified, {len(deleted_files)} deleted, {duplicates_removed} duplicates removed. Total entries: {total_entries}"
+     logger.info(summary_message)

      return True

parameters.py CHANGED
@@ -61,5 +61,5 @@ REQUIRED_FIELDS_DEFAULTS = {

  # Embedding Model parameters for Qdrant
  USE_FASTEMBED = True
- EMBED_MODEL_DENSE = 'BAAI/bge-base-en'  # "fast-bge-small-en"
+ EMBED_MODEL_DENSE = "intfloat/multilingual-e5-large"  # 'BAAI/bge-base-en' # "fast-bge-small-en"
  EMBED_MODEL_SPARSE = "prithivida/Splade_PP_en_v1"
src/fabrics_processor/database.py CHANGED
@@ -12,6 +12,42 @@ from .exceptions import DatabaseConnectionError, CollectionError, DatabaseInitializationError

  logger = logging.getLogger('fabric_to_espanso')

+ def get_dense_vector_name(client: QdrantClient, collection_name: str) -> str:
+     """
+     Get the name of the dense vector from the collection configuration.
+
+     Args:
+         client: Initialized Qdrant client
+         collection_name: Name of the collection
+
+     Returns:
+         Name of the dense vector as used in the collection
+     """
+     try:
+         return list(client.get_collection(collection_name).config.params.vectors.keys())[0]
+     except (IndexError, AttributeError) as e:
+         logger.warning(f"Could not get dense vector name: {e}")
+         # Fallback to a default name
+         return "fast-multilingual-e5-large"
+
+ def get_sparse_vector_name(client: QdrantClient, collection_name: str) -> str:
+     """
+     Get the name of the sparse vector from the collection configuration.
+
+     Args:
+         client: Initialized Qdrant client
+         collection_name: Name of the collection
+
+     Returns:
+         Name of the sparse vector as used in the collection
+     """
+     try:
+         return list(client.get_collection(collection_name).config.params.sparse_vectors.keys())[0]
+     except (IndexError, AttributeError) as e:
+         logger.warning(f"Could not get sparse vector name: {e}")
+         # Fallback to a default name
+         return "fast-sparse-splade_pp_en_v1"
+
  def create_database_connection(url: Optional[str] = None, api_key: Optional[str] = None) -> QdrantClient:
      """Create a database connection.

src/fabrics_processor/database_updater.py CHANGED
@@ -7,10 +7,12 @@ import uuid
  from .output_files_generator import generate_yaml_file, generate_markdown_files
  from .config import config
  from .exceptions import ConfigurationError
- from .database import validate_point_payload
+ from .database import validate_point_payload, get_dense_vector_name, get_sparse_vector_name

  logger = logging.getLogger('fabric_to_espanso')

+ # TODO: Make a summary of the prompts using a call to an LLM for every prompt and store that in the purpose field
+ # of the database instead of the extracted purpose from the markdown files and use that summary to create the embeddings
  def get_embedding(text: str) -> list:
      """
      Generate embedding vector for the given text using FastEmbed.
@@ -59,11 +61,16 @@ def update_qdrant_database(client: QdrantClient, collection_name: str, new_files
      for file in new_files:
          try:
              payload_new = validate_point_payload(file)
+             # Get vector names from the collection configuration
+             dense_vector_name = get_dense_vector_name(client, collection_name)
+             sparse_vector_name = get_sparse_vector_name(client, collection_name)
+
+             # Create point with the correct vector names
              point = PointStruct(
                  id=str(uuid.uuid4()),  # Generate a new UUID for each point
                  vector={
-                     'fast-bge-base-en': get_embedding(payload_new['purpose'])[0],
-                     'fast-sparse-splade_pp_en_v1': get_embedding(payload_new['purpose'])[1]
+                     dense_vector_name: get_embedding(payload_new['purpose'])[0],
+                     sparse_vector_name: get_embedding(payload_new['purpose'])[1]
                  },
                  payload={
                      "filename": payload_new['filename'],
@@ -95,11 +102,16 @@ def update_qdrant_database(client: QdrantClient, collection_name: str, new_files
              point_id = scroll_result[0].id
              payload_current = validate_point_payload(file, point_id)
              # Update the existing point with the new file data
+             # Get vector names from the collection configuration
+             dense_vector_name = get_dense_vector_name(client, collection_name)
+             sparse_vector_name = get_sparse_vector_name(client, collection_name)
+
+             # Create point with the correct vector names
              point = PointStruct(
                  id=point_id,
                  vector={
-                     'fast-bge-base-en': get_embedding(payload_current['purpose'])[0],
-                     'fast-sparse-splade_pp_en_v1': get_embedding(payload_current['purpose'])[1]
+                     dense_vector_name: get_embedding(payload_current['purpose'])[0],
+                     sparse_vector_name: get_embedding(payload_current['purpose'])[1]
                  },
                  payload={
                      "filename": payload_current['filename'],
src/fabrics_processor/deduplicator.py ADDED
@@ -0,0 +1,167 @@
+ """Deduplication module for fabric-to-espanso."""
+ import logging
+ from typing import List, Dict, Any, Tuple, Set
+ import difflib
+ from qdrant_client import QdrantClient
+ from qdrant_client.http.models import Filter, PointIdsList
+
+ from .config import config
+ from .database import get_dense_vector_name, get_sparse_vector_name
+
+ logger = logging.getLogger('fabric_to_espanso')
+
+ def calculate_text_difference_percentage(text1: str, text2: str) -> float:
+     """
+     Calculate the percentage difference between two text strings.
+
+     Args:
+         text1: First text string
+         text2: Second text string
+
+     Returns:
+         Percentage difference as a float between 0.0 (identical) and 1.0 (completely different)
+     """
+     # Use difflib's SequenceMatcher to calculate similarity ratio
+     similarity = difflib.SequenceMatcher(None, text1, text2).ratio()
+     # Convert similarity to difference percentage
+     difference_percentage = 1.0 - similarity
+     return difference_percentage
+
+ # TODO: Consider moving the vector similarity search functionality to database_query.py and import it here
+ # This would create a more structured codebase with search functionality centralized in one place
+ def find_duplicates(client: QdrantClient, collection_name: str = config.embedding.collection_name) -> List[Tuple[str, List[str]]]:
+     """
+     Find duplicate entries in the database based on semantic similarity and text difference.
+
+     Args:
+         client: Initialized Qdrant client
+         collection_name: Name of the collection to query
+
+     Returns:
+         List of tuples containing (kept_point_id, [duplicate_point_ids])
+     """
+     # Constants for duplicate detection
+     SIMILARITY_THRESHOLD = 0.85  # Minimum semantic similarity to consider as potential duplicate
+     DIFFERENCE_THRESHOLD = 0.1   # Maximum text difference (10%) to consider as duplicate
+     # Get all points from the database
+     all_points = client.scroll(
+         collection_name=collection_name,
+         with_vectors=True,  # Include vector data, else no vector will be available
+         limit=10000  # Adjust based on expected file count
+     )[0]
+
+     logger.info(f"Checking {len(all_points)} entries for duplicates")
+
+     # Track processed points to avoid redundant comparisons
+     processed_points = set()
+     # Store duplicates as (kept_id, [duplicate_ids])
+     duplicates = []
+
+     # For each point, find semantically similar points
+     for i, point in enumerate(all_points):
+         if point.id in processed_points:
+             continue
+
+         point_id = point.id
+         point_content = point.payload.get('content', '')
+         logger.debug(f"Checking point {point_id} for duplicates")
+         logger.debug(f"Content: {point_content}")
+
+         # Skip if no content
+         if not point_content:
+             logger.debug(f"Skipping point {point_id} as it has no content")
+             continue
+
+         # Get the actual vector names from the collection configuration
+         dense_vector_name = get_dense_vector_name(client, collection_name)
+
+         # Skip points without vector or without the required vector type
+         if not point.vector or dense_vector_name not in point.vector:
+             logger.debug(f"Skipping point {point_id} as it has no valid vector")
+             continue
+
+         # Find semantically similar points using Qdrant's search
+         similar_points = client.search(
+             collection_name=collection_name,
+             query_vector=(dense_vector_name, point.vector.get(dense_vector_name)),
+             limit=100,
+             score_threshold=SIMILARITY_THRESHOLD  # Only consider points with similarity > threshold
+         )
+
+         # Skip the first result (which is the point itself)
+         similar_points = [p for p in similar_points if p.id != point_id]
+
+         if not similar_points:
+             continue
+
+         logger.debug(f"Found {len(similar_points)} semantically similar points for {point.payload.get('filename', 'unknown')}")
+
+         # Check text difference for each similar point
+         duplicate_ids = []
+         for similar_point in similar_points:
+             similar_id = similar_point.id
+
+             # Skip if already processed
+             if similar_id in processed_points:
+                 continue
+
+             # Get content of similar point
+             similar_content = None
+             for p in all_points:
+                 if p.id == similar_id:
+                     similar_content = p.payload.get('content', '')
+                     break
+
+             if not similar_content:
+                 continue
+
+             # Calculate text difference percentage
+             diff_percentage = calculate_text_difference_percentage(point_content, similar_content)
+
+             # If difference is less than threshold, consider it a duplicate
+             if diff_percentage <= DIFFERENCE_THRESHOLD:
+                 duplicate_ids.append(similar_id)
+                 processed_points.add(similar_id)
+                 logger.debug(f"Found duplicate: {similar_id} (diff: {diff_percentage:.2%})")
+
+         if duplicate_ids:
+             duplicates.append((point_id, duplicate_ids))
+             processed_points.add(point_id)
+
+     logger.info(f"Found {sum(len(dups) for _, dups in duplicates)} duplicate entries in {len(duplicates)} groups")
+     return duplicates
+
+ def remove_duplicates(client: QdrantClient, collection_name: str = config.embedding.collection_name) -> int:
+     """
+     Remove duplicate entries from the database based on semantic similarity and text difference.
+     Uses a two-step verification process:
+     1. Find entries with semantic similarity above SIMILARITY_THRESHOLD (using vector search)
+     2. For those entries, keep only those with text difference at or below DIFFERENCE_THRESHOLD
+
+     Args:
+         client: Initialized Qdrant client
+         collection_name: Name of the collection to query
+
+     Returns:
+         Number of removed duplicate entries
+     """
+     # Find duplicates
+     duplicate_groups = find_duplicates(client, collection_name)
+
+     if not duplicate_groups:
+         logger.info("No duplicates found")
+         return 0
+
+     # Count total duplicates
+     total_duplicates = sum(len(dups) for _, dups in duplicate_groups)
+
+     # Remove duplicates
+     for _, duplicate_ids in duplicate_groups:
+         if duplicate_ids:
+             client.delete(
+                 collection_name=collection_name,
+                 points_selector=PointIdsList(points=duplicate_ids)
+             )
+
+     logger.info(f"Removed {total_duplicates} duplicate entries from the database")
+     return total_duplicates
src/fabrics_processor/logger.py CHANGED
@@ -42,8 +42,8 @@ def setup_logger(log_file='fabric_to_espanso.log'):
      console_handler = logging.StreamHandler()

      # Set log levels
-     file_handler.setLevel(logging.INFO)
-     console_handler.setLevel(logging.INFO)
+     file_handler.setLevel(logging.DEBUG)
+     console_handler.setLevel(logging.DEBUG)

      # Create formatters and add it to handlers
      file_format = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
src/fabrics_processor/output_files_generator.py CHANGED
@@ -24,6 +24,7 @@ def repr_block_string(dumper: yaml.Dumper, data: BlockString) -> yaml.ScalarNode

  yaml.add_representer(BlockString, repr_block_string)

+ # TODO: Remove duplicates before exporting the contents of the database to YAML files
  def generate_yaml_file(client: QdrantClient, collection_name: str, yaml_output_folder: str) -> None:
      """Generate a complete YAML file from the Qdrant database.

@@ -78,6 +79,7 @@ def generate_yaml_file(client: QdrantClient, collection_name: str, yaml_output_folder: str) -> None:
          raise
      raise RuntimeError(f"Unexpected error generating YAML: {str(e)}") from e

+ # TODO: Remove duplicates before exporting the contents of the database to markdown files
  def generate_markdown_files(client: QdrantClient, collection_name: str, markdown_output_folder: str) -> None:
      """Generate markdown files from the Qdrant database.

src/search_qdrant/LOG_FILE ADDED
The diff for this file is too large to render. See raw diff
369
+ fabric_to_espanso - INFO - Processed: identify_dsrp_distinctions
370
+ fabric_to_espanso - INFO - Processed: compare_two_documents-own
371
+ fabric_to_espanso - INFO - Processed: extract_controversial_ideas
372
+ fabric_to_espanso - INFO - Processed: create_tags
373
+ fabric_to_espanso - INFO - Processed: review_design
374
+ fabric_to_espanso - WARNING - No sections extracted from /home/jelle/.config/fabric/patterns/start_in_depth_discussion_with_functions-own/system.md
375
+ fabric_to_espanso - INFO - Processed: start_in_depth_discussion_with_functions-own
376
+ fabric_to_espanso - INFO - Processed: create_art_prompt
377
+ fabric_to_espanso - INFO - Processed: analyze_patent
378
+ fabric_to_espanso - INFO - Processed: identify_dsrp_relationships
379
+ fabric_to_espanso - INFO - Processed: analyze_cfp_submission
380
+ fabric_to_espanso - INFO - Processed: create_mermaid_visualization_for_github
381
+ fabric_to_espanso - INFO - Processed: create_graph_from_input
382
+ fabric_to_espanso - INFO - Processed: extract_main_idea
383
+ fabric_to_espanso - INFO - Processed: extract_latest_video
384
+ fabric_to_espanso - INFO - Processed: extract_core_message
385
+ fabric_to_espanso - INFO - Processed: extract_jokes
386
+ fabric_to_espanso - INFO - Processed: create_academic_paper
387
+ fabric_to_espanso - INFO - Processed: create_reading_plan
388
+ fabric_to_espanso - INFO - Processed: analyze_risk
389
+ fabric_to_espanso - INFO - Processed: improve_report_finding
390
+ fabric_to_espanso - INFO - Processed: explain_math
391
+ fabric_to_espanso - INFO - Processed: summarize_git_changes
392
+ fabric_to_espanso - INFO - Processed: recommend_talkpanel_topics
393
+ fabric_to_espanso - INFO - Processed: extract_predictions
394
+ fabric_to_espanso - INFO - Processed: extract_primary_solution
395
+ fabric_to_espanso - INFO - Processed: extract_videoid
396
+ fabric_to_espanso - INFO - Processed: create_show_intro
397
+ fabric_to_espanso - INFO - Processed: summarize_git_diff
398
+ fabric_to_espanso - INFO - Processed: create_quiz
399
+ fabric_to_espanso - INFO - Processed: write_semgrep_rule
400
+ fabric_to_espanso - INFO - Processed: write_hackerone_report
401
+ fabric_to_espanso - INFO - Processed: summarize_micro
402
+ fabric_to_espanso - INFO - Processed: create_ai_jobs_analysis
403
+ fabric_to_espanso - INFO - Processed: create_pattern
404
+ fabric_to_espanso - INFO - Processed: capture_thinkers_work
405
+ fabric_to_espanso - INFO - Processed: analyze_prose_pinker
406
+ fabric_to_espanso - INFO - Processed: create_threat_scenarios
407
+ fabric_to_espanso - INFO - Processed: extract_ctf_writeup
408
+ fabric_to_espanso - INFO - Processed: ai
409
+ fabric_to_espanso - INFO - Processed: rate_ai_response
410
+ fabric_to_espanso - INFO - Processed: create_prd
411
+ fabric_to_espanso - INFO - Processed: clean_text
412
+ fabric_to_espanso - INFO - Processed: create_video_chapters
413
+ fabric_to_espanso - INFO - Processed: summarize_lecture
414
+ fabric_to_espanso - INFO - Processed: identify_dsrp_perspectives
415
+ fabric_to_espanso - INFO - Processed: recommend_artists
416
+ fabric_to_espanso - INFO - Processed: extract_ideas
417
+ fabric_to_espanso - WARNING - No sections extracted from /home/jelle/.config/fabric/patterns/solveitwithcode_review_repl_driven_process-own/system.md
418
+ fabric_to_espanso - INFO - Processed: solveitwithcode_review_repl_driven_process-own
419
+ fabric_to_espanso - INFO - Processed: to_flashcards
420
+ fabric_to_espanso - WARNING - No sections extracted from /home/jelle/.config/fabric/patterns/extract_instructions/system.md
421
+ fabric_to_espanso - INFO - Processed: extract_instructions
422
+ fabric_to_espanso - INFO - Processed: write_micro_essay
423
+ fabric_to_espanso - INFO - Processed: extract_primary_problem
424
+ fabric_to_espanso - INFO - Processed: create_hormozi_offer
425
+ fabric_to_espanso - INFO - Processed: analyze_prose
426
+ fabric_to_espanso - WARNING - No sections extracted from /home/jelle/.config/fabric/patterns/solveitwithcode_review_repl_driven_process_detailed-own/system.md
427
+ fabric_to_espanso - INFO - Processed: solveitwithcode_review_repl_driven_process_detailed-own
428
+ fabric_to_espanso - INFO - Processed: analyze_logs
429
+ fabric_to_espanso - INFO - Processed: create_recursive_outline
430
+ fabric_to_espanso - INFO - Processed: create_image_prompt_from_book_extract-own
431
+ fabric_to_espanso - INFO - Processed: analyze_tech_impact
432
+ fabric_to_espanso - INFO - Processed: find_hidden_message
433
+ fabric_to_espanso - INFO - Processed: create_npc
434
+ fabric_to_espanso - INFO - Processed: provide_guidance
435
+ fabric_to_espanso - INFO - Processed: export_data_as_csv
436
+ fabric_to_espanso - INFO - Processed: show_fabric_options_markmap
437
+ fabric_to_espanso - INFO - Processed: summarize_debate
438
+ fabric_to_espanso - INFO - Processed: answer_interview_question
439
+ fabric_to_espanso - INFO - Processed: extract_poc
440
+ fabric_to_espanso - INFO - Processed: rate_content
441
+ fabric_to_espanso - INFO - Processed: create_diy
442
+ fabric_to_espanso - INFO - Processed: create_idea_compass
443
+ fabric_to_espanso - INFO - Processed: create_security_update
444
+ fabric_to_espanso - INFO - Processed: extract_recommendations
445
+ fabric_to_espanso - WARNING - No sections extracted from /home/jelle/.config/fabric/patterns/md_callout/system.md
446
+ fabric_to_espanso - INFO - Processed: md_callout
447
+ fabric_to_espanso - INFO - Processed: analyze_threat_report
448
+ fabric_to_espanso - INFO - Processed: dialog_with_socrates
449
+ fabric_to_espanso - INFO - Processed: summarize_newsletter
450
+ fabric_to_espanso - INFO - Processed: create_mermaid_visualization
451
+ fabric_to_espanso - INFO - Processed: analyze_comments
452
+ fabric_to_espanso - INFO - Processed: summarize
453
+ fabric_to_espanso - INFO - Processed: compare_and_contrast
454
+ fabric_to_espanso - INFO - Successfully processed 198 files in fabric patterns folder
455
+ fabric_to_espanso - INFO - Changes detected: 0 new, 0 modified, 0 deleted
456
+ fabric_to_espanso - INFO - Database update completed successfully
457
+ fabric_to_espanso - INFO - YAML file generated successfully at /mnt/c/Users/barle/AppData/Roaming/espanso/match/fabric_patterns.yml
458
+ fabric_to_espanso - INFO - Generated 198 Markdown files generated successfully at /mnt/c/Obsidian/BrainCave/Extra/textgenerator/templates/fabric
459
+ fabric_to_espanso - INFO - Collection fabric_patterns_hybrid ready with 198 points
460
+ Generating YAML file...
461
+ Generating markdown files...
462
+ Generating YAML file...
463
+ Generating markdown files...
464
+ Stopping...
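
The run above ends with `Collection fabric_patterns_hybrid ready with 198 points`, matching the 198 processed files. A quick check along these lines can reproduce that count and inspect the collection's actual vector names; this is a minimal sketch, assuming a local Qdrant instance on the default port (the host, port, and verification step are assumptions, not code from this repository):

```python
from qdrant_client import QdrantClient

# Connect to a local Qdrant instance (host/port are assumptions; adjust to your setup)
client = QdrantClient(host="localhost", port=6333)

collection_name = "fabric_patterns_hybrid"  # name taken from the log above

# exact=True forces a full count instead of Qdrant's default approximate estimate
point_count = client.count(collection_name=collection_name, exact=True).count
print(f"Collection {collection_name} contains {point_count} points")

# The same CollectionInfo object exposes the actual vector names used by the
# collection, keyed exactly as FastEmbed registered them
info = client.get_collection(collection_name)
print(info.config.params.vectors)         # dense vector configs, keyed by name
print(info.config.params.sparse_vectors)  # sparse vector configs, keyed by name
```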
src/search_qdrant/database_query.py CHANGED
@@ -5,6 +5,10 @@ from qdrant_client.models import QueryResponse
  import argparse
  from src.fabrics_processor.config import config
 
+ # TODO: Use reranking to get even better search results
+ # TODO: Add an option to monitor the quality of the search responses with thumbs up/down feedback
+ # Store evaluations in an SQLite database with the query, the returned prompt, and the evaluation (up/down)
+ # This will create a database of good and bad examples to improve the search model
  def query_qdrant_database(
      query: str,
      client: QdrantClient,
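
The new TODO block describes a feedback loop: log each query, the prompt it returned, and a thumbs up/down judgment into SQLite, building a corpus of good and bad examples for later evaluation of the search model. A minimal sketch of such a store, following the plan in the comments (the file path, table name, schema, and helper names are hypothetical illustrations, not part of this repository):

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical location for the feedback database described in the TODOs above
DB_PATH = "search_feedback.db"

def init_feedback_db(db_path: str = DB_PATH) -> None:
    """Create the feedback table if it does not exist yet."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS search_feedback (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                query TEXT NOT NULL,
                returned_prompt TEXT NOT NULL,
                evaluation TEXT NOT NULL CHECK (evaluation IN ('up', 'down')),
                created_at TEXT NOT NULL
            )
            """
        )

def record_feedback(query: str, returned_prompt: str, evaluation: str,
                    db_path: str = DB_PATH) -> None:
    """Store one thumbs up/down judgment for a query/result pair."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "INSERT INTO search_feedback (query, returned_prompt, evaluation, created_at) "
            "VALUES (?, ?, ?, ?)",
            (query, returned_prompt, evaluation,
             datetime.now(timezone.utc).isoformat()),
        )
```

Rows collected this way can later be exported as positive/negative examples, which is the "database of good and bad examples" the comment refers to.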
src/search_qdrant/run_streamlit_terminal_visible.sh CHANGED
@@ -1,13 +1,13 @@
  #!/bin/bash
 
  # Add the project root to PYTHONPATH
- export PYTHONPATH="/home/jelle/Tools/pythagora-core/workspace/fabric_to_espanso:$PYTHONPATH"
+ export PYTHONPATH="/home/jelle/code/fabric_to_espanso:$PYTHONPATH"
 
  # Create a log directory if it doesn't exist
- LOG_DIR="/home/jelle/Tools/pythagora-core/workspace/fabric_to_espanso/logs"
+ LOG_DIR="/home/jelle/code/fabric_to_espanso/logs"
  mkdir -p "$LOG_DIR"
  LOG_FILE="$LOG_DIR/streamlit.log"
 
  # Run the streamlit app
  echo "Starting Streamlit app..."
- /home/jelle/Tools/pythagora-core/workspace/fabric_to_espanso/.venv/bin/streamlit run ~/Tools/pythagora-core/workspace/fabric_to_espanso/src/search_qdrant/streamlit_app.py
+ /home/jelle/code/fabric_to_espanso/.venv/bin/streamlit run ~/code/fabric_to_espanso/src/search_qdrant/streamlit_app.py
src/search_qdrant/streamlit_app.py CHANGED
@@ -11,6 +11,7 @@ from src.fabrics_processor.logger import setup_logger
  import logging
  import atexit
  from src.fabrics_processor.config import config
+ from src.fabrics_processor.deduplicator import remove_duplicates
 
  # Configure logging
  logger = setup_logger()
@@ -156,14 +157,32 @@ def update_database():
          fabric_patterns_folder=config.fabric_patterns_folder
      )
 
-     # Update the database
-     update_qdrant_database(
-         client=st.session_state.client,
-         collection_name=config.embedding.collection_name,
-         new_files=new_files,
-         modified_files=modified_files,
-         deleted_files=deleted_files
-     )
+     # Update the database if there are changes
+     if any([new_files, modified_files, deleted_files]):
+         st.info("Changes detected. Updating database...")
+         update_qdrant_database(
+             client=st.session_state.client,
+             collection_name=config.embedding.collection_name,
+             new_files=new_files,
+             modified_files=modified_files,
+             deleted_files=deleted_files
+         )
+     else:
+         st.info("No changes detected in input folders.")
+
+     # Create a separate section for deduplication - ALWAYS run this regardless of file changes
+     st.subheader("Deduplication Process")
+     with st.spinner("Checking for and removing duplicate entries..."):
+         # Run the deduplication process
+         duplicates_removed = remove_duplicates(
+             client=st.session_state.client,
+             collection_name=config.embedding.collection_name
+         )
+
+         if duplicates_removed > 0:
+             st.success(f"Successfully removed {duplicates_removed} duplicate entries from the database")
+         else:
+             st.info("No duplicate entries found in the database")
 
      # Get updated collection info
      collection_info = st.session_state.client.get_collection(config.embedding.collection_name)
@@ -177,6 +196,7 @@ def update_database():
  - {len(new_files)} new files
  - {len(modified_files)} modified files
  - {len(deleted_files)} deleted files
+ - {duplicates_removed} duplicate entries removed
 
  Database entries:
  - Initial: {initial_points}
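
The diff imports `remove_duplicates` from `src.fabrics_processor.deduplicator` and calls it with a client and a collection name, expecting the number of removed entries back. The module itself is not shown in this commit; purely to illustrate that contract, a deduplicator with the same signature might scroll the collection and delete points that share a payload key. This is a sketch assuming duplicates are identified by an identical `filename` payload field, which may not match the real criterion:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import PointIdsList

def remove_duplicates(client: QdrantClient, collection_name: str) -> int:
    """Sketch: delete points whose 'filename' payload was already seen.

    The duplicate criterion ('filename') is an assumption for illustration;
    the real src.fabrics_processor.deduplicator may key on something else.
    """
    seen: set[str] = set()
    duplicate_ids = []
    offset = None

    while True:
        points, offset = client.scroll(
            collection_name=collection_name,
            limit=100,
            offset=offset,
            with_payload=True,
            with_vectors=False,
        )
        for point in points:
            key = (point.payload or {}).get("filename")
            if key is None:
                continue
            if key in seen:
                # Keep the first occurrence, mark the rest for deletion
                duplicate_ids.append(point.id)
            else:
                seen.add(key)
        if offset is None:
            break

    if duplicate_ids:
        client.delete(
            collection_name=collection_name,
            points_selector=PointIdsList(points=duplicate_ids),
        )
    return len(duplicate_ids)
```

Keeping the first occurrence and batching the removals into a single `delete` call keeps the pass at one scroll over the collection, which matches the "always run" placement in the Streamlit flow above.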