Integrating Existing Embeddings from a Vector Database with AI Engine

This document explains how to let AI Engine use embeddings that already exist in a vector database (e.g., Pinecone) but that AI Engine does not yet know about.

When you upload vectors directly to a vector database such as Pinecone, AI Engine is not automatically aware of them. These vectors can then show up as "orphan" entries in AI Engine's Embeddings section.

To resolve this, change how AI Engine reads vectors from the vector database (Step 1), and include the required metadata whenever you create vectors (Step 2).

Step 1: Modify the AI Engine Plugin

In the file premium/addons/pinecone.php, locate the get_vector function. Add the following line to the return array:

'content' => isset( $vector['metadata']['content'] ) ? $vector['metadata']['content'] : '',

This allows AI Engine to read the embedding's content directly from the vector's metadata in the vector database. If a vector has no content field in its metadata, an empty string is used instead.

Step 2: Include the Required Metadata When Creating Vectors

When you create vectors for Pinecone via Python, include the following metadata fields:

  • title: Describes what the embedding represents (for your reference; not used internally by AI Engine)
  • content: The text the embedding was generated from
  • model: The model used to create the embedding (this lets AI Engine avoid unnecessarily rebuilding it)

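The metadata fields listed above can be attached to each vector as follows. This is a minimal Python sketch, not code from AI Engine or Pinecone. `build_vector_record` is a hypothetical helper, and the example values (including the model name) are placeholders. The string you store in `model` should match the name your AI Engine setup uses for that model; this is an assumption, so check your configuration. The record uses the `id`/`values`/`metadata` shape that the Pinecone Python client accepts for upserts. `type` is left out because the `get_vector` code falls back to `manual` when it is missing.

```python
from typing import Any, Dict, List


def build_vector_record(
    vector_id: str,
    values: List[float],
    title: str,
    content: str,
    model: str,
) -> Dict[str, Any]:
    """Build one vector in the id/values/metadata shape used for Pinecone upserts."""
    if not content:
        raise ValueError("content is required so AI Engine can read the embedded text")
    if not model:
        raise ValueError("model is required so AI Engine knows which model made the vector")
    return {
        "id": vector_id,
        "values": [float(v) for v in values],
        "metadata": {
            "title": title,      # describes the embedding; not used internally by AI Engine
            "content": content,  # the text that was embedded; read by get_vector after Step 1
            "model": model,      # embedding model name (placeholder in the example below)
        },
    }


# Hypothetical example record; all values are placeholders.
record = build_vector_record(
    vector_id="doc-42-chunk-1",
    values=[0.12, -0.08, 0.33],  # real vectors come from your embedding model
    title="Refund policy",
    content="Refunds are issued within 14 days of purchase.",
    model="text-embedding-3-small",  # use the model name your AI Engine setup expects
)
# With the Pinecone Python client, the record could then be sent as:
# index.upsert(vectors=[record])  # 'index' is assumed to be an existing Index handle
```

The helper rejects empty `content` and `model` values up front. Without these fields, a vector uploaded this way would appear in AI Engine with no readable text and no model name.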
Implementation Details

Original Version (Without Content)

return [
  'id' => $vectorId,
  'type' => isset( $vector['metadata']['type'] ) ? $vector['metadata']['type'] : 'manual',
  'title' => isset( $vector['metadata']['title'] ) ? $vector['metadata']['title'] : '',
  'model' => isset( $vector['metadata']['model'] ) ? $vector['metadata']['model'] : '',
  'values' => isset( $vector['values'] ) ? $vector['values'] : []
];

Modified Version (With Content)

return [
  'id' => $vectorId,
  'type' => isset( $vector['metadata']['type'] ) ? $vector['metadata']['type'] : 'manual',
  'title' => isset( $vector['metadata']['title'] ) ? $vector['metadata']['title'] : '',
  'model' => isset( $vector['metadata']['model'] ) ? $vector['metadata']['model'] : '',
  'values' => isset( $vector['values'] ) ? $vector['values'] : [],
  // Added: the embedding text stored in the vector's metadata
  'content' => isset( $vector['metadata']['content'] ) ? $vector['metadata']['content'] : '',
];

Next Steps

  1. Apply the Step 1 modification to the AI Engine plugin.
  2. Make sure every vector you create from now on includes the title, content, and model metadata.
  3. Test the integration to confirm that AI Engine retrieves and uses the existing embeddings correctly.
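When testing, it can help to check what AI Engine will see for a given vector before relying on it. The Python sketch below mirrors the modified `get_vector` return array shown above, so missing fields fall back to the same defaults. It also lists which of the recommended metadata keys are absent. `ai_engine_view`, `missing_metadata`, and `REQUIRED_METADATA` are hypothetical names. The sketch assumes each vector is already a plain dict with `values` and `metadata` keys. Converting your Pinecone client's fetch results into that shape is not shown.

```python
from typing import Any, Dict, List

# Hypothetical list of the metadata keys recommended in Step 2.
REQUIRED_METADATA = ("title", "content", "model")


def _field(data: Dict[str, Any], key: str, default: Any) -> Any:
    """Like PHP isset(): fall back to the default when the key is missing or None."""
    value = data.get(key)
    return default if value is None else value


def ai_engine_view(vector_id: str, vector: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror the modified get_vector return array for one vector."""
    metadata = vector.get("metadata") or {}
    return {
        "id": vector_id,
        "type": _field(metadata, "type", "manual"),
        "title": _field(metadata, "title", ""),
        "model": _field(metadata, "model", ""),
        "values": _field(vector, "values", []),
        "content": _field(metadata, "content", ""),
    }


def missing_metadata(vector: Dict[str, Any]) -> List[str]:
    """Return the recommended metadata keys that are absent or empty."""
    metadata = vector.get("metadata") or {}
    return [key for key in REQUIRED_METADATA if not metadata.get(key)]


# Example: a vector that was uploaded without its content metadata.
fetched = {
    "values": [0.12, -0.08, 0.33],
    "metadata": {"title": "Refund policy", "model": "text-embedding-3-small"},
}
print(ai_engine_view("doc-42-chunk-1", fetched)["content"] == "")  # True
print(missing_metadata(fetched))  # ['content']
```

Running `missing_metadata` over a batch of vectors is a quick way to find entries that would show up in AI Engine with empty content.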