Pipelines retrievers usage
Creating a new retriever configuration
There are two types of retrievers: table and volume. Given the different nature of the data sources and the options required for each, there are different functions to create them.
Retriever for a table data source
The aidb.create_retriever_for_table function is used to create a retriever for a table data source. This is the function signature. You can see many of the functions parameters are optional and have defaults.
create_retriever_for_table( ------------------------------------------------------------------------------- name TEXT, model_name, TEXT, source_table_name regclass, source_data_column TEXT, source_data_type aidb.RetrieverSourceDataFormat, source_key_column TEXT DEFAULT 'id', vector_table TEXT DEFAULT NULL, vector_data_column TEXT DEFAULT 'embeddings', vector_key_column TEXT DEFAULT 'id', topk INTEGER DEFAULT 1, distance_operator aidb.distanceoperator DEFAULT 'L2', options JSONB DEFAULT '{}'::JSONB )
Example: Creating a retriever
SELECT aidb.create_retriever_for_table( name => 'test_retriever', model_name => 'simple_model', source_table_name => 'test_source_table', source_data_column => 'content', source_data_type => 'Text' );
If you're using only Postgres tables, skip to Creating the embeddings.
If you're using external data sources, you need to create a volume and create a retriever for it, which is explained next.
Retriever for a volume data source
Creating a new volume
Before you can create a retriever for a volume, you need to create a volume. The aidb.create_volume function is used to create a volume. This is the function signature. You can see many of the function parameters are optional and have defaults.
aidb.create_volume( ------------------------------------------------------------------------------- name TEXT, server_name TEXT, path TEXT, mime_type TEXT )
mime_type
actually takes only Text
or Image
as values.
Example: Creating a volume
SELECT aidb.create_volume( name => 'test_volume', server_name => 'test_server', path => 'test_path', mime_type => 'Text' );
The server_name
comes from calling PGFS functions to create a storage location pgfs.create_storage_location. The path
is the path to the data in the storage location.
Creating a retriever for a volume
The aidb.create_retriever_for_volume function is used to create a retriever for a volume data source. This is the function signature. You can see many of the function parameters are optional and have defaults.
aidb.create_retriever_for_volume( ------------------------------------------------------------------------------- name TEXT, model_name, TEXT, source_volume_name TEXT, vector_table TEXT DEFAULT NULL, vector_data_column TEXT DEFAULT 'embeddings', vector_key_column TEXT DEFAULT 'id', topk INTEGER DEFAULT 1, distance_operator aidb.distanceoperator DEFAULT 'L2', options JSONB DEFAULT '{}'::JSONB )
Example: Creating a retriever for a volume
SELECT aidb.create_retriever_for_volume( name => 'test_retriever_volume', model_name => 'simple_model', source_volume_name => 'test_volume' );
Creating the embeddings
You can use bulk embedding if there's existing data in the source table:
SELECT aidb.bulk_embedding('test_retriever');
Enable auto-embedding for any future changes:
SELECT aidb.enable_auto_embedding_for_table('test_retriever');
You can disable auto-embedding as well:
SELECT aidb.disable_auto_embedding_for_table('test_retriever');
Retrieving
A basic key retriever, aidb.retrieve_key
is available. It doesn't look up the source data but returns the ID/key of the matching embeddings.
Retrieving the key
aidb.retrieve_key(<retriever name>, <query string>, <optional number of results>);
Example: Retrieving the key
SELECT * FROM aidb.retrieve_key('test_retriever', 'shoes', 2);
key | distance -------+-------------------- 43941 | 0.2938963414490189 19337 | 0.3023805122617119 (2 rows)
You can use this command if you want to do a join/lookup yourself based on the key. For retrievers with external (volume) data sources, this capability is especially useful. Usually the application itself wants to do the retrieval from the external data source. Or you might want to push down the actual retrieval to a client application.
Retrieving the text
The retrieve_text
function joins the embeddings with the source data and directly returns the results:
aidb.retrieve_text(<retriever name>, <query string>, <optional number of results>);
Example
SELECT * FROM aidb.retrieve_text('test_retriever', 'jacket', 2);
key | value | distance -------+----------------------------------------------------+-------------------- 19337 | United Colors of Benetton Men Stripes Black Jacket | 0.2994317672742334 55018 | Lakme 3 in 1 Orchid Aqua Shine Lip Color | 0.3804609668507203 (2 rows)
Listing the retrievers
A view is available that lists all the retrievers. aidb.retrievers also includes some of the retrievers configuration:
SELECT * FROM aidb.retrievers;
id | name | vector_table_name | vector_table_key_column | vector_table_vector_column | model_name | topk | distance_operator | options | source_table_name | source_table_data_column | source_table_data_column_type | source_table_key_column | source_volume_name ----+---------------------+----------------------------+-------------------------+----------------------------+--------------+------+-------------------+---------+-------------------+--------------------------+-------------------------------+-------------------------+-------------------- 2 | test_retriever | test_retriever_vector | id | embeddings | simple_model | 5 | InnerProduct | {} | test_source_table | content | Text | id | 5 | test_retriever_cosa | test_retriever_cosa_vector | id | embeddings | simple_model | 1 | L2 | {} | test_source_table | content | Text | id | 3 | test_retriever_cos | test_retriever_cos_vector | id | embeddings | simple_model | 5 | Cosine | {} | test_source_table | content | Text | id | (3 rows)
We recommend that you select only the columns you're interested in:
SELECT name, source_table_name FROM aidb.retrievers;
name | source_table_name ---------------------+------------------- test_retriever | test_source_table test_retriever_cos | test_source_table test_retriever_cosa | test_source_table (3 rows)
Deleting a retriever
This command doesn't delete the vector table or anything else. It only deletes the configuration:
SELECT aidb.delete_retriever(<name>);
End-to-end example
This example is an end-to-end example for a table/text retriever.
Could this page be better? Report a problem or suggest an addition!