Contents

Using Oracle db26ai from Rust with the sibyl crate (2)

Part 2: DML, DDL and vector search

WARNING:

This article closely mirrors the one about Using Oracle from Rust with oracle, as both are designed to be self-contained. You can read them either independently, sequentially, or use them to compare what they can offer to you.

In this article, I will use the sibyl crate to modify data, create and drop new tables and indexes, and perform a vector search from Rust code. I have already explained how to use it to make queries in different ways and how to easily set up an environment to work with it. If you want to read that first, I will wait for you here.

Change data with Data Manipulation Language

So far, I have shown how to use the sibyl crate to use SQL and query the database. Nevertheless, we can also use DML commands to add, modify, or delete records. However, those require using the execute() method provided by Statement.

Add data

Using an SQL insert statement, is very similar to what we have done so far. It can be used directly with the execute() method of the Statement type. Execution is simpler and returns the number of affected rows. Notice though, that I have to use a third empty (unit) positional argument because, otherwise, sibyl cannot differenciate between two positional arguments and a single bind-by-name argument.

  let sql_insert = "INSERT INTO co.products \
                    (product_name, unit_price) \
                    VALUES (:1, :2)";
  let stmt = session.prepare(sql_insert).await?;
  let inserted_products = stmt.execute(("Tracatron-3000", 9.88, ())).await?;
  println!("\n{inserted_products} new product.");

Although this is the simplest way to insert a record into the database, this approach has two problems: we don't know the id of the product we have created, and the data isn't available when we close the connection.

We can use the following statement instead and invoke it with named arguments1. Then the new id is set when the Statement is executed. As for the data disappearing after closing the connection, you may have imagined that all these commands are transactional, and we need to add a commit after executing them.

  let sql_insert = "INSERT INTO co.products \
                  (product_name, unit_price) \
                  VALUES (:name, :price) \
                  RETURNING product_id into :id";
  let stmt = session.prepare(sql_insert).await?;
  let mut new_product_id: i32 = 0;
  let inserted_products = stmt
      .execute((
          (":name", "Tracatron-3000"),
          (":price", 9.88),
          (":id", &mut new_product_id),
      ))
      .await?;
  session.commit().await?;
  println!("\n{inserted_products} new product: {new_product_id} created.");

Run this version and verify that the product is created from SQLcl using the following statement.

  SELECT * FROM PRODUCTS WHERE PRODUCT_NAME LIKE 'Tra%';

You can then delete it with the corresponding id.

  DELETE FROM PRODUCTS WHERE PRODUCT_ID = <the id displayed in the previous result>;

Delete data

We can now update or delete this product from Rust by executing the corresponding SQL statement. I have chosen to delete it, so the execution of this code is (mostly2) idempotent.

  let sql_delete = "DELETE FROM products where product_id = :1";
  let stmt = session.prepare(sql_delete).await?;
  let deleted_products = stmt.execute(new_product_id).await?;
  session.commit().await?;
  println!("{deleted_products} product deleted.");

If you run the code with this new snippet, you shouldn't be able to find the product because it is deleted after its creation. Hocus Pocus! 🪄

Modify the schema with Data Definition Language

I have shared everything you need to achieve full CRUD capabilities, but we can go a step further and create a table and modify some parameters. I don't recommend doing this as part of the main application because it requires SYSDBA privileges, and it isn't a good idea to always have access to the required credentials in your application. However, it might be the case that you write an installation application and don't want it to be a script that your customers can modify, which could create all kinds of support havoc.

Connecting with privileges

The connection that we have been using so far was a regular one with the user CO and its password, but if we want to modify some parameters and change the schema, we need a privileged connection. The sibyl crate provides another method in the Environment meant to do that: connect_as_sysdba(). We will use the SYS user and its password (the one we used when we started the container) with this other method. And if the credentials are correct, we will now have a privileged connection. Notice that this is a different connection even though the variable name is the same: it shadows the previous one, but I have scoped it with curly braces, so all the configuration that we could have applied to the previous one wouldn't be lost.

  {
      let session = sibyl_env
          .connect_as_sysdba("localhost:1521/FREEPDB1", "sys", "0pen-S3sam3.")
          .await?;
  }

Inside that scope, we can run a query to get the value of the parameter we modified in the setup. This isn't a privileged operation, but changing it would be.

  let stmt = session
      .prepare("SELECT value FROM v$parameter WHERE name = 'vector_memory_size'")
      .await?;
  if let Some(row) = stmt.query_single(()).await? {
      let value: String = row.get(0)?;
      println!("\nvector_memory_size: {value}");
  }

Run the new version of the code and the parameter's value will be displayed, and should hopefully be the value that we set. If that isn't the case, check the database setup, as we will need it for the next steps. 🚨

Configure the DB for embeddings

Since the plan is to perform vector search, I'd like to modify the schema by adding a new table for embeddings. The table will have an auto-generated primary key, a field that will be used to identify each record, and a vector embedding.

I have taken the liberty of cutting some corners to make the code easier to read and understand. The most obvious one is the number of elements in the vector, which is only 5. That wouldn't be enough for any current embedding model3, but it makes the search easier and intuitive, given a well-designed dataset. Also, there might be better ways to relate these embeddings to the products they refer to using their primary keys, but this makes the results displayable.

Efficient searches in high-dimensional spaces are greatly simplified with an index. I will add one that improves the search, assuming I will be using the cosine distance (i.e., that I care about where the vectors point, not their size), which is a common use case in scenarios like semantic search in text documents.

  println!("\nCreating embeddings table.");
  let ddl_create_table = "CREATE TABLE embeddings (\
                          item_id NUMBER GENERATED ALWAYS AS IDENTITY (START WITH 1000 INCREMENT BY 1) PRIMARY KEY, \
                          prod_desc VARCHAR2(100), \
                          emb_vector VECTOR(5, FLOAT32)\
                          )";
  let stmt = session.prepare(ddl_create_table).await?;
  stmt.execute(()).await?;

  println!("Creating embeddings index.");
  let ddl_create_idx = "CREATE VECTOR INDEX embeddings_vector_index \
                        ON embeddings (emb_vector) \
                        ORGANIZATION INMEMORY NEIGHBOR GRAPH \
                        DISTANCE COSINE \
                        WITH TARGET ACCURACY 95";
  let stmt = session.prepare(ddl_create_idx).await?;
  stmt.execute(()).await?;

These statements are executed immediately; there is no need to issue a commit. So if you execute this and haven't included the code to drop the table yet, use SQLcl to restore the previous state.

  DROP TABLE embeddings;

Instead, I'd recommend adding the code that drops the tables (and the index) first.

  println!("Droping embeddings table.");
  let ddl_drop_table = "DROP TABLE embeddings";
  let stmt = session.prepare(ddl_drop_table).await?;
  stmt.execute(()).await?;

Now we are ready to execute the whole code and let it shine. Fantastic! 🦄

Use vector search

As a final example of what can be done with the sibyl crate, I will insert data into the newly created embeddings table and perform a vector search to demonstrate that it works as expected.

All the code in this section belongs between the creation of the embeddings table and its index and the dropping of the table.

Playing with vectors

I will start by showing how to add one record to the embeddings table. Since the primary key is auto-generated, I will not include it in the statement. The value for the prod_desc column is just a string, and I will use a textual representation of the vector for setting the value of the binding variable.

  let sql_insert_vec = "INSERT INTO embeddings (prod_desc, emb_vector) VALUES (:1, :2)";
  let vector = vec![0.1f32, 0.2, 0.3, 0.4, 0.5];
  let stmt = session.prepare(sql_insert_vec).await?;
  let vector_str = format!("{:?}", vector);
  stmt.execute(("Tracatron-3000", &vector_str, ())).await?;
  session.commit().await?;

Notice that I am using again the third dummy value in the parameters.

This should compile and run, but it would be no fun to search for just a single record. Let's solve that with some data. I have created five items, and three of them have a high value on their first dimension and quite small on the others, making them point close to the axis of that dimension. The other two are pointing elsewhere. I have also modified the statement to use bind-by-name variables. Let's replace the code to insert one vector with this new one.

  let sql_insert_vec = "INSERT INTO embeddings (prod_desc, emb_vector) VALUES (:pdesc, :vec)";
  let vec_data = vec![
      ("Dark coat", vec![0.1f32, 0.2, 0.9, 0.1, 0.3]),
      ("Nice wearable", vec![0.7f32, 0.2, 0.3, 0.2, 0.1]),
      ("Comfy socks", vec![0.3f32, 0.3, 0.3, 0.3, 0.3]),
      ("Tracatron-3000", vec![0.9f32, 0.1, 0.1, 0.1, 0.1]), // Mostly pointing in the first dimension
      ("Tracatron-2000", vec![0.9f32, 0.3, 0.3, 0.3, 0.4]),
  ];
  let stmt = session.prepare(sql_insert_vec).await?;
  for pair in vec_data {
      let vector_str = format!("{:?}", pair.1);
      stmt.execute(((":pdesc", pair.0), (":vec", &vector_str)))
          .await?;
  }
  session.commit().await?;

Using vector search

Now, let's go with the search. I will ask the database to select the first three rows based on vector distance, and I will pass a vector that mostly points along the first dimension. The query will return at most 3 rows of the expected records, which will be parsed and printed.

  let sql_query_vec = "SELECT prod_desc FROM embeddings \
                       ORDER BY VECTOR_DISTANCE(emb_vector, :1) \
                       FETCH FIRST 3 ROWS ONLY";
  let vector = vec![0.8f32, 0.1, 0.2, 0.1, 0.1];
  let vector_str = format!("{vector:?}");
  println!("Similar products to vector: '{vector_str}'");
  let stmt = session.prepare(sql_query_vec).await?;
  let rows = stmt.query(vector_str).await?;
  while let Some(row) = rows.next().await? {
      let product_name: String = row.get(0)?;
      println!("- {product_name}");
  }

Call your friends and family and run the code now. You are doing vector search from Rust. Look, Ma, no Python! 😁

Summary

After explaining how to spin up a development environment with the Oracle AI db26ai free container and how to query data from Rust, this article I have explained how to add, modify, and delete data, change the schema, and work with vectors using the sibyl crate.

I have covered the main needs for an application, including the ability to perform vector search and to add that AI feature that you wanted without using anything other than Rust and Oracle AI db26ai. You can find the code of this article and other related ones in this repo. I hope these are useful for your projects, but let me know if there are other things you would like me to explore.

Stay curious. Hack your code. See you next time!

Footnotes


1

No dummy parameter needed.

2

Generated IDs aren't reused.

3

Generally greater than 512 dimensions.