I ran into a very interesting situation regarding Oracle GUIDs or Globally Unique Indentifiers, the other day. I will tell you about my experience and then after that, you can read all about Oracle GUIDs in an article I originally had published in
Oracle Professional. Enjoy!
A GUID is a long sequence of characters that are supposed to be globally unique -- that is, the likelihood of a particular sequence of characters appearing (returned by the GUID generating function) more than once
on any computer running on our globe is miniscule. Thus, you can use GUIDs when you need unique values that span, say, database instances or multiple networks.
My latest obsession is a product named Qute - the Quick Unit Test Engine. You can find out more about it at
www.unit-test.com. We use GUIDs for primary keys; this makes our work substantially easier since the product and underlying data are installed on a user's computer, and we do not have to worry about possible conflicts with sequence-generated primary key values.
The Delphi frontend in Qute also takes those GUIDs and hashes their values to integers, which are then used to index arrays of data and speed up access to browser data. All fine so far.
Unfortunately, a number of users reported an error involving a "Hash Mismatch" on starting up of Qute. After lots of patient cooperation from those users, we were amazed to find that the Oracle SYS_GUID function (the GUID generator offered by Oracle) was returning sequentially-incremented values, rather than pseudo-random combinations of characters.
We then had a bug in our Delphi code such that the hashing algorithm was set to "quick and not very secure" -- meaning that strings which were not very different were hashing to the same number, which then triggered our error. We adjusted that algorithm, making it more robust, and the problem seems to have disappeared. But I thought I should let the world know about our experience.
To see if you have this issue in your installation of Oracle, run the following script:
SET SERVEROUTPUT ON
BEGIN
FOR indx IN 1 .. 5
LOOP
DBMS_OUTPUT.put_line ( SYS_GUID );
END LOOP;
END;
/
You should see five rows of wildly different values, like this:
E9DF272C0D284B338E27455430022D67
B513F57A646E4136AB220BAE52A0F6E8
A95CBEE73C404E52A28608FFFFE209DA
6F28A3D06E7143868E8E4AF92F43A085
4095D2129505456A9F6B2C001F06EDF7
But some Qute users on AIX and Compaq 64bit operating systems were instead seeing results like this:
0CC2D8AF5A0A6C9FE044080020C482B7
0CC2D8AF5A0B6C9FE044080020C482B7
0CC2D8AF5A0C6C9FE044080020C482B7
0CC2D8AF5A0D6C9FE044080020C482B7
0CC2D8AF5A0E6C9FE044080020C482B7See the difference? The algorithm that Oracle is using on these operating systems is clearly faulty; yes, the values are unique but they are far from even pseudo-random.
The moral of the story is: if you are going to rely on the Oracle SYS_GUID function to generate GUID values, test the algorithm and make sure you are satisfied and can live with the results.
Going Global with GUIDs
Steven Feuerstein, Copyright 2005
Originally published in Oracle Professional, Ragan Communications
What is a GUID?
You've probably heard about and even seen some funky-looking string that is referred to as a "GUID". This article explains what a GUID is, how they can be very useful, and how you can work with them inside Oracle.
Let's start with a formal definition of a GUID, taken from Wikipedia (http://en.wikipedia.org/wiki/GUID):
"A Globally Unique Identifier or GUID is a pseudo-random number used in software applications. Each generated GUID is "mathematically guaranteed" to be unique. This is based on the simple principal that the total number of unique keys (264 or ) is so large that the possibility of the same number being generated twice is virtually zero. The GUID is an implementation by Microsoft of a standard called Universally Unique Identifier or UUID, specified by the Open Software Foundation (OSF)."
Ah! So a GUID is actually the Microsoft version of the UUID, about which Wikipedia tells us:
"A Universally Unique Identifier is an identifier standard used in software construction, standardized by the Open Software Foundation (OSF) as part of the Distributed Computing Environment (DCE). The intent of UUIDs is to enable distributed systems to uniquely identify information without significant central coordination. Thus, anyone can create a UUID and use it to identify something with reasonable confidence that that identifier will never be unintentionally used by anyone for anything else. Information labelled with UUIDs can therefore be later combined into a single database without need to resolve name conflicts. The most widespread use of this standard is in Microsoft's Globally Unique Identifiers (GUIDs) which implement this standard.
"A UUID is essentially a 16-byte number and in its canonical form a UUID may look like this:
550E8400-E29B-11D4-A716-446655440000"
So, in point of fact and strictly speaking, a GUID is a special (Microsoft-specific) case of a UUID. Perhaps because "GUID" is easier and more fun to pronounce than "UUID," it is the most common of the two terms, even when used in a non-Microsoft context (as we find in Oracle and as I will be doing in this article). I will continue, therefore, to refer to this number as a GUID.
One comment regarding the "mathematical guarantee" mentioned above: as far as I understand, no one has come up with an algorithm that really does quarantee a unique GUID. It is possible, though very, very improbable, that you could generate a GUID that conflicts with an existing value.
GUIDs are GOOD for...
The core value of a GUID is expressed in a single line of the UUID definition above, namely: "The intent of UUIDs is to enable distributed systems to uniquely identify information without significant central coordination." This statement should get Oracle database programmers thinking about two things:
- Distributed databases: they are certainly a kind of "distributed system." So GUIDs should help us identify unique objects not just in a single instance but across different database instances.
- Primary keys and unique indexes: we use these constraints to "uniquely identify" rows of information within a given relational table. Perhaps a GUID could be used for these situations?
If I am writing code for an application that resides in a single database instance, then GUIDs are probably not very compelling. Suppose I face a different scenario: my application runs on many different instances. Each instance has its own users inserting their own data. On a regular basis, however, data from one instance must be merged into another instance.
In this scenario, reliance on sequence-generated primary keys gives me serious heartburn. Since users are inserting rows independently in multiple instances, sequence values are being used up at different rates. When I move data from one instance to another, I have to assume and program for primary key conflicts of various kinds.
This is exactly the situation I encountered as I built Qnxo (www.qnxo.com). This product provides access to a large and ever growing/changing set of PL/SQL scripts (reusable code and generation templates). Any changes to my central PL/SQL repository must be distributed to users. In my first phase of design and construction of Qnxo, I relied on sequence-generated primary keys. As soon as I realized the error of my ways, I converted as much as possible over to GUIDs.
Let's take a closer look at some of the lessons I learned about applying GUIDs in the PL/SQL environment.
Oracle and GUIDs
To help us work with GUIDs, Oracle provides a SQL function named SYS_GUID, documented as follows:
"SYS_GUID generates and returns a globally unique identifier (RAW value) made up of 16 bytes. On most platforms the generated identifier consists of a host identifier, a process or thread identifier of the process or thread invoking the function, and a non-repeating value (sequence of bytes) for that process or thread."
The difference between a RAW and a VARCHAR2 is that RAW string is not explicitly converted to different character sets when moved between systems by Oracle. This point is critical for GUIDs, since we wouldn't want our GUIDs to be changed as they move from one system to another. Conversely, it is also a non-issue for GUIDS, because the characters used in GUIDs are drawn from a restricted subset of ASCII characters, which are static across all character sets.
This function is also exposed in PL/SQL through the STANDARD package as follows:
CREATE OR REPLACE PACKAGE BODY STANDARD
AS
...
FUNCTION SYS_GUID
RETURN RAW
IS
c RAW (16);
BEGIN
SELECT SYS_GUID ()
INTO c
FROM SYS.DUAL;
RETURN c;
END;
END STANDARD;
I can therefore immediately obtain a GUID from within my PL/SQL program by writing code like this:
DECLARE
l_guid_raw RAW (16);
l_guid_vc2 VARCHAR2 (32);
BEGIN
DBMS_OUTPUT.put_line (SYS_GUID);
l_guid_raw := SYS_GUID;
l_guid_vc2 := SYS_GUID;
DBMS_OUTPUT.put_line (l_guid_raw);
DBMS_OUTPUT.put_line (LENGTH (l_guid_raw));
DBMS_OUTPUT.put_line (l_guid_vc2);
DBMS_OUTPUT.put_line (LENGTH (l_guid_vc2));
END;
/
And here is the output from running this code:
E7D3F9F0A0B24390AA504E79EC132677
6FB0B56BA65C4B1580827575F0DDB602
32
693295E9405D4C7C8B3453E481EF4DA5
32
Formatting for GUIDs
As Wikipedia specified earlier in this article, ""a UUID is essentially a 16-byte number and in its canonical form a UUID may look like this: 550E8400-E29B-11D4-A716-446655440000."
In addition, in many situations, the common presentation format for a GUID is:
{550E8400-E29B-11D4-A716-446655440000}
So as you decide to incorporate GUIDs in your code and underlying database, you should decide how you want to store the GUIDs. You could stick with the unvarnished RAW values, or you could store a formatted version, if that is how the GUIDs are going to be used in your application.
We decided that in the Qnxo, we decided that the GUIDs would be stored with the full formatting, including dashes and the open and close brackets.
Using GUIDs in Oracle tables
Let's look at how we can use GUIDs inside a relational table.
Below I create a table using a GUID as the primary key for a table.
CREATE TABLE guid_table (
pky RAW(16) PRIMARY KEY
, NAME VARCHAR2(100));
When I insert a row into this table, I can use the SYS_GUID function to generate my unique GUID. I try to do it this way:
INSERT INTO guid_table
VALUES (SYS_GUID, 'Steven');
but I get the following error:
ORA-00984: column not allowed here
As with .nextval, I am unable to call this function directly in SQL. So if I want to insert rows I need to write and run code like this:
DECLARE
l_pky guid_table.pky%TYPE;
BEGIN
l_pky := SYS_GUID;
INSERT INTO guid_table
VALUES (l_pky, 'Steven');
l_pky := SYS_GUID;
INSERT INTO guid_table
VALUES (l_pky, 'Sandra');
END;
/
Now I can select the data from this table:
SQL> SELECT * FROM guid_table;
PKY NAME
-------------------------------- ---------------
DF832F19657645A2866E7AD15E9F59AF Steven
5FBAB1A92157434EBDBDB5880556D151 Sandra
That's certainly easy enough! Of course, since we want to make certain that a valid GUID is placed is in this column, it would be best to encode that logic into a BEFORE INSERT trigger on this table, as in:
CREATE OR REPLACE TRIGGER guid_table_bi
BEFORE INSERT
ON guid_table
FOR EACH ROW
BEGIN
IF :NEW.pky IS NULL
THEN
:NEW.pky := SYS_GUID;
END IF;
END;
/
And I can now insert into this table as simply as this:
INSERT INTO guid_table
(NAME
)
VALUES ('Joe'
);
In the case of Qnxo, since we decided to store the formatted version of the GUID, we created tables with primary keys defined as follows:
CREATE TABLE sg_favorite (
universal_id VARCHAR2(38) PRIMARY KEY
...
);
Encapsulating Oracle's GUID function
If you decide that you want to work with Oracle GUIDs in the RAW format and never need to deal with the formatted version, then you are pretty much ready to go. If, on the other hand, you want to able to work with GUIDs in a "standard" format, you may want to take advantage of the guid_pkg I wrote to assist in Qnxo development.
The elements in the package are:
guid_t
A SUBTYPE that encapsulates the RAW(16) declaration of a GUID and gives it a name. With this subtype in place, you can declare GUIDs in your code as follows:
DECLARE
my_guid guid_pkg.guid_t;
formatted_guid_t
Another SUBTYPE, this time designed to allow you to easily and readably declare variables that hold a formatted GUID, used as follows:
DECLARE
my_guid_with_brackets guid_pkg.formatted_guid_t;
c_mask
A mask that specifies the valid format for a formatted GUID. It also serves as a wildcarded string that I can use within a LIKE statement to determine if a string matches the standard GUID format.
is_formatted_guid
A Boolean function that returns TRUE if the string passed to it has a format consistent with the standard GUID format.
formatted_guid (2)
Two overloadings of functions that return a formatted GUID. In the first overloading, you pass a string, which could be a RAW GUID value or a VARCHAR2 string. It will return a string containing the GUID in the standard format (as specified by c_mask).
Here is a script that exercises this package:
DECLARE
l_oracle_guid guid_pkg.guid_t;
l_formatted_guid guid_pkg.formatted_guid_t;
PROCEDURE bpl (val IN BOOLEAN)
IS
BEGIN
IF val
THEN
DBMS_OUTPUT.put_line ('TRUE');
ELSIF NOT val
THEN
DBMS_OUTPUT.put_line ('FALSE');
ELSE
DBMS_OUTPUT.put_line ('NULL');
END IF;
END;
BEGIN
DBMS_OUTPUT.put_line (guid_pkg.formatted_guid);
bpl (guid_pkg.is_formatted_guid (guid_pkg.formatted_guid));
bpl (guid_pkg.is_formatted_guid (SYS_GUID));
bpl (guid_pkg.is_formatted_guid ('Steven'));
DBMS_OUTPUT.put_line (guid_pkg.formatted_guid (SYS_GUID));
DBMS_OUTPUT.put_line
(guid_pkg.formatted_guid ('B16079A0567148608C171FA89E7187B3')
);
END;
/
With this output after execution:
{B33204AD-ADDB-4CC6-87F2-2D0D13594F7C}
TRUE
FALSE
FALSE
{1F17C465-E58E-4F8B-AABA-9C1C712D992B}
{B16079A0-5671-4860-8C17-1FA89E7187B3}
Listing 1. The specification of the guid_pkg package
CREATE OR REPLACE PACKAGE guid_pkg
IS
SUBTYPE guid_t IS RAW (16);
SUBTYPE formatted_guid_t IS VARCHAR2 (38);
-- Example: {34DC3EA7-21E4-4C8A-BAA1-7C2F21911524}
c_mask CONSTANT formatted_guid_t
:= '{________-____-____-____-____________}';
FUNCTION is_formatted_guid (string_in IN VARCHAR2)
RETURN BOOLEAN;
FUNCTION formatted_guid (guid_in IN VARCHAR2)
RETURN formatted_guid_t;
FUNCTION formatted_guid
RETURN formatted_guid_t;
END guid_pkg;
/
The implementation of this package is shown in Listing 2. It relies on the SYS_GUID function to generate a new GUID value, but otherwise is devoted to applying the format mask and converting un-formatted GUIDs to strings that contain the correctly-placed hyphens.
Listing 2. The body of the guid_pkg package
CREATE OR REPLACE PACKAGE BODY guid_pkg
IS
FUNCTION is_formatted_guid (string_in IN VARCHAR2)
RETURN BOOLEAN
IS
BEGIN
RETURN string_in LIKE c_mask;
END is_formatted_guid;
FUNCTION formatted_guid (guid_in IN VARCHAR2)
RETURN formatted_guid_t
IS
BEGIN
-- If not already in the 8-4-4-4-rest format, then make it so.
IF is_formatted_guid (guid_in)
THEN
RETURN guid_in;
-- Is it only missing those squiggly brackets?
ELSIF is_formatted_guid ('{' || guid_in || '}')
THEN
RETURN formatted_guid ('{' || guid_in || '}');
ELSE
RETURN '{'
|| SUBSTR (guid_in, 1, 8)
|| '-'
|| SUBSTR (guid_in, 9, 4)
|| '-'
|| SUBSTR (guid_in, 13, 4)
|| '-'
|| SUBSTR (guid_in, 17, 4)
|| '-'
|| SUBSTR (guid_in, 21)
|| '}';
END IF;
END formatted_guid;
FUNCTION formatted_guid
RETURN formatted_guid_t
IS
BEGIN
RETURN formatted_guid (SYS_GUID);
END formatted_guid;
END guid_pkg;
/
GUIDs: a nice addition to the toolbox
I'd heard of GUIDs, but never had reason to work with them before developing Qnxo. I started with standard, sequence-generated primary keys in my script table, but found that this approach was sorely lacking when it came time to distribute that content (reusable scripts and templates) to my user base.
By relying on GUIDs, I was able to greatly simplify the logic needed to upgrade distributed content from a central source. Not only did I avoid the problem of conflicting sequence-generated primary keys, but I was also able to keep intact foreign key references, since they were based on unchanging GUIDs.