Skip to content

Generate UUID in Cassandra

Cassandra has excellent built-in UUID support with two main types: UUID (random v4) and TimeUUID (time-based v1). Here's how to use them:

cql
-- Generate random UUID (version 4)
SELECT uuid() FROM system.local;
-- Result: 550e8400-e29b-41d4-a716-446655440000

-- Generate TimeUUID (version 1, time-based)
SELECT now() FROM system.local;
-- Result: 50554d6e-29bb-11e5-b345-feff819cdc9f

-- Multiple UUIDs
SELECT uuid() as uuid1, uuid() as uuid2, now() as timeuuid FROM system.local;

Cassandra provides comprehensive UUID support with native data types and functions:

cql
-- UUID data type (stores UUID v4)
CREATE TABLE users (
    id UUID PRIMARY KEY,
    name TEXT,
    email TEXT,
    created_at TIMESTAMP
);

-- TimeUUID data type (stores UUID v1, time-ordered)
CREATE TABLE events (
    user_id UUID,
    event_id TIMEUUID,
    event_type TEXT,
    data TEXT,
    PRIMARY KEY (user_id, event_id)
) WITH CLUSTERING ORDER BY (event_id DESC);

-- Insert with generated UUIDs
INSERT INTO users (id, name, email, created_at) 
VALUES (uuid(), 'John Doe', '[email protected]', toTimestamp(now()));

INSERT INTO events (user_id, event_id, event_type, data)
VALUES (550e8400-e29b-41d4-a716-446655440000, now(), 'login', 'User logged in');

Advanced Cassandra UUID Usage

Time-series data with TimeUUID:

cql
-- Time-series table with TimeUUID clustering
CREATE TABLE sensor_data (
    sensor_id UUID,
    reading_time TIMEUUID,
    temperature DOUBLE,
    humidity DOUBLE,
    PRIMARY KEY (sensor_id, reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);

-- Insert time-series data
INSERT INTO sensor_data (sensor_id, reading_time, temperature, humidity)
VALUES (uuid(), now(), 23.5, 65.2);

-- Query recent readings (last hour)
SELECT * FROM sensor_data 
WHERE sensor_id = 550e8400-e29b-41d4-a716-446655440000
  AND reading_time > minTimeuuid('2024-01-01 12:00:00+0000')
  AND reading_time < maxTimeuuid('2024-01-01 13:00:00+0000');

UUID utility functions:

cql
-- Convert TimeUUID to timestamp
SELECT dateOf(now()) FROM system.local;

-- Convert timestamp to TimeUUID (min/max)
SELECT minTimeuuid('2024-01-01 00:00:00+0000') FROM system.local;
SELECT maxTimeuuid('2024-01-01 23:59:59+0000') FROM system.local;

-- Extract timestamp from TimeUUID
SELECT dateOf(50554d6e-29bb-11e5-b345-feff819cdc9f) FROM system.local;

-- Convert TimeUUID to Unix timestamp
SELECT unixTimestampOf(now()) FROM system.local;

Table Design Patterns

UUID as partition key:

cql
-- Good: UUID provides even distribution
CREATE TABLE user_profiles (
    user_id UUID PRIMARY KEY,
    username TEXT,
    email TEXT,
    profile_data TEXT,
    created_at TIMESTAMP
);

-- Index on secondary attributes
CREATE INDEX ON user_profiles (username);
CREATE INDEX ON user_profiles (email);

Composite keys with UUID:

cql
-- User sessions with TimeUUID for ordering
CREATE TABLE user_sessions (
    user_id UUID,
    session_id TIMEUUID,
    ip_address INET,
    user_agent TEXT,
    login_time TIMESTAMP,
    PRIMARY KEY (user_id, session_id)
) WITH CLUSTERING ORDER BY (session_id DESC);

-- Multi-tenant application
CREATE TABLE tenant_data (
    tenant_id UUID,
    data_id UUID,
    data_type TEXT,
    content TEXT,
    created_at TIMEUUID,
    PRIMARY KEY (tenant_id, data_id)
);

Time-bucketed data:

cql
-- Bucketed time-series for better performance
CREATE TABLE metrics_by_day (
    metric_name TEXT,
    day DATE,
    metric_time TIMEUUID,
    value DOUBLE,
    PRIMARY KEY ((metric_name, day), metric_time)
) WITH CLUSTERING ORDER BY (metric_time ASC);

-- Insert with current day bucket
INSERT INTO metrics_by_day (metric_name, day, metric_time, value)
VALUES ('cpu_usage', '2024-01-01', now(), 85.5);

Performance Optimization

UUID distribution and hotspots:

cql
-- Good: Random UUID partition keys distribute evenly
CREATE TABLE distributed_data (
    id UUID PRIMARY KEY,
    data TEXT
);

-- Avoid: Sequential values create hotspots
-- Don't use TimeUUID as partition key for high-write scenarios

-- Better: Combine UUID with bucketing
CREATE TABLE time_bucketed (
    bucket TEXT,        -- e.g., '2024-01-01'
    id TIMEUUID,
    data TEXT,
    PRIMARY KEY (bucket, id)
);

Indexing strategies:

cql
-- Secondary indexes on UUID columns
CREATE INDEX user_profile_idx ON user_profiles (user_id);

-- Materialized views with UUID
CREATE MATERIALIZED VIEW users_by_email AS
    SELECT user_id, username, email, created_at
    FROM user_profiles
    WHERE email IS NOT NULL AND user_id IS NOT NULL
    PRIMARY KEY (email, user_id);

Driver Integration

Java Driver:

java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.uuid.Uuids;
import java.util.UUID;

public class CassandraUUIDExample {
    private CqlSession session;
    
    public void insertUser(String name, String email) {
        UUID userId = Uuids.random(); // Generate UUID v4
        UUID timeId = Uuids.timeBased(); // Generate TimeUUID v1
        
        PreparedStatement prepared = session.prepare(
            "INSERT INTO users (id, name, email, created_at) VALUES (?, ?, ?, ?)"
        );
        
        session.execute(prepared.bind(userId, name, email, timeId));
    }
    
    public void queryRecentEvents(UUID userId) {
        UUID oneHourAgo = Uuids.startOf(System.currentTimeMillis() - 3600000);
        
        PreparedStatement prepared = session.prepare(
            "SELECT * FROM events WHERE user_id = ? AND event_id > ?"
        );
        
        ResultSet results = session.execute(prepared.bind(userId, oneHourAgo));
        // Process results...
    }
}

Python Driver:

python
from cassandra.cluster import Cluster
from cassandra.util import uuid_from_time
import uuid
import time

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('mykeyspace')

# Insert with UUID
user_id = uuid.uuid4()
time_id = uuid.uuid1()

session.execute("""
    INSERT INTO users (id, name, email, created_at)
    VALUES (%s, %s, %s, %s)
""", (user_id, 'Jane Doe', '[email protected]', time_id))

# Query with time range
one_hour_ago = uuid_from_time(time.time() - 3600)
rows = session.execute("""
    SELECT * FROM events 
    WHERE user_id = %s AND event_id > %s
""", (user_id, one_hour_ago))

for row in rows:
    print(f"Event: {row.event_type} at {row.event_id}")

Node.js Driver:

javascript
const cassandra = require('cassandra-driver');
const { types } = cassandra;

const client = new cassandra.Client({
    contactPoints: ['127.0.0.1'],
    localDataCenter: 'datacenter1',
    keyspace: 'mykeyspace'
});

async function insertUser(name, email) {
    const userId = types.Uuid.random();
    const timeId = types.TimeUuid.now();
    
    const query = `
        INSERT INTO users (id, name, email, created_at)
        VALUES (?, ?, ?, ?)
    `;
    
    await client.execute(query, [userId, name, email, timeId]);
}

async function getRecentEvents(userId) {
    const oneHourAgo = types.TimeUuid.fromDate(new Date(Date.now() - 3600000));
    
    const query = `
        SELECT * FROM events 
        WHERE user_id = ? AND event_id > ?
    `;
    
    const result = await client.execute(query, [userId, oneHourAgo]);
    return result.rows;
}

Clustering and Replication

Multi-datacenter UUID considerations:

cql
-- UUIDs work perfectly across datacenters
CREATE KEYSPACE global_app WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3,
    'dc2': 3
};

-- Global user table
CREATE TABLE global_app.users (
    id UUID PRIMARY KEY,
    name TEXT,
    email TEXT,
    datacenter TEXT,
    created_at TIMEUUID
);

-- Insert with datacenter info
INSERT INTO global_app.users (id, name, email, datacenter, created_at)
VALUES (uuid(), 'Global User', '[email protected]', 'dc1', now());

Consistency levels with UUID:

cql
-- Strong consistency for critical UUID operations
CONSISTENCY QUORUM;
INSERT INTO users (id, name, email) VALUES (uuid(), 'Important User', '[email protected]');

-- Eventual consistency for high-throughput UUID inserts
CONSISTENCY ONE;
INSERT INTO events (user_id, event_id, event_type) VALUES (uuid(), now(), 'page_view');

Time-Series Patterns

IoT sensor data:

cql
CREATE TABLE iot_readings (
    device_id UUID,
    reading_time TIMEUUID,
    sensor_type TEXT,
    value DOUBLE,
    unit TEXT,
    PRIMARY KEY (device_id, reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC)
  AND compaction = {'class': 'TimeWindowCompactionStrategy'};

-- Insert sensor reading
INSERT INTO iot_readings (device_id, reading_time, sensor_type, value, unit)
VALUES (uuid(), now(), 'temperature', 23.5, 'celsius');

-- Query last 24 hours
SELECT * FROM iot_readings
WHERE device_id = 550e8400-e29b-41d4-a716-446655440000
  AND reading_time > minTimeuuid('2024-01-01 00:00:00+0000');

Event sourcing:

cql
CREATE TABLE event_store (
    aggregate_id UUID,
    event_id TIMEUUID,
    event_type TEXT,
    event_data TEXT,
    version INT,
    PRIMARY KEY (aggregate_id, event_id)
) WITH CLUSTERING ORDER BY (event_id ASC);

-- Append event
INSERT INTO event_store (aggregate_id, event_id, event_type, event_data, version)
VALUES (uuid(), now(), 'UserCreated', '{"name": "John", "email": "[email protected]"}', 1);

-- Replay events for aggregate
SELECT * FROM event_store
WHERE aggregate_id = 550e8400-e29b-41d4-a716-446655440000
ORDER BY event_id ASC;

Migration and Maintenance

Adding UUID columns to existing tables:

cql
-- Add UUID column
ALTER TABLE existing_table ADD uuid_id UUID;

-- Populate with UUIDs (requires application logic)
-- UPDATE existing_table SET uuid_id = uuid() WHERE id = ?;

-- Create index on new UUID column
CREATE INDEX ON existing_table (uuid_id);

Data migration with UUIDs:

cql
-- Source table
CREATE TABLE old_users (
    id INT PRIMARY KEY,
    name TEXT,
    email TEXT
);

-- Target table with UUID
CREATE TABLE new_users (
    id UUID PRIMARY KEY,
    old_id INT,
    name TEXT,
    email TEXT,
    migrated_at TIMEUUID
);

-- Migration query (run from application)
-- INSERT INTO new_users (id, old_id, name, email, migrated_at)
-- SELECT uuid(), id, name, email, now() FROM old_users;

Monitoring and Troubleshooting

UUID-related queries for monitoring:

cql
-- Check table with UUID primary keys
SELECT COUNT(*) FROM users;

-- Analyze UUID distribution (requires application-level analysis)
SELECT token(id), id FROM users LIMIT 100;

-- TimeUUID range queries
SELECT COUNT(*) FROM events
WHERE user_id = 550e8400-e29b-41d4-a716-446655440000
  AND event_id > minTimeuuid('2024-01-01 00:00:00+0000')
  AND event_id < maxTimeuuid('2024-01-01 23:59:59+0000');

UUID Version Comparison

Choose the right UUID type for your Cassandra application:

  • UUID (v4) - Random, excellent for partition keys and general use
  • TimeUUID (v1) - Time-based, perfect for clustering keys and time-series data
  • Version 3 - MD5 namespace-based (application-generated)
  • Version 5 - SHA-1 namespace-based (application-generated)
  • Version 6 - Time-ordered (application-generated)
  • Version 7 - Modern time-based (application-generated)

For Cassandra applications:

  • Partition keys: Use UUID (v4) for even distribution
  • Clustering keys: Use TimeUUID (v1) for time-ordered data
  • Time-series: Use TimeUUID for chronological queries
  • IoT applications: Use TimeUUID for sensor data ordering

How do I generate UUID in other databases?

SQL databases:

  • PostgreSQL - Comprehensive UUID support with extensions
  • MySQL - Built-in UUID() function
  • SQL Server - NEWID() and NEWSEQUENTIALID()
  • SQLite - Custom functions and extensions
  • Oracle - SYS_GUID() function

NoSQL databases:

  • MongoDB - ObjectId and UUID generation
  • Redis - UUID generation with Lua scripts
  • DynamoDB - UUID generation in application code

← Back to Online UUID Generator