Appearance
Generate UUID in Cassandra
Cassandra has excellent built-in UUID support with two main types: UUID
(random v4) and TimeUUID
(time-based v1). Here's how to use them:
cql
-- Generate random UUID (version 4)
SELECT uuid() FROM system.local;
-- Result: 550e8400-e29b-41d4-a716-446655440000
-- Generate TimeUUID (version 1, time-based)
SELECT now() FROM system.local;
-- Result: 50554d6e-29bb-11e5-b345-feff819cdc9f
-- Multiple UUIDs
SELECT uuid() as uuid1, uuid() as uuid2, now() as timeuuid FROM system.local;
Cassandra provides comprehensive UUID support with native data types and functions:
cql
-- UUID data type (stores UUID v4)
CREATE TABLE users (
id UUID PRIMARY KEY,
name TEXT,
email TEXT,
created_at TIMESTAMP
);
-- TimeUUID data type (stores UUID v1, time-ordered)
CREATE TABLE events (
user_id UUID,
event_id TIMEUUID,
event_type TEXT,
data TEXT,
PRIMARY KEY (user_id, event_id)
) WITH CLUSTERING ORDER BY (event_id DESC);
-- Insert with generated UUIDs
INSERT INTO users (id, name, email, created_at)
VALUES (uuid(), 'John Doe', '[email protected]', toTimestamp(now()));
INSERT INTO events (user_id, event_id, event_type, data)
VALUES (550e8400-e29b-41d4-a716-446655440000, now(), 'login', 'User logged in');
Advanced Cassandra UUID Usage
Time-series data with TimeUUID:
cql
-- Time-series table with TimeUUID clustering
CREATE TABLE sensor_data (
sensor_id UUID,
reading_time TIMEUUID,
temperature DOUBLE,
humidity DOUBLE,
PRIMARY KEY (sensor_id, reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);
-- Insert time-series data
INSERT INTO sensor_data (sensor_id, reading_time, temperature, humidity)
VALUES (uuid(), now(), 23.5, 65.2);
-- Query recent readings (last hour)
SELECT * FROM sensor_data
WHERE sensor_id = 550e8400-e29b-41d4-a716-446655440000
AND reading_time > minTimeuuid('2024-01-01 12:00:00+0000')
AND reading_time < maxTimeuuid('2024-01-01 13:00:00+0000');
UUID utility functions:
cql
-- Convert TimeUUID to timestamp
SELECT dateOf(now()) FROM system.local;
-- Convert timestamp to TimeUUID (min/max)
SELECT minTimeuuid('2024-01-01 00:00:00+0000') FROM system.local;
SELECT maxTimeuuid('2024-01-01 23:59:59+0000') FROM system.local;
-- Extract timestamp from TimeUUID
SELECT dateOf(50554d6e-29bb-11e5-b345-feff819cdc9f) FROM system.local;
-- Convert TimeUUID to Unix timestamp
SELECT unixTimestampOf(now()) FROM system.local;
Table Design Patterns
UUID as partition key:
cql
-- Good: UUID provides even distribution
CREATE TABLE user_profiles (
user_id UUID PRIMARY KEY,
username TEXT,
email TEXT,
profile_data TEXT,
created_at TIMESTAMP
);
-- Index on secondary attributes
CREATE INDEX ON user_profiles (username);
CREATE INDEX ON user_profiles (email);
Composite keys with UUID:
cql
-- User sessions with TimeUUID for ordering
CREATE TABLE user_sessions (
user_id UUID,
session_id TIMEUUID,
ip_address INET,
user_agent TEXT,
login_time TIMESTAMP,
PRIMARY KEY (user_id, session_id)
) WITH CLUSTERING ORDER BY (session_id DESC);
-- Multi-tenant application
CREATE TABLE tenant_data (
tenant_id UUID,
data_id UUID,
data_type TEXT,
content TEXT,
created_at TIMEUUID,
PRIMARY KEY (tenant_id, data_id)
);
Time-bucketed data:
cql
-- Bucketed time-series for better performance
CREATE TABLE metrics_by_day (
metric_name TEXT,
day DATE,
metric_time TIMEUUID,
value DOUBLE,
PRIMARY KEY ((metric_name, day), metric_time)
) WITH CLUSTERING ORDER BY (metric_time ASC);
-- Insert with current day bucket
INSERT INTO metrics_by_day (metric_name, day, metric_time, value)
VALUES ('cpu_usage', '2024-01-01', now(), 85.5);
Performance Optimization
UUID distribution and hotspots:
cql
-- Good: Random UUID partition keys distribute evenly
CREATE TABLE distributed_data (
id UUID PRIMARY KEY,
data TEXT
);
-- Avoid: Sequential values create hotspots
-- Don't use TimeUUID as partition key for high-write scenarios
-- Better: Combine UUID with bucketing
CREATE TABLE time_bucketed (
bucket TEXT, -- e.g., '2024-01-01'
id TIMEUUID,
data TEXT,
PRIMARY KEY (bucket, id)
);
Indexing strategies:
cql
-- Secondary indexes on UUID columns
CREATE INDEX user_profile_idx ON user_profiles (user_id);
-- Materialized views with UUID
CREATE MATERIALIZED VIEW users_by_email AS
SELECT user_id, username, email, created_at
FROM user_profiles
WHERE email IS NOT NULL AND user_id IS NOT NULL
PRIMARY KEY (email, user_id);
Driver Integration
Java Driver:
java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.uuid.Uuids;
import java.util.UUID;
public class CassandraUUIDExample {
private CqlSession session;
public void insertUser(String name, String email) {
UUID userId = Uuids.random(); // Generate UUID v4
UUID timeId = Uuids.timeBased(); // Generate TimeUUID v1
PreparedStatement prepared = session.prepare(
"INSERT INTO users (id, name, email, created_at) VALUES (?, ?, ?, ?)"
);
session.execute(prepared.bind(userId, name, email, timeId));
}
public void queryRecentEvents(UUID userId) {
UUID oneHourAgo = Uuids.startOf(System.currentTimeMillis() - 3600000);
PreparedStatement prepared = session.prepare(
"SELECT * FROM events WHERE user_id = ? AND event_id > ?"
);
ResultSet results = session.execute(prepared.bind(userId, oneHourAgo));
// Process results...
}
}
Python Driver:
python
from cassandra.cluster import Cluster
from cassandra.util import uuid_from_time
import uuid
import time
cluster = Cluster(['127.0.0.1'])
session = cluster.connect('mykeyspace')
# Insert with UUID
user_id = uuid.uuid4()
time_id = uuid.uuid1()
session.execute("""
INSERT INTO users (id, name, email, created_at)
VALUES (%s, %s, %s, %s)
""", (user_id, 'Jane Doe', '[email protected]', time_id))
# Query with time range
one_hour_ago = uuid_from_time(time.time() - 3600)
rows = session.execute("""
SELECT * FROM events
WHERE user_id = %s AND event_id > %s
""", (user_id, one_hour_ago))
for row in rows:
print(f"Event: {row.event_type} at {row.event_id}")
Node.js Driver:
javascript
const cassandra = require('cassandra-driver');
const { types } = cassandra;
const client = new cassandra.Client({
contactPoints: ['127.0.0.1'],
localDataCenter: 'datacenter1',
keyspace: 'mykeyspace'
});
async function insertUser(name, email) {
const userId = types.Uuid.random();
const timeId = types.TimeUuid.now();
const query = `
INSERT INTO users (id, name, email, created_at)
VALUES (?, ?, ?, ?)
`;
await client.execute(query, [userId, name, email, timeId]);
}
async function getRecentEvents(userId) {
const oneHourAgo = types.TimeUuid.fromDate(new Date(Date.now() - 3600000));
const query = `
SELECT * FROM events
WHERE user_id = ? AND event_id > ?
`;
const result = await client.execute(query, [userId, oneHourAgo]);
return result.rows;
}
Clustering and Replication
Multi-datacenter UUID considerations:
cql
-- UUIDs work perfectly across datacenters
CREATE KEYSPACE global_app WITH replication = {
'class': 'NetworkTopologyStrategy',
'dc1': 3,
'dc2': 3
};
-- Global user table
CREATE TABLE global_app.users (
id UUID PRIMARY KEY,
name TEXT,
email TEXT,
datacenter TEXT,
created_at TIMEUUID
);
-- Insert with datacenter info
INSERT INTO global_app.users (id, name, email, datacenter, created_at)
VALUES (uuid(), 'Global User', '[email protected]', 'dc1', now());
Consistency levels with UUID:
cql
-- Strong consistency for critical UUID operations
CONSISTENCY QUORUM;
INSERT INTO users (id, name, email) VALUES (uuid(), 'Important User', '[email protected]');
-- Eventual consistency for high-throughput UUID inserts
CONSISTENCY ONE;
INSERT INTO events (user_id, event_id, event_type) VALUES (uuid(), now(), 'page_view');
Time-Series Patterns
IoT sensor data:
cql
CREATE TABLE iot_readings (
device_id UUID,
reading_time TIMEUUID,
sensor_type TEXT,
value DOUBLE,
unit TEXT,
PRIMARY KEY (device_id, reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC)
AND compaction = {'class': 'TimeWindowCompactionStrategy'};
-- Insert sensor reading
INSERT INTO iot_readings (device_id, reading_time, sensor_type, value, unit)
VALUES (uuid(), now(), 'temperature', 23.5, 'celsius');
-- Query last 24 hours
SELECT * FROM iot_readings
WHERE device_id = 550e8400-e29b-41d4-a716-446655440000
AND reading_time > minTimeuuid('2024-01-01 00:00:00+0000');
Event sourcing:
cql
CREATE TABLE event_store (
aggregate_id UUID,
event_id TIMEUUID,
event_type TEXT,
event_data TEXT,
version INT,
PRIMARY KEY (aggregate_id, event_id)
) WITH CLUSTERING ORDER BY (event_id ASC);
-- Append event
INSERT INTO event_store (aggregate_id, event_id, event_type, event_data, version)
VALUES (uuid(), now(), 'UserCreated', '{"name": "John", "email": "[email protected]"}', 1);
-- Replay events for aggregate
SELECT * FROM event_store
WHERE aggregate_id = 550e8400-e29b-41d4-a716-446655440000
ORDER BY event_id ASC;
Migration and Maintenance
Adding UUID columns to existing tables:
cql
-- Add UUID column
ALTER TABLE existing_table ADD uuid_id UUID;
-- Populate with UUIDs (requires application logic)
-- UPDATE existing_table SET uuid_id = uuid() WHERE id = ?;
-- Create index on new UUID column
CREATE INDEX ON existing_table (uuid_id);
Data migration with UUIDs:
cql
-- Source table
CREATE TABLE old_users (
id INT PRIMARY KEY,
name TEXT,
email TEXT
);
-- Target table with UUID
CREATE TABLE new_users (
id UUID PRIMARY KEY,
old_id INT,
name TEXT,
email TEXT,
migrated_at TIMEUUID
);
-- Migration query (run from application)
-- INSERT INTO new_users (id, old_id, name, email, migrated_at)
-- SELECT uuid(), id, name, email, now() FROM old_users;
Monitoring and Troubleshooting
UUID-related queries for monitoring:
cql
-- Check table with UUID primary keys
SELECT COUNT(*) FROM users;
-- Analyze UUID distribution (requires application-level analysis)
SELECT token(id), id FROM users LIMIT 100;
-- TimeUUID range queries
SELECT COUNT(*) FROM events
WHERE user_id = 550e8400-e29b-41d4-a716-446655440000
AND event_id > minTimeuuid('2024-01-01 00:00:00+0000')
AND event_id < maxTimeuuid('2024-01-01 23:59:59+0000');
UUID Version Comparison
Choose the right UUID type for your Cassandra application:
- UUID (v4) - Random, excellent for partition keys and general use
- TimeUUID (v1) - Time-based, perfect for clustering keys and time-series data
- Version 3 - MD5 namespace-based (application-generated)
- Version 5 - SHA-1 namespace-based (application-generated)
- Version 6 - Time-ordered (application-generated)
- Version 7 - Modern time-based (application-generated)
For Cassandra applications:
- Partition keys: Use UUID (v4) for even distribution
- Clustering keys: Use TimeUUID (v1) for time-ordered data
- Time-series: Use TimeUUID for chronological queries
- IoT applications: Use TimeUUID for sensor data ordering
How do I generate UUID in other databases?
SQL databases:
- PostgreSQL - Comprehensive UUID support with extensions
- MySQL - Built-in UUID() function
- SQL Server - NEWID() and NEWSEQUENTIALID()
- SQLite - Custom functions and extensions
- Oracle - SYS_GUID() function
NoSQL databases: