BLAKE3

BLAKE3 is a modern cryptographic hash function that offers exceptional performance while maintaining strong security properties. It’s one of the newest additions to cryptopp-modern.

Overview

BLAKE3 is based on the BLAKE2 hash function and Bao tree hashing mode. Key features include:

  • Fast: Significantly faster than MD5, SHA-1, SHA-2, SHA-3, and BLAKE2
  • Secure: Based on proven cryptographic primitives
  • Parallelisable: Takes advantage of SIMD instructions
  • General-purpose: One algorithm for hashing, MACs, KDFs, and PRNGs
  • Simple: Fewer parameters and modes than BLAKE2
  • SIMD Acceleration: Runtime CPU detection with AVX2, SSE4.1, and C++ fallback

Use Cases

BLAKE3 is ideal for:

  • General-purpose hashing
  • File integrity verification
  • Content-addressed storage systems
  • Key derivation (from high-entropy secrets, not passwords)
  • Message authentication codes (MACs)
⚠️
Never use BLAKE3 for password hashing. Fast hash functions allow attackers to try billions of guesses per second. Use Argon2id instead - it’s deliberately slow and memory-hard.
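
To make the scale of the problem concrete, the arithmetic can be sketched as below. The alphabet size, password length, and guess rates are illustrative assumptions rather than measurements, and `secondsToExhaust` is a hypothetical helper, not part of cryptopp-modern.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: seconds needed to try every password of `length`
// symbols drawn from an alphabet of `alphabetSize` symbols, at an assumed
// rate of `guessesPerSecond`.
double secondsToExhaust(double alphabetSize, unsigned length, double guessesPerSecond) {
    assert(guessesPerSecond > 0.0);
    return std::pow(alphabetSize, static_cast<double>(length)) / guessesPerSecond;
}
```

For eight lowercase letters (26^8, about 2.1e11 candidates), an assumed rate of 1e9 guesses per second exhausts the space in about 209 seconds. At an assumed 10 guesses per second, which is the kind of rate a deliberately slow function aims for, the same search takes about 2.1e10 seconds, or roughly 660 years.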

SIMD Acceleration

cryptopp-modern’s BLAKE3 implementation includes SIMD-accelerated code paths with automatic runtime CPU detection:

| SIMD Level | Parallel Chunks | Minimum Buffer | Approx. Speed |
| --- | --- | --- | --- |
| AVX2 | 8 at a time | 8KB | ~2600 MiB/s |
| SSE4.1 | 4 at a time | 4KB | ~1800 MiB/s |
| C++ | 1 at a time | Any | ~800 MiB/s |

Benchmarks on Intel Core Ultra 7 155H, Windows 11, MinGW-w64 GCC
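
As a rough guide to what these figures mean in practice, the sketch below converts a throughput in MiB/s into an estimated hashing time. `secondsToHash` is a hypothetical helper; the throughputs are the approximate values quoted above and will differ on other hardware.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: estimated seconds to hash `bytes` of input at a
// sustained throughput given in MiB/s (1 MiB = 1024 * 1024 bytes).
double secondsToHash(std::uint64_t bytes, double throughputMiBps) {
    assert(throughputMiBps > 0.0);
    const double mib = static_cast<double>(bytes) / (1024.0 * 1024.0);
    return mib / throughputMiBps;
}
```

For a 1 GiB file this gives about 0.39 s at ~2600 MiB/s (AVX2), about 0.57 s at ~1800 MiB/s (SSE4.1), and 1.28 s at ~800 MiB/s (C++), ignoring disk I/O.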

Checking the Active Provider

You can verify which SIMD implementation is being used at runtime:

#include <cryptopp/blake3.h>
#include <iostream>

int main() {
    CryptoPP::BLAKE3 hash;
    std::cout << "BLAKE3 provider: " << hash.AlgorithmProvider() << std::endl;
    // Output: "AVX2", "SSE4.1", "NEON", or "C++"
    return 0;
}

Optimising for SIMD Performance

BLAKE3’s speed advantage comes from its ability to process multiple 1KB chunks in parallel. To achieve maximum throughput, pass data in large buffers:

#include <cryptopp/blake3.h>
#include <fstream>
#include <vector>

int main() {
    CryptoPP::BLAKE3 hash;
    CryptoPP::byte digest[CryptoPP::BLAKE3::DIGESTSIZE];

    // Use 64KB buffer for optimal AVX2 performance
    const size_t BUFFER_SIZE = 65536;
    std::vector<CryptoPP::byte> buffer(BUFFER_SIZE);

    std::ifstream file("largefile.bin", std::ios::binary);
    while (file.read(reinterpret_cast<char*>(buffer.data()), BUFFER_SIZE)) {
        hash.Update(buffer.data(), file.gcount());
    }
    if (file.gcount() > 0) {
        hash.Update(buffer.data(), file.gcount());
    }

    hash.Final(digest);
    return 0;
}
ℹ️
Buffer size matters for large data. With AVX2, 8KB+ buffers enable 8-way parallel chunk processing (~2600 MiB/s). Smaller buffers still work correctly but won’t achieve maximum throughput.

Graceful Degradation

BLAKE3 automatically adapts to available data - no special handling required:

| Data Size | Parallelism | Typical Use Case |
| --- | --- | --- |
| < 1KB | None (single chunk) | Short identifiers, tokens, small strings |
| 1-4KB | Limited | Small files, config data |
| 4-8KB | SSE4.1 (4-way) | Medium files |
| 8KB+ | AVX2 (8-way) | Large files, disk images |

Small data hashing works correctly and efficiently - the parallel processing is a bonus for large data, not a requirement.
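
The table can be read as a simple lookup from buffer size to parallel width. The sketch below encodes it as a function; it mirrors the table only and is not the library's dispatch code. `SimdLevel` and `expectedParallelChunks` are hypothetical names, and the 4-way step for AVX2 machines with 4-8KB buffers is an assumption drawn from the table.

```cpp
#include <cstddef>

// Hypothetical labels for the code paths described in this section.
enum class SimdLevel { Cpp, Sse41, Avx2 };

// Illustrative only: how many 1KB chunks the table suggests are processed
// at once for a given buffer size and available SIMD level.
unsigned expectedParallelChunks(SimdLevel level, std::size_t bufferBytes) {
    const std::size_t kChunk = 1024;  // 1KB chunk size
    if (level == SimdLevel::Avx2 && bufferBytes >= 8 * kChunk) {
        return 8;  // 8KB+ with AVX2
    }
    if (level != SimdLevel::Cpp && bufferBytes >= 4 * kChunk) {
        return 4;  // 4KB+ with SSE4.1 (assumed to apply on AVX2 CPUs too)
    }
    return 1;      // small buffers or the portable C++ path
}
```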

Basic Usage

Simple Hashing

#include <cryptopp/blake3.h>
#include <cryptopp/hex.h>
#include <cryptopp/filters.h>
#include <iostream>
#include <string>

int main() {
    CryptoPP::BLAKE3 hash;
    std::string message = "The quick brown fox jumps over the lazy dog";
    std::string digest;

    CryptoPP::StringSource(message, true,
        new CryptoPP::HashFilter(hash,
            new CryptoPP::HexEncoder(
                new CryptoPP::StringSink(digest))));

    std::cout << "Message: " << message << std::endl;
    std::cout << "BLAKE3 hash: " << digest << std::endl;

    return 0;
}

Hashing a File

#include <cryptopp/blake3.h>
#include <cryptopp/hex.h>
#include <cryptopp/files.h>
#include <iostream>
#include <string>

int main() {
    CryptoPP::BLAKE3 hash;
    std::string digest;

    try {
        CryptoPP::FileSource("myfile.txt", true,
            new CryptoPP::HashFilter(hash,
                new CryptoPP::HexEncoder(
                    new CryptoPP::StringSink(digest))));

        std::cout << "BLAKE3 hash: " << digest << std::endl;
    }
    catch (const CryptoPP::Exception& ex) {
        std::cerr << "Error: " << ex.what() << std::endl;
        return 1;
    }

    return 0;
}

Incremental Hashing

For large data or streaming scenarios:

#include <cryptopp/blake3.h>
#include <cryptopp/hex.h>
#include <cryptopp/filters.h>
#include <iostream>
#include <string>

int main() {
    CryptoPP::BLAKE3 hash;
    std::string digest;

    // Update hash incrementally
    hash.Update((const CryptoPP::byte*)"First part ", 11);
    hash.Update((const CryptoPP::byte*)"Second part ", 12);
    hash.Update((const CryptoPP::byte*)"Third part", 10);

    // Finalize and get result
    digest.resize(hash.DigestSize());
    hash.Final((CryptoPP::byte*)&digest[0]);

    // Convert to hex
    std::string hexDigest;
    CryptoPP::StringSource(digest, true,
        new CryptoPP::HexEncoder(
            new CryptoPP::StringSink(hexDigest)));

    std::cout << "BLAKE3 hash: " << hexDigest << std::endl;

    return 0;
}

Keyed Hashing (MAC)

BLAKE3 can be used as a message authentication code:

#include <cryptopp/blake3.h>
#include <cryptopp/hex.h>
#include <cryptopp/filters.h>
#include <cryptopp/osrng.h>
#include <cryptopp/secblock.h>
#include <iostream>
#include <string>

int main() {
    // Generate a random 32-byte key
    CryptoPP::AutoSeededRandomPool rng;
    CryptoPP::SecByteBlock key(32);
    rng.GenerateBlock(key, key.size());

    // Create keyed BLAKE3 (MAC mode) via constructor
    CryptoPP::BLAKE3 mac(key, key.size());

    std::string message = "Authenticated message";
    std::string macHex;

    CryptoPP::StringSource(message, true,
        new CryptoPP::HashFilter(mac,
            new CryptoPP::HexEncoder(
                new CryptoPP::StringSink(macHex))));

    std::cout << "BLAKE3 MAC: " << macHex << std::endl;

    return 0;
}
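
When verifying a received tag, compare it with the recomputed one in constant time, so the comparison does not reveal how many leading bytes matched. Upstream Crypto++ provides `VerifyBufsEqual()` in `misc.h` and `HashTransformation::Verify()` for this purpose; that cryptopp-modern's BLAKE3 exposes them unchanged is an assumption here. The library-independent sketch below shows the idea, with `constantTimeEqual` as a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of a constant-time comparison: every byte is examined regardless
// of where the first mismatch occurs, and differences are accumulated
// instead of returning early.
bool constantTimeEqual(const std::uint8_t* a, const std::uint8_t* b, std::size_t len) {
    std::uint8_t diff = 0;
    for (std::size_t i = 0; i < len; ++i) {
        diff = static_cast<std::uint8_t>(diff | (a[i] ^ b[i]));
    }
    return diff == 0;
}
```

In production code, prefer the library's own comparison routine, since an optimising compiler is free to transform a hand-written loop like this one.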

Key Derivation

BLAKE3 includes a dedicated key derivation mode with context strings for domain separation:

#include <cryptopp/blake3.h>
#include <cryptopp/secblock.h>
#include <cstring>   // std::memset
#include <iostream>
#include <string>

int main() {
    // Input keying material (e.g., from Argon2 output or key exchange)
    CryptoPP::SecByteBlock ikm(32);
    // In practice, this comes from your key exchange or master secret;
    // the fixed fill below is a placeholder for demonstration only
    std::memset(ikm, 0x55, ikm.size());

    // Context string should be unique to your application
    std::string context = "MyApp 2025-01-01 encryption key";

    // KDF mode via context constructor
    CryptoPP::BLAKE3 kdf(context.c_str());

    // Derive a 32-byte key
    CryptoPP::SecByteBlock derivedKey(32);
    kdf.Update(ikm, ikm.size());
    kdf.TruncatedFinal(derivedKey, derivedKey.size());

    std::cout << "Derived key successfully" << std::endl;
    return 0;
}
ℹ️
Context strings provide domain separation. Using different context strings ensures that keys derived for different purposes are cryptographically independent, even from the same input keying material.

Extendable Output (XOF)

BLAKE3 supports extendable output, allowing you to generate any amount of output bytes:

#include <cryptopp/blake3.h>
#include <cryptopp/hex.h>
#include <cryptopp/filters.h>
#include <iostream>
#include <string>

int main() {
    CryptoPP::BLAKE3 hash;
    hash.Update((const CryptoPP::byte*)"input data", 10);

    // Get 64 bytes of output instead of the default 32
    CryptoPP::byte output[64];
    hash.TruncatedFinal(output, sizeof(output));

    std::string hexOutput;
    CryptoPP::StringSource(output, sizeof(output), true,
        new CryptoPP::HexEncoder(
            new CryptoPP::StringSink(hexOutput)));

    std::cout << "64-byte XOF output: " << hexOutput << std::endl;
    return 0;
}

Performance Characteristics

BLAKE3 performance advantages:

  • SIMD Acceleration: Automatic runtime detection of AVX2, SSE4.1, or C++ fallback
  • Parallelism: Merkle tree structure enables parallel chunk processing
  • Small inputs: Still fast even for small messages
  • Large files: Excellent performance on large data with proper buffer sizes

Benchmarks

| Algorithm | Provider | Speed (MiB/s) | vs BLAKE2b |
| --- | --- | --- | --- |
| BLAKE3 | AVX2 | 2599 | 3.15x faster |
| BLAKE3 | SSE4.1 | ~1800 | 2.2x faster |
| BLAKE3 | C++ | ~800 | Similar |
| BLAKE2b | SSE4.1 | 822 | baseline |

Benchmarks on Intel Core Ultra 7 155H, Windows 11, MinGW-w64 GCC

Comparison with Other Hash Functions

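| Feature | BLAKE3 | SHA-256 | SHA3-256 | BLAKE2b |
| --- | --- | --- | --- | --- |
| Speed | Fastest | Medium | Slower | Fast |
| Security | 128-bit | 128-bit | 128-bit | 128-bit |
| Parallelisable | Yes (SIMD) | No | No | Limited |
| Simplicity | Excellent | Good | Good | Good |
| Standardised | No | Yes (FIPS) | Yes (FIPS) | Yes (RFC) |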

When to Use BLAKE3

Choose BLAKE3 when:

  • Performance is critical
  • You need a modern, fast hash function
  • Parallel processing is available
  • You want one algorithm for multiple purposes

Consider alternatives when:

  • FIPS 140-2 or FIPS 140-3 compliance is required (use SHA-2 or SHA-3)
  • Regulatory requirements mandate specific algorithms
  • Maximum compatibility with legacy systems is needed

Security Considerations

  • Collision resistance: ~128-bit
  • Preimage resistance: ~128-bit
  • Second preimage resistance: ~128-bit
  • No known practical attacks as of 2025

BLAKE3 uses a 256-bit output, but its design targets ~128-bit security for all standard security goals, which is appropriate for most modern applications.

BLAKE3 has no special post-quantum properties. Like other 256-bit hashes, it is subject to generic quantum attacks (Grover’s algorithm), which would roughly halve its effective security. Plan for ~128-bit classical / ~64-bit quantum security.

API Reference

class BLAKE3 : public HashTransformation {
public:
    CRYPTOPP_CONSTANT(DIGESTSIZE = 32)

    // Standard hash mode
    BLAKE3(unsigned int digestSize = DIGESTSIZE);

    // Keyed hash mode (MAC)
    BLAKE3(const byte* key, size_t keyLength, unsigned int digestSize = DIGESTSIZE);

    // KDF mode with context string
    BLAKE3(const char* context, unsigned int digestSize = DIGESTSIZE);

    void Update(const byte *input, size_t length);
    void Final(byte *digest);
    void TruncatedFinal(byte *digest, size_t digestSize);  // XOF mode
    void Restart();

    unsigned int DigestSize() const { return DIGESTSIZE; }
    unsigned int BlockSize() const { return 64; }

    // Returns "AVX2", "SSE4.1", "NEON", or "C++"
    std::string AlgorithmProvider() const;
};

BLAKE3 also supports SetKey() and SetKeyWithContext() methods for MAC and KDF modes; these are equivalent to the keyed and context constructors above.

Building with BLAKE3

BLAKE3 is included by default in cryptopp-modern 2025.11.0 and later.

SIMD Build Options

The SIMD implementations are automatically enabled based on compiler and platform support:

| Compiler | AVX2 Flag | SSE4.1 Flag |
| --- | --- | --- |
| GCC/Clang | -mavx2 | -msse4.1 |
| MSVC | /arch:AVX2 | Enabled by default on x64 |

For CMake builds:

cmake -DCRYPTOPP_AVX2=ON -DCRYPTOPP_SSE41=ON ..

For GNUmakefile builds, SIMD is auto-detected based on CPU capabilities.

Compiling Your Application

Include the header:

#include <cryptopp/blake3.h>

Compile and link:

# Linux/macOS
g++ -std=c++11 myapp.cpp -o myapp -lcryptopp

# Windows (MinGW)
g++ -std=c++11 myapp.cpp -o myapp.exe -lcryptopp

Test Vectors

BLAKE3 test vectors from the official specification:

| Input | Length | BLAKE3 Hash (hex) |
| --- | --- | --- |
| "" (empty) | 0 | af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262 |
| "abc" | 3 | 6437b3ac38465133ffb63b75273a8db548c558465d79db03fd359c6cd5bd9d85 |
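
When checking your build against these vectors, note that upstream Crypto++'s `HexEncoder` emits uppercase digits by default, while the vectors here are lowercase. A case-insensitive comparison avoids a false mismatch. The helper below is a minimal sketch; `hexEqualIgnoreCase` is a hypothetical name, not a library function.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Compare two hex strings while ignoring letter case, so "AF13..." from
// an uppercase encoder matches "af13..." from a published test vector.
bool hexEqualIgnoreCase(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        const int ca = std::tolower(static_cast<unsigned char>(a[i]));
        const int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) {
            return false;
        }
    }
    return true;
}
```

This helper is intended for public test vectors only; secret values such as MAC tags should be compared in constant time.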

Implementation Notes

Parallel Processing Architecture

BLAKE3 uses a Merkle tree structure that enables parallel processing of 1KB chunks:

Input Data (16KB example)
├── Chunk 0 (1KB) ─┬─> Hash4_SSE41 ─┬─> CV 0
├── Chunk 1 (1KB) ─┤  (4 parallel)  ├─> CV 1
├── Chunk 2 (1KB) ─┤                ├─> CV 2
├── Chunk 3 (1KB) ─┘                └─> CV 3
├── Chunk 4 (1KB) ─┬─> Hash4_SSE41 ─┬─> CV 4
├── ...            │                │
└── Chunk 15 (1KB) ─┘               └─> CV 15
                                         │
                                         v
                                   Parent hashing
                                         │
                                         v
                                   Final digest

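With AVX2, 8 chunks are processed simultaneously, twice as many as with SSE4.1. In the benchmarks above, this raises throughput from ~1800 MiB/s to ~2600 MiB/s (roughly 1.4x), not a full doubling.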
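
The tree shape follows directly from the input length. The sketch below computes the number of 1KB chunks, the number of parent nodes in the resulting binary tree, and how many SIMD passes the chunk layer needs at a given width. All function names are hypothetical, and the batch count is an idealised model that ignores how the implementation actually schedules work.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kChunkSize = 1024;  // BLAKE3 chunk size in bytes

// Number of chunks (tree leaves). Empty input still forms a single chunk.
std::size_t chunkCount(std::size_t inputBytes) {
    if (inputBytes == 0) {
        return 1;
    }
    return (inputBytes + kChunkSize - 1) / kChunkSize;  // round up
}

// A binary tree with n leaves has n - 1 parent nodes.
std::size_t parentNodeCount(std::size_t inputBytes) {
    return chunkCount(inputBytes) - 1;
}

// Idealised number of SIMD passes over the chunk layer when `lanes`
// chunks are compressed at a time (4 for SSE4.1, 8 for AVX2).
std::size_t simdBatches(std::size_t inputBytes, std::size_t lanes) {
    assert(lanes > 0);
    return (chunkCount(inputBytes) + lanes - 1) / lanes;
}
```

For the 16KB example in the diagram, this gives 16 chunks and 15 parent nodes, with 4 batches at 4-way width and 2 batches at 8-way width.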

Thread Safety

Each BLAKE3 object maintains independent state. Multiple threads can safely use separate instances. A single instance should not be shared across threads without synchronisation.
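
The one-instance-per-thread pattern can be sketched as follows. To keep the example free of library dependencies, `StandInHasher` is a hypothetical placeholder built on FNV-1a, which is not cryptographic; in real code each thread would construct its own `CryptoPP::BLAKE3` in the same position.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

// Stand-in for a stateful hasher such as CryptoPP::BLAKE3.
// FNV-1a is NOT cryptographic; it is used only for illustration.
class StandInHasher {
public:
    void Update(const std::string& data) {
        for (unsigned char c : data) {
            state_ ^= c;
            state_ *= 1099511628211ULL;  // FNV-1a 64-bit prime
        }
    }
    std::uint64_t Final() const { return state_; }

private:
    std::uint64_t state_ = 14695981039346656037ULL;  // FNV-1a offset basis
};

// Hash each input on its own thread. No hasher is shared between threads.
std::vector<std::uint64_t> hashAllInParallel(const std::vector<std::string>& inputs) {
    std::vector<std::uint64_t> results(inputs.size());
    std::vector<std::thread> workers;
    workers.reserve(inputs.size());
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        workers.emplace_back([&inputs, &results, i] {
            StandInHasher hasher;         // owned by this thread only
            hasher.Update(inputs[i]);
            results[i] = hasher.Final();  // each thread writes its own slot
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
    return results;
}
```

No locking is needed because each thread touches only its own hasher and its own output slot. On some toolchains, linking requires `-pthread`.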

Memory Alignment

For optimal SIMD performance, input buffers aligned to 32 bytes (AVX2) or 16 bytes (SSE4.1) may provide marginal improvements, though unaligned access is fully supported.
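
If you want to experiment with aligned input, one option is a buffer type declared with `alignas`. This is a sketch with hypothetical names (`AlignedBuffer`, `isAligned`); whether alignment yields a measurable gain with this library on your hardware is something to benchmark rather than assume.

```cpp
#include <cstddef>
#include <cstdint>

// 64KB buffer whose storage starts on a 32-byte boundary, matching the
// width of an AVX2 register.
struct AlignedBuffer {
    alignas(32) unsigned char data[64 * 1024];
};

// True when pointer `p` is a multiple of `alignment` bytes.
bool isAligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Declare `AlignedBuffer` with static or automatic storage. Plain `new` is only guaranteed to honour over-aligned types from C++17 onward, and `std::vector` with the default allocator makes no 32-byte alignment promise.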

Further Reading