Unions

    A union in Cyrus is a low-level composite type whose member fields all share the same base memory address. Unlike a struct, which places each field at its own offset, a union lets a single block of raw memory be interpreted as several different types.

    The size of a union is determined by the size of its largest member. Writing to any field overwrites the bytes it shares with every other field, which makes a union a mechanism for memory aliasing.

    Defining a Union

    union DataUnion {
        a: int;
        b: float64;
    }
    
    fn main() {
        var raw: DataUnion;
    
        raw.b = 3.14;
    }
    

    Using a Union

    You can create a union instance and assign fields directly:

    fn main() {
        var raw: DataUnion;
    
        raw.a = 42;       // set the integer field
        raw.b = 3.14;     // overwrites the same memory with a float
    }
    

    After raw.b = 3.14, raw.a no longer holds 42. Reading it returns whatever integer the overlapping bytes of the float happen to encode.

    Union Initialization

    Unions can be initialized using a Union Initializer, specifying which field to set at creation:

    var un: DataUnion = DataUnion { a: 10 };
    

    Rules:

    • Only one field may be initialized; providing more than one is a compile-time error.
    • The union's memory is set from the value of that field.

    Practical Use Cases

    Unions are low-level tools, mainly used in systems programming:

    • Type punning: reinterpret the same memory as different types.
    • Interfacing with C libraries: many C APIs expose unions in their structs.
    • Memory efficiency: when you know only one of several large fields will be used at once.

    Example: Interpreting the same 32-bit data as either an integer or raw bytes.

    import std::libc{printf};
    
    union IntBytes {
        value: int;
        bytes: uint8[4];
    }
    
    fn main() {
        var data = IntBytes { value: 0x12345678 };
    
        printf("%x %x %x %x\n", data.bytes[0], data.bytes[1], data.bytes[2], data.bytes[3]);
    }
    

    Output (on little-endian systems):

    78 56 34 12
    

    Unnamed Union Initialization

    As with structs, you can use unnamed unions for inline data layout or to initialize a value of a named union type. This is particularly useful for temporary low-level buffers.

    import std::libc{printf};
    
    union Payload {
        i: int64;
        s: char*;
    }
    
    pub fn main() {
        const layout: Payload = union { s: "Cyrus!" };
    
        printf("%s\n", layout.s);
    }
    

    Only one field can be initialized in a union value. Providing multiple fields will result in a compile-time error.

    Union Pointer Aliasing

    Since every field in a union shares the same base address, taking the address of any field yields a typed pointer to the union's shared memory block. This allows pointer aliasing: the same raw data can be manipulated through pointers of different types.

    This is a powerful feature for systems programming, enabling direct memory manipulation without explicit casting at every step.

    import std::libc{printf};
    
    union DataStore {
        p: char*;
        i: int64;
    }
    
    pub fn main() {
        // Initialize the union via the pointer field
        var inst = DataStore { p: null };
    
        // Obtain a pointer to the integer field
        // Both &inst.p and &inst.i point to the same memory address
        var iptr: int64* = &inst.i;
    
        // Indirectly modify the union memory through the aliased pointer
        *iptr = 2500;
    
        // The shared memory now holds the bit pattern of the integer 2500
        printf("%lld\n", inst.i);   // inst.i is 64 bits wide, so use a 64-bit format specifier
    }