Embedding Files in C/C++ Programs

Background

Recently, I came across a post on X by @0xTriboulet asking how to deal with large header files in Visual Studio projects (https://x.com/0xTriboulet/status/1878139439714558169).

[Image: screenshot of the X post, captioned "intelligence intellisense"]

Based on this post and the rest of the thread, I assume that they were attempting to insert the binary data from a file in their program by converting it into a large C byte array and then pasting that array into a header file.

This is a very common method of embedding binary data from a file inside a C/C++ project. The file data gets converted to hex or decimal and wrapped inside a C array.

#ifndef MYFILE_HEADER_H
#define MYFILE_HEADER_H

const unsigned char MYFILE_DATA[] = { 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, ... };

#endif // MYFILE_HEADER_H

Although this does work, it can cause problems for code analysis tools, such as autocompletion, when they attempt to process this file.

An alternative method of embedding files in a C/C++ program is to skip the source code step altogether and instead convert the needed file directly into an object file that can be linked in at build time.


TL;DR

GNU/MinGW ld can generate an object file from arbitrary binary data using the -r and -b binary options. This object file will export a few symbols named _binary_FILENAME_start, _binary_FILENAME_end, and _binary_FILENAME_size which can be used to reference the embedded data.

More information can be found in this StackOverflow thread.

A script or other tool can also be used to generate object files with other files embedded into them.


Generating COFFs with ld

This method only[1] works with GNU/MinGW ld and LLVM lld. These linkers support different targets which determine what type of object files they are capable of generating. The target formats supported by both GNU/MinGW ld and LLVM lld are listed in the --help output of each program.

# GNU ld
ld --help | grep 'supported targets'
ld: supported targets: elf64-x86-64 elf32-i386 elf32-iamcu elf32-x86-64 pei-i386 pe-x86-64 pei-x86-64 elf64-little elf64-big elf32-little elf32-big pe-bigobj-x86-64 pe-i386 pdb elf64-bpfle elf64-bpfbe srec symbolsrec verilog tekhex binary ihex plugin
# LLVM lld
ld.lld --help | grep 'supported targets'
ld.lld: supported targets: elf

The target format can be specified using the --oformat flag on the command line. These linkers also support another flag, -r, which specifies that they should generate a relocatable object file as output instead of a typical executable or shared library. The -b option is used to specify the type of input file.

Here is how a COFF can be generated using ld with the contents of an arbitrary file embedded inside it.

matt@laptop :: ~ >> cat hello.txt
Hello World
matt@laptop :: ~ >> ld -r --oformat pe-x86-64 -b binary -o hello.o hello.txt
matt@laptop :: ~ >> file hello.o
hello.o: Intel amd64 COFF object file, no relocation info, no line number info, not stripped, 1 section, symbol offset=0x4c, 3 symbols, 1st section name ".data"
matt@laptop :: ~ >>

MinGW ld can do the same without needing the --oformat flag since it produces COFFs by default.

Along with the embedded file data, the COFF will also contain a set of symbols that can be used for referencing that data. These symbols can be viewed using any generic tool that is capable of displaying the symbol table of a COFF. One such tool is rabin2.

matt@laptop :: ~ >> rabin2 -s hello.o
[Symbols]
nth paddr      vaddr      bind   type size lib name                    demangled
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
0   0x0000003c 0x00000000 GLOBAL UNK  4        _binary_hello_txt_start
0   0x00000000 0x00000000 GLOBAL UNK  4        _binary_hello_txt_size
0   0x00000048 0x0000000c GLOBAL UNK  4        _binary_hello_txt_end

The symbol names are derived from the name of the input file passed into ld.
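As a rough illustration of that derivation, here is a minimal Python sketch. It assumes the behavior of GNU ld's binary input format, where the path is used exactly as it was given on the command line and every character that is not an ASCII letter or digit is replaced with an underscore. The function name binary_symbol_names is made up for this example.

```python
import re


def binary_symbol_names(input_path: str) -> tuple[str, str, str]:
    """Return the (start, end, size) symbol names expected for an input path."""
    # Assumption: every character that is not an ASCII letter or digit
    # becomes "_", including dots and path separators.
    mangled = re.sub(r"[^0-9A-Za-z]", "_", input_path)
    prefix = f"_binary_{mangled}"
    return (f"{prefix}_start", f"{prefix}_end", f"{prefix}_size")


print(binary_symbol_names("hello.txt"))
print(binary_symbol_names("assets/logo-v2.png"))
```

Since the directory part of the path ends up in the symbol name, running ld from the directory that contains the file keeps the names short and predictable.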


Using ld Generated COFFs

Now in the C/C++ program, the file data can be referenced through those exported symbols.

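#include <stdint.h>
#include <stdio.h>

extern unsigned char _binary_hello_txt_start[];
extern unsigned char _binary_hello_txt_end;
extern unsigned char _binary_hello_txt_size;

int main(void) {
    /* %p expects a void pointer. */
    printf("Array: %p\n", (void *)_binary_hello_txt_start);
    printf("Array end: %p\n", (void *)&_binary_hello_txt_end);
    /* The size is encoded as the symbol's address, so convert it to an integer. */
    printf("Array size: %llu\n", (unsigned long long)(uintptr_t)&_binary_hello_txt_size);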
    return 0;
}

There are three important things worth mentioning here:

  1. The file data is not guaranteed to be NULL terminated. If the data is being used as a string, a NULL terminator should be manually inserted to ensure that it is present. ld will pad the section with NULL bytes to match the specified alignment (16 bytes) but will not add this padding if it is not needed. The NULL terminator can be added either before the input file is passed through ld or during post-processing.
  2. The _binary_hello_txt_end symbol is the first byte past the end of the file data. A & is needed to get the address of it.
  3. The _binary_hello_txt_size symbol is a little weird. The actual size of the file data is the address of the symbol and not the symbol itself. This is why the & is needed to get the array size. It may be easier to create a macro that derives the size of the file data based on the start and end addresses of the array.

#include <stdio.h>

extern unsigned char _binary_hello_txt_start[];
extern unsigned char _binary_hello_txt_end;

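/* The pointer difference is a ptrdiff_t, so cast it to match the %llu format. */
#define hello_txt_size ((unsigned long long)(&_binary_hello_txt_end - _binary_hello_txt_start))

int main(void) {
    printf("Array size: %llu\n", hello_txt_size);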
    return 0;
}
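For the first point in the list above, one way to add the terminator before the file is passed through ld is to write a terminated copy of it. This is only a sketch; the function name and the file names in the usage comment are made up for illustration.

```python
from pathlib import Path


def write_nul_terminated_copy(src: str, dst: str) -> int:
    """Copy src to dst, appending a zero byte unless one is already there.

    Returns the number of bytes written to dst.
    """
    data = Path(src).read_bytes()
    if not data.endswith(b"\0"):
        data += b"\0"
    Path(dst).write_bytes(data)
    return len(data)


# Hypothetical usage:
#   write_nul_terminated_copy("hello.txt", "hello_z.txt")
# and then run ld on hello_z.txt instead of hello.txt.
```

Keep in mind that the symbol names follow the name of the copy, and that the _end and _size symbols will then include the terminator.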

The object file with the embedded file data can be linked in as is without needing to add any special link flags.

x86_64-w64-mingw32-gcc -o main.exe main.c hello.o

# Or with clang
clang --target=x86_64-windows-gnu -o main.exe main.c hello.o

This method of embedding arbitrary file data into a C/C++ program is a lot nicer since it does not require any extra developer tooling and eliminates the need to generate a source file for the embedded data.


Object Post-Processing

Some post-processing may be needed in order to get the object file in a more desirable state. This can be done with objcopy or various other custom tools.

Here are some example scenarios where objcopy can be used to make these modifications.

Renaming the Exported Symbols

The auto-generated symbol names can be renamed using the --redefine-sym flag.

objcopy \
	--redefine-sym _binary_hello_txt_start=hello_data_start \
	--redefine-sym _binary_hello_txt_end=hello_data_end \
	--redefine-sym _binary_hello_txt_size=hello_data_size \
	hello.o

Removing the _size Symbol

If the *_size symbol or any other symbol is not needed, it can be removed from the COFF.

objcopy -N _binary_hello_txt_size hello.o

Adding a Trailing NULL Byte

A trailing NULL byte can be added at the end of the file data if it is being used as a string. This is done by creating a temporary file with the extracted section data, adding the trailing NULL byte to it, and then reinserting it back into the COFF.

export SECTION_TMPFILE=$(mktemp -t -p /tmp objcopy-section.XXXXX)
objcopy --dump-section .data=$SECTION_TMPFILE hello.o
printf '\0' >> $SECTION_TMPFILE
objcopy --update-section .data=$SECTION_TMPFILE hello.o
rm $SECTION_TMPFILE
unset SECTION_TMPFILE

Changing the Section of the File Data

ld by default will insert the file into the .data section. This can be changed to a different section if desired.

objcopy --rename-section .data=.rdata hello.o

The section flags will automatically be adjusted if the new section name is a standard section name. If the section flags need to be manually adjusted for custom sections, they can be specified using the --set-section-flags option or in the --rename-section option.

objcopy --rename-section .data=custom hello.o
objcopy --set-section-flags custom=alloc,load,readonly,data,contents hello.o

# Changing the flags during renaming
objcopy --rename-section .data=custom,alloc,load,readonly,data,contents hello.o

The flag values are listed under the --set-section-flags option in the objcopy man page.


Generating COFFs from Scratch

ld and objcopy provide a ton of flexibility for converting arbitrary binary data files into linkable object files. One of the disadvantages is that this file embedding workflow is not ideal for Windows environments. MSVC’s link.exe is unable to generate an object file from an arbitrary binary input file and LLVM lld only supports generating ELFs[2]. The Windows version of LLVM lld bundled with Visual Studio also does not support the -r or -b flags.

There are many existing tools publicly available that work on Windows and can generate COFFs from an arbitrary binary file. These can be found by searching for “bin2obj” or “bin2coff” programs online.

Writing a custom tool for this can provide a lot more flexibility than what existing implementations may offer.

It may seem a little complicated at first; however, COFFs are a pretty straightforward file format and the lack of relocations makes things slightly easier.

The “Writing Beacon Object Files Without DFR” blog post starting from the “So How Does This Work?” section contains an in-depth walk through on the COFF file structure which provides some good background on this.

Looking at the COFF Generated by ld

Analyzing the COFFs generated by ld can help with understanding everything that is involved in accomplishing this.

This will create a basic test COFF for exploring.

echo "Hello World" > hello.txt
ld -r --oformat pe-x86-64 -b binary -o hello.o hello.txt

Here is the hex-dump of the generated COFF.

00000000: 6486 0100 0000 0000 4c00 0000 0300 0000  d.......L.......
00000010: 0000 0500 2e64 6174 6100 0000 0000 0000  .....data.......
00000020: 0000 0000 1000 0000 3c00 0000 0000 0000  ........<.......
00000030: 0000 0000 0000 0000 4000 50c0 4865 6c6c  ........@.P.Hell
00000040: 6f20 576f 726c 640a 0000 0000 0000 0000  o World.........
00000050: 0400 0000 0000 0000 0100 0000 0200 0000  ................
00000060: 0000 1c00 0000 0c00 0000 ffff 0000 0200  ................
00000070: 0000 0000 3300 0000 0c00 0000 0100 0000  ....3...........
00000080: 0200 4900 0000 5f62 696e 6172 795f 6865  ..I..._binary_he
00000090: 6c6c 6f5f 7478 745f 7374 6172 7400 5f62  llo_txt_start._b
000000a0: 696e 6172 795f 6865 6c6c 6f5f 7478 745f  inary_hello_txt_
000000b0: 7369 7a65 005f 6269 6e61 7279 5f68 656c  size._binary_hel
000000c0: 6c6f 5f74 7874 5f65 6e64 00              lo_txt_end.

It is pretty small with the majority of the file consisting of the COFF’s metadata.
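To see what that metadata holds, the first 0x3c bytes of the dump can be decoded with Python's struct module. This is a small sketch that assumes the standard 20-byte COFF file header followed by one 40-byte section header; the variable names are my own.

```python
import struct

# First 0x3c bytes of hello.o, copied from the hex dump above.
HEADERS = bytes.fromhex(
    "64860100000000004c00000003000000"
    "000005002e6461746100000000000000"
    "00000000100000003c00000000000000"
    "0000000000000000400050c0"
)

FILE_HEADER = struct.Struct("<HHIIIHH")         # 20 bytes, little endian
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")  # 40 bytes, little endian

(machine, nsections, timestamp, symtab_ptr,
 nsymbols, opt_size, file_flags) = FILE_HEADER.unpack_from(HEADERS, 0)

(name, vsize, vaddr, raw_size, raw_ptr, reloc_ptr,
 line_ptr, nrelocs, nlines, section_flags) = SECTION_HEADER.unpack_from(
    HEADERS, FILE_HEADER.size
)

print(f"machine={machine:#x} sections={nsections} "
      f"symtab={symtab_ptr:#x} symbols={nsymbols} flags={file_flags:#x}")
print(f"name={name.rstrip(bytes(1))!r} raw_size={raw_size:#x} "
      f"raw_ptr={raw_ptr:#x} flags={section_flags:#x}")
```

The section's raw data starts at 0x3c, right after the two headers, which is where the "Hello World" bytes show up in the dump.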

Ghidra, or any other disassembler, can also help with further analyzing it.

[Image: Ghidra listing view of hello.o]

This listing display contains the entire COFF. There is a COFF File Header, a single Section Header for the .data section and the file contents inside the .data section.

The symbol table (Window -> Symbol Table) lists all of the defined symbols.

[Image: Ghidra symbol table for hello.o]

These are the symbols that are automatically generated and exported for use inside the main program. The “Source” column in Ghidra shows them as being “Imported” but that just means they are defined with IMAGE_SYM_CLASS_EXTERNAL storage class.

The Relocation Table window (Window -> Relocation Table) will show 0 relocations because nothing in this COFF references external data or data in other sections.

This COFF information can be laid out linearly as it is in the file but with the values for each structure filled in.

/** COFF File header */
struct COFFFileHeader {
    Machine = IMAGE_FILE_MACHINE_AMD64, /* 0x8664 */
    NumberOfSections = 1,
    TimeDateStamp = 0,
    PointerToSymbolTable = 0x4c,
    NumberOfSymbols = 3,
    SizeOfOptionalHeader = 0,
    Characteristics = IMAGE_FILE_RELOCS_STRIPPED | IMAGE_FILE_LINE_NUMS_STRIPPED, /* 0x5 */
};

/** Section Table (Section Headers) */
/* Only 1 section header in the section table */
struct COFFSectionHeader {
    Name = ".data",
    VirtualSize = 0,
    VirtualAddress = 0,
    SizeOfRawData = 0x10,
    PointerToRawData = 0x3c,
    PointerToRelocations = 0,
    PointerToLineNumbers = 0,
    NumberOfRelocations = 0,
    NumberOfLineNumbers = 0,
    Characteristics = IMAGE_SCN_CNT_INITIALIZED_DATA | IMAGE_SCN_ALIGN_16BYTES | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE, /* 0xc0500040 */
};

/** The embedded file's contents are located here after the section header and padded with NULL bytes */
"Hello World"

/** COFF Symbol Table (3 symbols) */
struct COFFSymbol {
    Name = "_binary_hello_txt_start",
    Value = 0,
    SectionNumber = 1,
    Type = IMAGE_SYM_DTYPE_NULL | IMAGE_SYM_TYPE_NULL, /* 0x0 */
    StorageClass = IMAGE_SYM_CLASS_EXTERNAL, /* 0x2 */
    NumberOfAuxSymbols = 0,
};

struct COFFSymbol {
    Name = "_binary_hello_txt_size",
    Value = 0xc,
    SectionNumber = IMAGE_SYM_ABSOLUTE, /* -1 */
    Type = IMAGE_SYM_DTYPE_NULL | IMAGE_SYM_TYPE_NULL, /* 0x0 */
    StorageClass = IMAGE_SYM_CLASS_EXTERNAL, /* 0x2 */
    NumberOfAuxSymbols = 0,
};

struct COFFSymbol {
    Name = "_binary_hello_txt_end",
    Value = 0xc,
    SectionNumber = 1,
    Type = IMAGE_SYM_DTYPE_NULL | IMAGE_SYM_TYPE_NULL, /* 0x0 */
    StorageClass = IMAGE_SYM_CLASS_EXTERNAL, /* 0x2 */
    NumberOfAuxSymbols = 0,
};

/* String table */

One thing missing from this is the COFF’s string table. The string table contains the string values from the Name fields in the COFFSymbol structures. They are stored there because those strings are longer than 8 bytes. More information on how the string table is utilized can also be found in the “So How Does This Work?” section from the “Writing Beacon Object Files Without DFR” post.
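A short sketch of how a symbol's Name field is resolved through the string table may help here. It assumes the usual COFF convention: when the first four bytes of the 8-byte Name field are zero, the next four bytes are an offset into the string table, counted from the start of the table's 4-byte size field. The symbol records are copied from offset 0x4c of the hex dump; the function name read_symbols is made up.

```python
import struct

# The three 18-byte symbol records from offset 0x4c of the hex dump.
SYMBOLS = bytes.fromhex(
    "000000000400000000000000010000000200"
    "000000001c0000000c000000ffff00000200"
    "00000000330000000c000000010000000200"
)
# The string table: a 4-byte total size (0x49) followed by the names.
STRTAB = struct.pack("<I", 0x49) + (
    b"_binary_hello_txt_start\0"
    b"_binary_hello_txt_size\0"
    b"_binary_hello_txt_end\0"
)


def read_symbols(symtab: bytes, strtab: bytes) -> list[tuple[str, int, int]]:
    """Return (name, value, section number) for every symbol record."""
    result = []
    for offset in range(0, len(symtab), 18):
        raw_name, value, section, _type, _storage, _aux = struct.unpack_from(
            "<8sIhHBB", symtab, offset
        )
        zeroes, str_offset = struct.unpack("<II", raw_name)
        if zeroes == 0:
            # Long name: look it up in the string table.
            end = strtab.index(b"\0", str_offset)
            name = strtab[str_offset:end].decode("ascii")
        else:
            # Short name (8 bytes or fewer): stored inline, zero padded.
            name = raw_name.rstrip(b"\0").decode("ascii")
        result.append((name, value, section))
    return result


for entry in read_symbols(SYMBOLS, STRTAB):
    print(entry)
```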

In the _binary_hello_txt_size symbol, the value is set to the size of the file data and the symbol’s section value is set to IMAGE_SYM_ABSOLUTE. This is where the weirdness with that symbol mentioned above comes from and why you need to reference the symbol’s address to get the actual value. There is a way to fix that mentioned later in the “Fixing the _size Symbol” section.

The parts that will vary in this metadata are the PointerToSymbolTable field in the COFFFileHeader, the SizeOfRawData field in the COFFSectionHeader, and the Value fields in the _binary_hello_txt_size and _binary_hello_txt_end symbols. They change based on the size of the file being embedded.

There are some patterns with this data. The PointerToSymbolTable field in the COFFFileHeader structure is the PointerToRawData field from the COFFSectionHeader structure plus the SizeOfRawData field, i.e. the padded size of the embedded file (0x3c + 0x10 = 0x4c here). The SizeOfRawData field in the COFFSectionHeader is the file size rounded up to the next multiple of the alignment specified in the Characteristics field (IMAGE_SCN_ALIGN_16BYTES, so 16 bytes). The section data is padded with NULL bytes at the end to reach this size. The Value fields in the _binary_hello_txt_size and _binary_hello_txt_end COFFSymbols are the same and are just the unpadded size of the file.

Ultimately, the COFF consists of the structures above laid out in the order below with the file data embedded somewhere in the center.

  • COFF File Header
  • Section header for the .data section
  • Embedded file’s data padded with NULL bytes to reach the specified alignment
  • The COFFSymbol structure array
  • The string table
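Putting that layout together, here is a minimal Python sketch of a generator that writes the pieces in this order. It is not the bin2coff.py script, just an illustration under a few assumptions: an AMD64 target, a single .data section with 16-byte alignment, and symbol names that are always stored in the string table. The function name build_coff is made up.

```python
import struct

IMAGE_FILE_MACHINE_AMD64 = 0x8664
FILE_CHARACTERISTICS = 0x0005         # RELOCS_STRIPPED | LINE_NUMS_STRIPPED
SECTION_CHARACTERISTICS = 0xC0500040  # initialized data, 16-byte align, read, write
IMAGE_SYM_ABSOLUTE = -1
IMAGE_SYM_CLASS_EXTERNAL = 2
ALIGNMENT = 16


def build_coff(data: bytes, stem: str) -> bytes:
    """Build a COFF with `data` in .data and _binary_<stem>_* symbols."""
    padded = data + b"\0" * (-len(data) % ALIGNMENT)
    raw_offset = 20 + 40  # file header + one section header
    symtab_offset = raw_offset + len(padded)

    symbols = [
        (f"_binary_{stem}_start", 0, 1),
        (f"_binary_{stem}_size", len(data), IMAGE_SYM_ABSOLUTE),
        (f"_binary_{stem}_end", len(data), 1),
    ]

    symtab = b""
    strings = b""
    for name, value, section in symbols:
        # These names are longer than 8 bytes, so the Name field holds four
        # zero bytes and an offset that counts the table's 4-byte size field.
        symtab += struct.pack("<II", 0, 4 + len(strings))
        symtab += struct.pack("<IhHBB", value, section, 0,
                              IMAGE_SYM_CLASS_EXTERNAL, 0)
        strings += name.encode("ascii") + b"\0"

    file_header = struct.pack(
        "<HHIIIHH", IMAGE_FILE_MACHINE_AMD64, 1, 0,
        symtab_offset, len(symbols), 0, FILE_CHARACTERISTICS,
    )
    section_header = struct.pack(
        "<8sIIIIIIHHI", b".data", 0, 0, len(padded), raw_offset,
        0, 0, 0, 0, SECTION_CHARACTERISTICS,
    )
    string_table = struct.pack("<I", 4 + len(strings)) + strings
    return file_header + section_header + padded + symtab + string_table


coff = build_coff(b"Hello World\n", "hello_txt")
print(len(coff), coff[:20].hex())
```

For the "Hello World" input, the result can be compared byte for byte against the hex dump of the ld output shown earlier.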

Fixing the _size Symbol

The way the *_size symbol is defined is a little bit inconvenient to work with when referencing it in a program. A simple way of fixing this is to append the file’s size at the end of the section and set the *_size symbol to reference that value. Another way is to create a new .rdata section with the file size and have the symbol reference that value.

Here is what the former looks like.

Current layout of the file data and symbol table.

/* The section data as a hexdump */
00000000: 4865 6c6c 6f20 576f 726c 640a 0000 0000  Hello World.....

/* The symbol table */
struct COFFSymbol {
    Name = "_binary_hello_txt_start",
    Value = 0,
    SectionNumber = 1,
    Type = IMAGE_SYM_DTYPE_NULL | IMAGE_SYM_TYPE_NULL,
    StorageClass = IMAGE_SYM_CLASS_EXTERNAL,
    NumberOfAuxSymbols = 0,
};

struct COFFSymbol {
    Name = "_binary_hello_txt_size",
    Value = 0xc,
    SectionNumber = IMAGE_SYM_ABSOLUTE,
    Type = IMAGE_SYM_DTYPE_NULL | IMAGE_SYM_TYPE_NULL,
    StorageClass = IMAGE_SYM_CLASS_EXTERNAL,
    NumberOfAuxSymbols = 0,
};

struct COFFSymbol {
    Name = "_binary_hello_txt_end",
    Value = 0xc,
    SectionNumber = 1,
    Type = IMAGE_SYM_DTYPE_NULL | IMAGE_SYM_TYPE_NULL,
    StorageClass = IMAGE_SYM_CLASS_EXTERNAL,
    NumberOfAuxSymbols = 0,
};

New layout with the file size added in.

/* The section data as a hexdump */
00000000: 4865 6c6c 6f20 576f 726c 640a 0000 0000  Hello World.....
00000010: 0c00 0000 0000 0000 0000 0000 0000 0000  ................ // Size of the file (0xc) in little endian and added at the end with some NULL byte padding.

/* The symbol table */
struct COFFSymbol {
    Name = "_binary_hello_txt_start",
    Value = 0,
    SectionNumber = 1,
    Type = IMAGE_SYM_DTYPE_NULL | IMAGE_SYM_TYPE_NULL,
    StorageClass = IMAGE_SYM_CLASS_EXTERNAL,
    NumberOfAuxSymbols = 0,
};

struct COFFSymbol {
    Name = "_binary_hello_txt_size",
    Value = 0x10, // Value adjusted to point to the inserted size value in the section
    SectionNumber = 1, // SectionNumber adjusted to reference the section with the size value
    Type = IMAGE_SYM_DTYPE_NULL | IMAGE_SYM_TYPE_NULL,
    StorageClass = IMAGE_SYM_CLASS_EXTERNAL,
    NumberOfAuxSymbols = 0,
};

struct COFFSymbol {
    Name = "_binary_hello_txt_end",
    Value = 0xc,
    SectionNumber = 1,
    Type = IMAGE_SYM_DTYPE_NULL | IMAGE_SYM_TYPE_NULL,
    StorageClass = IMAGE_SYM_CLASS_EXTERNAL,
    NumberOfAuxSymbols = 0,
};

Now, the size of the file data can be referenced without needing to take the address of it.

#include <stddef.h>
#include <stdio.h>

extern char _binary_hello_txt_start[];
extern char _binary_hello_txt_end;
extern size_t _binary_hello_txt_size;

int main(void) {
    printf("Array size: %llu\n", (unsigned long long)_binary_hello_txt_size);
    return 0;
}
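The adjusted section data can be computed with a few lines of Python. This sketch assumes the size is stored as an 8-byte little-endian integer (to match a 64-bit size_t) placed right after the padded file data; the helper names are made up for illustration.

```python
import struct

ALIGNMENT = 16  # matches IMAGE_SCN_ALIGN_16BYTES


def pad(data: bytes, alignment: int = ALIGNMENT) -> bytes:
    """Append zero bytes until the length is a multiple of the alignment."""
    return data + b"\0" * (-len(data) % alignment)


def section_with_size(data: bytes) -> tuple[bytes, int]:
    """Return the new section bytes and the Value for the *_size symbol."""
    padded = pad(data)
    size_offset = len(padded)  # the size value starts right after the padding
    # Assumption: 8-byte little-endian size, matching size_t on x86-64.
    section = pad(padded + struct.pack("<Q", len(data)))
    return section, size_offset


section, size_value = section_with_size(b"Hello World\n")
print(section.hex(), hex(size_value))
```

The SizeOfRawData and PointerToSymbolTable fields have to grow along with the section, and the *_size symbol's SectionNumber changes from IMAGE_SYM_ABSOLUTE to 1.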

bin2coff.py

I wrote a small, self-contained Python script for generating COFFs that should work on different platforms (Linux, Windows, Mac, etc.).

https://gist.github.com/MEhrn00/9615b92d9bfd3c85d6cba69edb31387d

Generating COFFs from… yaml?

Some people might say that this method is extremely cursed and should not be a thing that actually exists. Others may marvel at its pristine beauty and grasp the true extent of its full potential.

In the LLVM toolset, there exists a set of two peculiar tools named yaml2obj and obj2yaml.

As the names may imply, these tools are capable of creating object files from a yaml file that describes its contents.

Here is what that looks like. The yaml description of the hello.o file from above can be printed out using obj2yaml.

matt@laptop :: ~ >> obj2yaml hello.o
--- !COFF
header:
  Machine:         IMAGE_FILE_MACHINE_AMD64
  Characteristics: [ IMAGE_FILE_RELOCS_STRIPPED, IMAGE_FILE_LINE_NUMS_STRIPPED ]
sections:
  - Name:            .data
    Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
    Alignment:       16
    SectionData:     48656C6C6F20576F726C640A00000000
    SizeOfRawData:   16
symbols:
  - Name:            _binary_hello_txt_start
    Value:           0
    SectionNumber:   1
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_EXTERNAL
  - Name:            _binary_hello_txt_size
    Value:           12
    SectionNumber:   -1
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_EXTERNAL
  - Name:            _binary_hello_txt_end
    Value:           12
    SectionNumber:   1
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_EXTERNAL
...

This yaml data contains all of the information pertaining to this COFF.

A new COFF can be generated by modifying the values in this yaml file or by creating a new yaml file from scratch and then running it through yaml2obj.

This means yaml2obj can also be used for embedding files into C/C++ programs.

The yaml example file above can be used as a template for embedding another file into it. The section data string will need to be replaced with the hex string of the target file for embedding and the other metadata needs to be adjusted to account for the size of the file data. Then, yaml2obj can take the modified yaml file and generate a fresh object file for linking that contains the embedded file data.
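The two values that change in the sections entry can be derived like this. It is a minimal sketch that assumes the same 16-byte alignment as the template; the function name yaml_section_fields is made up.

```python
def yaml_section_fields(data: bytes, alignment: int = 16) -> tuple[str, int]:
    """Return the SectionData hex string and SizeOfRawData for the template."""
    # Pad with zero bytes up to the next multiple of the alignment.
    padded = data + b"\0" * (-len(data) % alignment)
    return padded.hex().upper(), len(padded)


section_data, size_of_raw_data = yaml_section_fields(b"Hello World\n")
print(f"    SectionData:     {section_data}")
print(f"    SizeOfRawData:   {size_of_raw_data}")
```

The Value fields of the *_size and *_end symbols still use the unpadded file size.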

This process can be scripted out using something like bash or python.

#!/bin/bash

if [ "$#" -ne 3 ]; then
    echo "usage: $0 [INPUT] [OUTPUT] [SYMBOL]"
    exit 1
fi

if [ ! -f "$1" ]; then
    echo "$1 is not a file."
    exit 1
fi

if [ -z "$3" ]; then
    echo "Symbol is empty."
    exit 1
fi

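# wc -c is more robust than parsing ls output; the arithmetic expansion strips any padding.
filesize=$(($(wc -c < "$1")))
# Strip newlines with tr instead of relying on xxd -c0, which not every xxd version accepts.
filehex=$(xxd -p -u "$1" | tr -d '\n')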

sectiondata=$filehex
sectionsize=$filesize

alignment=16
remainder=$(($sectionsize % $alignment))
if [ $remainder -ne 0 ]; then
    padding=$(($alignment - $remainder))
    for _ in $(seq $padding); do
        sectiondata+="00"
    done
    sectionsize=$(($sectionsize + $padding))
fi

tmp=$(mktemp -t -p /tmp yaml2obj.XXXXX)
trap '{ rm -f -- "$tmp"; }' EXIT

cat <<EOF > $tmp
--- !COFF
header:
  Machine:         IMAGE_FILE_MACHINE_AMD64
  Characteristics: [ IMAGE_FILE_RELOCS_STRIPPED, IMAGE_FILE_LINE_NUMS_STRIPPED ]
sections:
  - Name:            .data
    Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
    Alignment:       16
    SectionData:     $sectiondata
    SizeOfRawData:   $sectionsize
symbols:
  - Name:            _binary_${3}_start
    Value:           0
    SectionNumber:   1
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_EXTERNAL
  - Name:            _binary_${3}_size
    Value:           $filesize
    SectionNumber:   -1
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_EXTERNAL
  - Name:            _binary_${3}_end
    Value:           $filesize
    SectionNumber:   1
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_EXTERNAL
EOF

# The EXIT trap above removes the temporary file.
yaml2obj -o "$2" "$tmp"

This should generate a COFF with the data from the specified file embedded into it.

Unfortunately, yaml2obj does not ship with Visual Studio’s clang tools and is not present in the LLVM.LLVM package from winget so it may need to be compiled from source in order to use it on Windows.

Build System Integration

It’s generally not a good idea to commit binary files into version control unless the version control system has good support for them. Git has Git LFS for this, which is available on GitHub, but it would be great to automate generating these COFFs during the main program’s build process.

Most build systems should support defining and running custom commands which makes it possible to integrate this during the build process. Here are some basic examples using make, nmake, cmake and meson with the bin2coff.py script. These examples are for a basic project with a main.c source file and a hello.txt file that needs to be embedded. The bin2coff.py script is put in a separate scripts/ directory at scripts/bin2coff.py to keep the root of the project more organized.

Makefile

CC = x86_64-w64-mingw32-gcc
PYTHON = python3
BIN2COFF = scripts/bin2coff.py


.PHONY : all clean
all : main.exe

clean:
	rm main.exe main.o hello.o

main.exe : main.o hello.o
	$(CC) $(LDFLAGS) $(TARGET_ARCH) $^ $(LDLIBS) -o $@

hello.o : hello.txt
	$(PYTHON) $(BIN2COFF) -m amd64 $< $@

NMake

PYTHON = python3
BIN2COFF = .\scripts\bin2coff.py

all : main.exe

clean:
	del /f main.exe main.obj hello.obj

main.exe : main.obj hello.obj
	$(CC) $(CFLAGS) /Fe:$@ $**

hello.obj : hello.txt
	$(PYTHON) $(BIN2COFF) $? $@

CMakeLists.txt

cmake_minimum_required(VERSION 3.18)

project(example LANGUAGES C)

find_package(Python REQUIRED COMPONENTS Interpreter)

add_custom_command(
  OUTPUT hello.o
  COMMAND
    ${Python_EXECUTABLE} ${CMAKE_CURRENT_SOURCE_DIR}/scripts/bin2coff.py
      ${CMAKE_CURRENT_SOURCE_DIR}/hello.txt
      hello.o
  DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/hello.txt
  COMMENT "Building hello.o from hello.txt with bin2coff.py"
  VERBATIM
)

add_custom_target(hello_gen DEPENDS hello.o)

add_library(hello OBJECT IMPORTED)
set_target_properties(hello PROPERTIES
  IMPORTED_OBJECTS
    "${CMAKE_CURRENT_BINARY_DIR}/hello.o"
)

add_executable(main main.c)
# Make sure hello.o is generated before main is linked.
add_dependencies(main hello_gen)
target_link_libraries(main hello)

meson.build

project('example', 'c')

python = find_program('python3', native : true,  required : true)

hello = custom_target(
  'hello',
  output : 'hello.o',
  input : 'hello.txt',
  command : [python, '@CURRENT_SOURCE_DIR@/scripts/bin2coff.py', '@INPUT@', '@OUTPUT@']
)

executable('main', 'main.c', hello)

Wrapping Up

This post provides an alternative method for embedding file data in C/C++ without needing to store it as a large byte array in the source code. GNU ld is the standard linker on most Linux systems so this process works without needing to install any extra tools. Since COFF is a relatively straightforward file format, writing a custom tool that does the same thing is not much work either.


  1. I do not know of any other linkers aside from GNU/MinGW ld and LLVM lld that are capable of outputting an object file with arbitrary binary data embedded inside it. There may be other ones I am unfamiliar with that can.

  2. It may be possible to compile LLVM lld from source with COFF target support; however, I have not explored or tried it.