Embedding Files in C/C++ Programs
Background
Recently, I came across a post on X by @0xTriboulet asking how to deal with large header files in Visual Studio projects https://x.com/0xTriboulet/status/1878139439714558169.
Based on this post and the rest of the thread, I assume that they were attempting to insert the binary data from a file in their program by converting it into a large C byte array and then pasting that array into a header file.
This is a very common method of embedding binary data from a file inside a C/C++ project. The file data gets converted to hex or decimal and wrapped inside a C array.
#ifndef MYFILE_HEADER_H
#define MYFILE_HEADER_H
const unsigned char MYFILE_DATA[] = { 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, ... };
#endif // MYFILE_HEADER_H
Although this does work, it can lead to some issues with code analysis tools, like auto completion, when they attempt to process this file.
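To make the size problem concrete, here is a minimal Python sketch of the kind of generator that produces such a header. The function name to_c_header and the 12-bytes-per-line layout are arbitrary choices for illustration and are not taken from any particular tool. Every input byte becomes roughly six characters of source text, which is why these headers grow so quickly.

```python
def to_c_header(data: bytes, name: str, per_line: int = 12) -> str:
    """Render `data` as a C header that defines an unsigned char array."""
    guard = f"{name.upper()}_HEADER_H"
    rows = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        # Each byte is written as a 0x.. literal followed by a comma
        rows.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(rows)
    return (
        f"#ifndef {guard}\n"
        f"#define {guard}\n"
        f"const unsigned char {name.upper()}_DATA[] = {{\n{body}\n}};\n"
        f"#endif // {guard}\n"
    )


if __name__ == "__main__":
    print(to_c_header(b"AAAAAA", "myfile"))
```

A 1 MiB input file ends up as a header several megabytes in size, all of which the editor and code analysis tools have to parse.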
An alternative method of embedding files in a C/C++ program is to skip the source code part altogether and instead convert the needed file directly into an object file that can be linked in at build time.
TL;DR
GNU/MinGW ld can generate an object file from arbitrary binary data using the -r and -b binary options. This object file will export a few symbols named _binary_FILENAME_start, _binary_FILENAME_end, and _binary_FILENAME_size which can be used to reference the embedded data.
More information can be found in this StackOverflow thread.
A script or other tools can also be used to generate object files with other files embedded into them.
Generating COFFs with ld
This method only works with GNU/MinGW ld and LLVM lld [1]. These linkers support different targets which determine what type of object files they are capable of generating. The target formats supported by both GNU/MinGW ld and LLVM lld are listed in the --help output of each program.
# GNU ld
ld --help | grep 'supported targets'
ld: supported targets: elf64-x86-64 elf32-i386 elf32-iamcu elf32-x86-64 pei-i386 pe-x86-64 pei-x86-64 elf64-little elf64-big elf32-little elf32-big pe-bigobj-x86-64 pe-i386 pdb elf64-bpfle elf64-bpfbe srec symbolsrec verilog tekhex binary ihex plugin
# LLVM lld
ld.lld --help | grep 'supported targets'
ld.lld: supported targets: elf
The target format can be specified using the --oformat flag on the command line. These linkers also support another flag, -r, which specifies that they should generate a relocatable object file as output instead of a typical executable or shared library. The -b option is used to specify the type of input file.
Here is how a COFF can be generated using ld with the contents of an arbitrary file embedded inside it.
matt@laptop :: ~ >> cat hello.txt
Hello World
matt@laptop :: ~ >> ld -r --oformat pe-x86-64 -b binary -o hello.o hello.txt
matt@laptop :: ~ >> file hello.o
hello.o: Intel amd64 COFF object file, no relocation info, no line number info, not stripped, 1 section, symbol offset=0x4c, 3 symbols, 1st section name ".data"
matt@laptop :: ~ >>
MinGW ld can do the same without needing the --oformat flag since it produces COFFs by default.
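For build scripts it can be handy to assemble that command line programmatically. The following is a small Python sketch that only builds the argument list from the flags shown above; the helper name ld_embed_command is hypothetical, and the commented-out MinGW linker name is an assumption about how the toolchain is installed.

```python
def ld_embed_command(src, obj, oformat=None, ld="ld"):
    """Return the argv list for embedding `src` into the object file `obj`."""
    cmd = [ld, "-r"]                      # -r: emit a relocatable object file
    if oformat is not None:
        cmd += ["--oformat", oformat]     # only needed when COFF is not the default
    cmd += ["-b", "binary", "-o", obj, src]
    return cmd


if __name__ == "__main__":
    # GNU ld on Linux needs the explicit output format
    print(" ".join(ld_embed_command("hello.txt", "hello.o", oformat="pe-x86-64")))
    # A MinGW ld (binary name assumed here) can omit it:
    # ld_embed_command("hello.txt", "hello.o", ld="x86_64-w64-mingw32-ld")
```

The resulting list can be handed to subprocess.run(cmd, check=True) to actually invoke the linker.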
Along with the embedded file data, the COFF will also contain a set of symbols that can be used for referencing that data. These symbols can be viewed using any generic tool that is capable of displaying the symbol table of a COFF. One such tool is rabin2.
matt@laptop :: ~ >> rabin2 -s hello.o
[Symbols]
nth paddr vaddr bind type size lib name demangled
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
0 0x0000003c 0x00000000 GLOBAL UNK 4 _binary_hello_txt_start
0 0x00000000 0x00000000 GLOBAL UNK 4 _binary_hello_txt_size
0 0x00000048 0x0000000c GLOBAL UNK 4 _binary_hello_txt_end
The symbol names are derived from the name of the input file passed into ld.
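As a rough illustration of that naming rule, the Python sketch below predicts the symbol names for a given input path. It assumes, based on how GNU ld behaves as far as I know, that every character in the path that is not an ASCII letter or digit is replaced with an underscore, and that the path is used exactly as it was written on the command line (directories included). The function name binary_symbols is made up for this example.

```python
import string

# Characters that are kept as-is; everything else becomes an underscore
_KEEP = set(string.ascii_letters + string.digits)


def binary_symbols(path: str) -> dict:
    """Predict the start/end/size symbol names for an input path."""
    mangled = "".join(c if c in _KEEP else "_" for c in path)
    return {kind: f"_binary_{mangled}_{kind}" for kind in ("start", "end", "size")}


if __name__ == "__main__":
    for name in binary_symbols("hello.txt").values():
        print(name)
    # A path with a directory component changes the symbol names as well
    print(binary_symbols("assets/logo-1.png")["start"])
```

This is why it is usually easiest to run ld from the directory that contains the input file, or to rename the symbols afterwards.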
Using ld Generated COFFs
Now in the C/C++ program, the file data can be referenced through those exported symbols.
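#include <stdint.h>
#include <stdio.h>
extern unsigned char _binary_hello_txt_start[];
extern unsigned char _binary_hello_txt_end;
extern unsigned char _binary_hello_txt_size;
int main(void) {
    /* %p expects a void pointer */
    printf("Array: %p\n", (void *)_binary_hello_txt_start);
    printf("Array end: %p\n", (void *)&_binary_hello_txt_end);
    /* The size is encoded as the symbol's address, so convert that address to an integer */
    printf("Array size: %llu\n", (unsigned long long)(uintptr_t)&_binary_hello_txt_size);
    return 0;
}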
There are three important things worth mentioning here:
- The file data is not guaranteed to be NULL terminated. If the data is being used as a string, a NULL terminator should be manually inserted to ensure that it is present. ld will pad the section with NULL bytes to match the specified alignment (16 bytes) but will not add this padding if it is not needed. The NULL terminator can be added either before the input file is passed through ld or during post-processing.
- The _binary_hello_txt_end symbol is the first byte past the end of the file data. An & is needed to get the address of it.
- The _binary_hello_txt_size symbol is a little weird. The actual size of the file data is the address of the symbol and not the symbol itself. This is why the & is needed to get the array size. It may be easier to create a macro that derives the size of the file data based on the start and end addresses of the array.
#include <stddef.h>
#include <stdio.h>
extern unsigned char _binary_hello_txt_start[];
extern unsigned char _binary_hello_txt_end;
/* Number of bytes between the start and end symbols */
#define hello_txt_size ((size_t)(&_binary_hello_txt_end - _binary_hello_txt_start))
int main(void) {
    printf("Array size: %llu\n", (unsigned long long)hello_txt_size);
    return 0;
}
The object file with the embedded file data can be linked in as is without needing to add any special link flags.
x86_64-w64-mingw32-gcc -o main.exe main.c hello.o
# Or with clang
clang --target=x86_64-windows-gnu -o main.exe main.c hello.o
This method of embedding arbitrary file data into a C/C++ program is a lot nicer since it does not require any extra developer tooling and eliminates the need to generate a source file for the embedded data.
Object Post-Processing
Some post-processing may be needed in order to get the object file in a more desirable state. This can be done with objcopy or various other custom tools.
Here are some example scenarios where objcopy can be used to make these modifications.
Renaming the Exported Symbols
The auto-generated symbol names can be renamed using the --redefine-sym flag.
objcopy \
--redefine-sym _binary_hello_txt_start=hello_data_start \
--redefine-sym _binary_hello_txt_end=hello_data_end \
--redefine-sym _binary_hello_txt_size=hello_data_size \
hello.o
Removing the _size Symbol
If the *_size symbol or any other symbol is not needed, it can be removed from the COFF.
objcopy -N _binary_hello_txt_size hello.o
Adding a Trailing NULL Byte
A trailing NULL byte can be added at the end of the file data if it is being used as a string. This is done by creating a temporary file with the extracted section data, adding the trailing NULL byte to it, and then reinserting it back into the COFF.
export SECTION_TMPFILE=$(mktemp -t -p /tmp objcopy-section.XXXXX)
objcopy --dump-section .data=$SECTION_TMPFILE hello.o
printf '\0' >> $SECTION_TMPFILE
objcopy --update-section .data=$SECTION_TMPFILE hello.o
rm $SECTION_TMPFILE
unset SECTION_TMPFILE
Changing the Section of the File Data
ld by default will insert the file into the .data section. This can be changed to a different section if desired.
objcopy --rename-section .data=.rdata hello.o
The section flags will automatically be adjusted if the new section name is a standard section name. If the section flags need to be manually adjusted for custom sections, they can be specified using the --set-section-flags option or in the --rename-section option.
objcopy --rename-section .data=custom hello.o
objcopy --set-section-flags custom=alloc,load,readonly,data,contents hello.o
# Changing the flags during renaming
objcopy --rename-section .data=custom,alloc,load,readonly,data,contents hello.o
The flag values are listed under the --set-section-flags option in the objcopy man page.
Generating COFFs from Scratch
ld and objcopy provide a ton of flexibility for converting arbitrary binary data files into linkable object files. One of the disadvantages is that this file embedding workflow is not ideal for Windows environments. MSVC’s link.exe is unable to generate an object file from an arbitrary binary input file and LLVM lld only supports generating ELFs [2]. The Windows version of LLVM lld bundled with Visual Studio also does not support the -r or -b flags.
There are many existing tools publicly available that work on Windows and can generate COFFs from an arbitrary binary file. These can be found by searching for “bin2obj” or “bin2coff” programs online.
Writing a custom tool for this can provide a lot more flexibility than what existing implementations may offer.
It may seem a little complicated at first; however, COFFs are a pretty straightforward file format and the lack of relocations makes things slightly easier.
The “Writing Beacon Object Files Without DFR” blog post starting from the “So How Does This Work?” section contains an in-depth walk through on the COFF file structure which provides some good background on this.
Looking at the COFF Generated by ld
Analyzing the COFFs generated by ld can help with understanding everything that is involved in accomplishing this.
This will create a basic test COFF for exploring.
echo "Hello World" > hello.txt
ld -r --oformat pe-x86-64 -b binary -o hello.o hello.txt
Here is the hex-dump of the generated COFF.
00000000: 6486 0100 0000 0000 4c00 0000 0300 0000 d.......L.......
00000010: 0000 0500 2e64 6174 6100 0000 0000 0000 .....data.......
00000020: 0000 0000 1000 0000 3c00 0000 0000 0000 ........<.......
00000030: 0000 0000 0000 0000 4000 50c0 4865 6c6c ........@.P.Hell
00000040: 6f20 576f 726c 640a 0000 0000 0000 0000 o World.........
00000050: 0400 0000 0000 0000 0100 0000 0200 0000 ................
00000060: 0000 1c00 0000 0c00 0000 ffff 0000 0200 ................
00000070: 0000 0000 3300 0000 0c00 0000 0100 0000 ....3...........
00000080: 0200 4900 0000 5f62 696e 6172 795f 6865 ..I..._binary_he
00000090: 6c6c 6f5f 7478 745f 7374 6172 7400 5f62 llo_txt_start._b
000000a0: 696e 6172 795f 6865 6c6c 6f5f 7478 745f inary_hello_txt_
000000b0: 7369 7a65 005f 6269 6e61 7279 5f68 656c size._binary_hel
000000c0: 6c6f 5f74 7874 5f65 6e64 00 lo_txt_end.
It is pretty small with the majority of the file consisting of the COFF’s metadata.
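The metadata at the front of the dump can be decoded by hand, but a few lines of Python make it easier to see which bytes belong to which field. This is only a sketch: the hex string is copied from the first 0x3c bytes of the dump above, and the struct format strings follow the 20-byte COFF file header and 40-byte section header layouts from the PE/COFF specification. The variable names are my own.

```python
import struct

# First 0x3c bytes of hello.o, copied from the hex-dump above
HEADER_HEX = (
    "64860100000000004c00000003000000"
    "000005002e6461746100000000000000"
    "00000000100000003c00000000000000"
    "0000000000000000400050c0"
)
raw = bytes.fromhex(HEADER_HEX)

FILE_HEADER = struct.Struct("<HHIIIHH")          # 20 bytes, little endian
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")   # 40 bytes, little endian

(machine, num_sections, timestamp, symtab_ptr,
 num_symbols, opt_header_size, file_flags) = FILE_HEADER.unpack_from(raw, 0)

# The single section header immediately follows the file header
(name, virt_size, virt_addr, raw_size, raw_ptr,
 reloc_ptr, lineno_ptr, num_relocs, num_linenos,
 section_flags) = SECTION_HEADER.unpack_from(raw, FILE_HEADER.size)

print(f"Machine              = {machine:#06x}")
print(f"NumberOfSections     = {num_sections}")
print(f"PointerToSymbolTable = {symtab_ptr:#x}")
print(f"NumberOfSymbols      = {num_symbols}")
print(f"Section name         = {name.rstrip(bytes(1)).decode()}")
print(f"SizeOfRawData        = {raw_size:#x}")
print(f"PointerToRawData     = {raw_ptr:#x}")
print(f"Characteristics      = {section_flags:#010x}")
```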
Ghidra, or any other disassembler, can also help with further analyzing it.
This listing display contains the entire COFF. There is a COFF File Header, a single Section Header for the .data section and the file contents inside the .data section.
The symbol table (Window -> Symbol Table) lists all of the defined symbols.
These are the symbols that are automatically generated and exported for use inside the main program. The “Source” column in Ghidra shows them as being “Imported” but that just means they are defined with IMAGE_SYM_CLASS_EXTERNAL storage class.
The relocations (Window -> Relocation Table) window will show 0 relocations because there are no references in this COFF that reference external data or data in other sections.
This COFF information can be laid out linearly as it is in the file but with the values for each structure filled in.
/** COFF File header */
struct COFFFileHeader {
Machine = IMAGE_FILE_MACHINE_AMD64, /* 0x8664 */
NumberOfSections = 1,
TimeDateStamp = 0,
PointerToSymbolTable = 0x4c,
NumberOfSymbols = 3,
SizeOfOptionalHeader = 0,
Characteristics = IMAGE_FILE_RELOCS_STRIPPED | IMAGE_FILE_LINE_NUMS_STRIPPED, /* 0x5 */
};
/** Section Table (Section Headers) */
/* Only 1 section header in the section table */
struct COFFSectionHeader {
Name = ".data",
VirtualSize = 0,
VirtualAddress = 0,
SizeOfRawData = 0x10,
PointerToRawData = 0x3c,
PointerToRelocations = 0,
PointerToLineNumbers = 0,
NumberOfRelocations = 0,
NumberOfLineNumbers = 0,
Characteristics = IMAGE_SCN_CNT_INITIALIZED_DATA | IMAGE_SCN_ALIGN_16BYTES | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE, /* 0xc0500040 */
};
/** The embedded file's contents are located here after the section header and padded with NULL bytes */
"Hello World"
/** COFF Symbol Table (3 symbols) */
struct COFFSymbol {
Name = "_binary_hello_txt_start",
Value = 0,
SectionNumber = 1,
Type = IMAGE_SYM_DTYPE_NULL | IMAGE_SYM_TYPE_NULL, /* 0x0 */
StorageClass = IMAGE_SYM_CLASS_EXTERNAL, /* 0x2 */
NumberOfAuxSymbols = 0,
};
struct COFFSymbol {
Name = "_binary_hello_txt_size",
Value = 0xc,
SectionNumber = IMAGE_SYM_ABSOLUTE, /* -1 */
Type = IMAGE_SYM_DTYPE_NULL | IMAGE_SYM_TYPE_NULL, /* 0x0 */
StorageClass = IMAGE_SYM_CLASS_EXTERNAL, /* 0x2 */
NumberOfAuxSymbols = 0,
};
struct COFFSymbol {
Name = "_binary_hello_txt_end",
Value = 0xc,
SectionNumber = 1,
Type = IMAGE_SYM_DTYPE_NULL | IMAGE_SYM_TYPE_NULL, /* 0x0 */
StorageClass = IMAGE_SYM_CLASS_EXTERNAL, /* 0x2 */
NumberOfAuxSymbols = 0,
};
/* String table */
One thing missing from this is the COFF’s string table. The string table contains the string values from the Name fields in the COFFSymbol structures. They are stored there because those strings are longer than 8 bytes. More information on how the string table is utilized can also be found in the “So How Does This Work?” section from the “Writing Beacon Object Files Without DFR” post.
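The way long names are stored can be sketched in Python as follows. This assumes the usual PE/COFF conventions: a name of 8 bytes or fewer is stored inline and NULL padded, while a longer name is replaced by four zero bytes followed by a 4-byte offset into the string table, and the string table itself begins with a 4-byte length that includes those four bytes. The helper name encode_names is hypothetical.

```python
import struct


def encode_names(names):
    """Return (name_fields, string_table) for a list of symbol names."""
    strings = b""
    fields = []
    for name in names:
        raw = name.encode("ascii")
        if len(raw) <= 8:
            # Short names fit directly in the 8-byte Name field
            fields.append(raw.ljust(8, bytes(1)))
        else:
            # Long names: zeroes, then the offset from the start of the string table
            offset = 4 + len(strings)
            fields.append(struct.pack("<II", 0, offset))
            strings += raw + bytes(1)
    # The table is prefixed with its total size, which counts the prefix itself
    table = struct.pack("<I", 4 + len(strings)) + strings
    return fields, table


if __name__ == "__main__":
    fields, table = encode_names([
        "_binary_hello_txt_start",
        "_binary_hello_txt_size",
        "_binary_hello_txt_end",
    ])
    for field in fields:
        print(field.hex())
    print(f"string table size: {len(table):#x}")
```

For the three symbols in hello.o this yields the offsets 0x4, 0x1c and 0x33 and a table size of 0x49, which are the same values that appear in the hex-dump.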
In the _binary_hello_txt_size symbol, the value is set to the size of the file data and the symbol’s section value is set to IMAGE_SYM_ABSOLUTE. This is where the weirdness with that symbol mentioned above comes from and why you need to reference the symbol’s address to get the actual value. There is a way to fix that mentioned later in the “Fixing the _size Symbol” section.
The parts that will vary in this metadata are the PointerToSymbolTable field in the COFFFileHeader, the SizeOfRawData field in the COFFSectionHeader, and the Value fields in the _binary_hello_txt_size and _binary_hello_txt_end symbols. They change based on the size of the file being embedded.
There are some patterns with this data. The PointerToSymbolTable field in the COFFFileHeader structure is the PointerToRawData field from the COFFSectionHeader structure plus the SizeOfRawData field, which is the size of the embedded file after padding (0x3c + 0x10 = 0x4c in this example). The SizeOfRawData field in the COFFSectionHeader is the file size rounded up to a multiple of the alignment specified in the Characteristics field (IMAGE_SCN_ALIGN_16BYTES, 16 bytes). The section data is padded with NULL bytes at the end to reach this size. The Value fields in the _binary_hello_txt_size and _binary_hello_txt_end COFFSymbols are the same and are just the size of the file.
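These relationships are simple enough to write down as a few lines of arithmetic. The Python sketch below assumes a single section, so the section data starts at 0x3c (20-byte file header plus one 40-byte section header), and it assumes an IMAGE_SCN_ALIGN_* flag is present in the Characteristics field, where bits 20 to 23 hold a value n meaning an alignment of 2^(n-1) bytes. Note that the symbol table follows the padded section data, so its offset is computed from SizeOfRawData and not from the unpadded file size. The function names are made up for illustration.

```python
def alignment_from_characteristics(characteristics: int) -> int:
    """Decode the IMAGE_SCN_ALIGN_* bits (bits 20-23) into a byte count."""
    n = (characteristics >> 20) & 0xF
    if n == 0:
        raise ValueError("no IMAGE_SCN_ALIGN_* flag set")
    return 1 << (n - 1)


def layout(file_size: int, characteristics: int = 0xC0500040,
           raw_data_ptr: int = 0x3C) -> dict:
    """Compute the size-dependent fields for a single-section COFF."""
    align = alignment_from_characteristics(characteristics)
    # Round the file size up to a multiple of the alignment
    raw_size = (file_size + align - 1) // align * align
    return {
        "SizeOfRawData": raw_size,
        "PointerToSymbolTable": raw_data_ptr + raw_size,
        "size_and_end_value": file_size,
    }


if __name__ == "__main__":
    for size in (12, 16, 17):
        print(size, {k: hex(v) for k, v in layout(size).items()})
```

For the 12-byte hello.txt this gives a SizeOfRawData of 0x10 and a PointerToSymbolTable of 0x4c, matching the header fields shown earlier.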
Ultimately, the COFF consists of the structures above laid out in the order below with the file data embedded somewhere in the center.
- COFF File Header
- Section header for the .data section
- Embedded file’s data padded with NULL bytes to reach the specified alignment
- The COFFSymbol structure array
- The string table
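Putting that order into code, a generator for this kind of COFF fits in a short Python function. This is a minimal sketch under several assumptions: x86-64 only, one .data section, no relocations, and the same three symbols that ld emits. The name build_coff and the stem parameter are hypothetical, and this is not the bin2coff.py script linked later in the post.

```python
import struct

ALIGN = 16
FILE_HEADER = struct.Struct("<HHIIIHH")          # COFF file header (20 bytes)
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")   # section header (40 bytes)
SYMBOL = struct.Struct("<8sIhHBB")               # symbol record (18 bytes)


def build_coff(data: bytes, stem: str) -> bytes:
    """Build a single-section AMD64 COFF with `data` embedded in .data."""
    padded = data + bytes(-len(data) % ALIGN)     # NULL pad to the alignment
    raw_ptr = FILE_HEADER.size + SECTION_HEADER.size
    symtab_ptr = raw_ptr + len(padded)

    # (suffix, Value, SectionNumber); -1 is IMAGE_SYM_ABSOLUTE
    symbols = [("start", 0, 1), ("size", len(data), -1), ("end", len(data), 1)]
    strings = b""
    records = b""
    for suffix, value, section in symbols:
        name = f"_binary_{stem}_{suffix}".encode("ascii")
        # These names are always longer than 8 bytes, so they go in the string table
        name_field = struct.pack("<II", 0, 4 + len(strings))
        strings += name + bytes(1)
        # Type 0, StorageClass 2 (IMAGE_SYM_CLASS_EXTERNAL), no aux symbols
        records += SYMBOL.pack(name_field, value, section, 0, 2, 0)

    header = FILE_HEADER.pack(0x8664, 1, 0, symtab_ptr, len(symbols), 0, 0x0005)
    section = SECTION_HEADER.pack(b".data", 0, 0, len(padded), raw_ptr,
                                  0, 0, 0, 0, 0xC0500040)
    string_table = struct.pack("<I", 4 + len(strings)) + strings
    return header + section + padded + records + string_table


if __name__ == "__main__":
    coff = build_coff(b"Hello World\n", "hello_txt")
    print(len(coff), "bytes")
    # with open("hello.o", "wb") as f: f.write(coff)
```

Called with the contents of hello.txt and the stem hello_txt, the output should match the ld-generated file from the hex-dump byte for byte, since that file has a zero timestamp and no other host-specific fields.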
Fixing the _size Symbol
The way the *_size symbol is defined is a little bit inconvenient to work with when referencing it in a program. A simple way of fixing this is to append the file’s size at the end of the section and set the *_size symbol to reference that value. Another way is to create a new .rdata section with the file size and have the symbol reference that value.
Here is what the former looks like.
Current layout of the file data and symbol table.
/* The section data as a hexdump */
00000000: 4865 6c6c 6f20 576f 726c 640a 0000 0000 Hello World.....
/* The symbol table */
struct COFFSymbol {
Name = "_binary_hello_txt_start",
Value = 0,
SectionNumber = 1,
Type = IMAGE_SYM_DTYPE_NULL | IMAGE_SYM_TYPE_NULL,
StorageClass = IMAGE_SYM_CLASS_EXTERNAL,
NumberOfAuxSymbols = 0,
};
struct COFFSymbol {
Name = "_binary_hello_txt_size",
Value = 0xc,
SectionNumber = IMAGE_SYM_ABSOLUTE,
Type = IMAGE_SYM_DTYPE_NULL | IMAGE_SYM_TYPE_NULL,
StorageClass = IMAGE_SYM_CLASS_EXTERNAL,
NumberOfAuxSymbols = 0,
};
struct COFFSymbol {
Name = "_binary_hello_txt_end",
Value = 0xc,
SectionNumber = 1,
Type = IMAGE_SYM_DTYPE_NULL | IMAGE_SYM_TYPE_NULL,
StorageClass = IMAGE_SYM_CLASS_EXTERNAL,
NumberOfAuxSymbols = 0,
};
New layout with the file size added in.
/* The section data as a hexdump */
00000000: 4865 6c6c 6f20 576f 726c 640a 0000 0000 Hello World.....
00000010: 0c00 0000 0000 0000 0000 0000 0000 0000 ................ // Size of the file (0xc) in little endian and added at the end with some NULL byte padding.
/* The symbol table */
struct COFFSymbol {
Name = "_binary_hello_txt_start",
Value = 0,
SectionNumber = 1,
Type = IMAGE_SYM_DTYPE_NULL | IMAGE_SYM_TYPE_NULL,
StorageClass = IMAGE_SYM_CLASS_EXTERNAL,
NumberOfAuxSymbols = 0,
};
struct COFFSymbol {
Name = "_binary_hello_txt_size",
Value = 0x10, // Value adjusted to point to the inserted size value in the section
SectionNumber = 1, // SectionNumber adjusted to reference the section with the size value
Type = IMAGE_SYM_DTYPE_NULL | IMAGE_SYM_TYPE_NULL,
StorageClass = IMAGE_SYM_CLASS_EXTERNAL,
NumberOfAuxSymbols = 0,
};
struct COFFSymbol {
Name = "_binary_hello_txt_end",
Value = 0xc,
SectionNumber = 1,
Type = IMAGE_SYM_DTYPE_NULL | IMAGE_SYM_TYPE_NULL,
StorageClass = IMAGE_SYM_CLASS_EXTERNAL,
NumberOfAuxSymbols = 0,
};
Now, the size of the file data can be referenced without needing to take the address of it.
#include <stddef.h>
#include <stdio.h>
extern char _binary_hello_txt_start[];
extern char _binary_hello_txt_end;
extern size_t _binary_hello_txt_size;
int main(void) {
    printf("Array size: %llu\n", (unsigned long long)_binary_hello_txt_size);
    return 0;
}
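For a custom generator, the section data for this fixed-up layout can be produced with a couple of lines. The Python sketch below assumes a 64-bit target where size_t is 8 bytes, stores the size in little endian, and keeps the 16-byte alignment used throughout this post. The helper names are hypothetical.

```python
import struct

ALIGN = 16


def pad(data: bytes) -> bytes:
    """NULL pad `data` up to a multiple of ALIGN."""
    return data + bytes(-len(data) % ALIGN)


def add_size_field(data: bytes):
    """Return (section_data, size_symbol_value) with the size appended."""
    section = pad(data)
    size_offset = len(section)                    # new Value for the *_size symbol
    section = pad(section + struct.pack("<Q", len(data)))
    return section, size_offset


if __name__ == "__main__":
    section, size_offset = add_size_field(b"Hello World\n")
    print(section.hex())
    print(f"_size symbol: Value = {size_offset:#x}, SectionNumber = 1")
```

Since the section grows, SizeOfRawData and PointerToSymbolTable in the headers have to be recomputed from the new section length as well.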
bin2coff.py
I wrote a small, self-contained Python script for generating COFFs that should work on various platforms (Linux, Windows, Mac, etc.).
https://gist.github.com/MEhrn00/9615b92d9bfd3c85d6cba69edb31387d
Generating COFFs from… yaml?
Some people might say that this method is extremely cursed and should not be a thing that actually exists. Others may marvel at its pristine beauty and grasp the true extent of its full potential.
In the LLVM toolset, there exists a set of two peculiar tools named yaml2obj and obj2yaml.
As the names may imply, these tools are capable of creating object files from a yaml file that describes its contents.
Here is what that looks like. The yaml description of the hello.o file from above can be printed out using obj2yaml.
matt@laptop :: ~ >> obj2yaml hello.o
--- !COFF
header:
  Machine: IMAGE_FILE_MACHINE_AMD64
  Characteristics: [ IMAGE_FILE_RELOCS_STRIPPED, IMAGE_FILE_LINE_NUMS_STRIPPED ]
sections:
  - Name: .data
    Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
    Alignment: 16
    SectionData: 48656C6C6F20576F726C640A00000000
    SizeOfRawData: 16
symbols:
  - Name: _binary_hello_txt_start
    Value: 0
    SectionNumber: 1
    SimpleType: IMAGE_SYM_TYPE_NULL
    ComplexType: IMAGE_SYM_DTYPE_NULL
    StorageClass: IMAGE_SYM_CLASS_EXTERNAL
  - Name: _binary_hello_txt_size
    Value: 12
    SectionNumber: -1
    SimpleType: IMAGE_SYM_TYPE_NULL
    ComplexType: IMAGE_SYM_DTYPE_NULL
    StorageClass: IMAGE_SYM_CLASS_EXTERNAL
  - Name: _binary_hello_txt_end
    Value: 12
    SectionNumber: 1
    SimpleType: IMAGE_SYM_TYPE_NULL
    ComplexType: IMAGE_SYM_DTYPE_NULL
    StorageClass: IMAGE_SYM_CLASS_EXTERNAL
...
This yaml data contains all of the information pertaining to this COFF.
A new COFF can be generated by modifying the values in this yaml file or by creating a new yaml file from scratch and then running it through yaml2obj.
For the purpose of embedding files into C/C++ programs, this is achievable with yaml2obj.
The yaml example file above can be used as a template for embedding another file into it. The section data string will need to be replaced with the hex string of the target file for embedding and the other metadata needs to be adjusted to account for the size of the file data. Then, yaml2obj can take the modified yaml file and generate a fresh object file for linking that contains the embedded file data.
This process can be scripted out using something like bash or python.
#!/bin/bash
# Embed INPUT into a COFF object file by generating a yaml2obj description.
if [ "$#" -ne 3 ]; then
    echo "usage: $0 [INPUT] [OUTPUT] [SYMBOL]" >&2
    exit 1
fi
if [ ! -f "$1" ]; then
    echo "$1 is not a file." >&2
    exit 1
fi
if [ -z "$3" ]; then
    echo "Symbol is empty." >&2
    exit 1
fi
# File size in bytes; the arithmetic expansion strips any whitespace wc adds
filesize=$(( $(wc -c < "$1") ))
# Upper-case hex string of the file contents on a single line
filehex=$(xxd -p -u "$1" | tr -d '\n')
sectiondata=$filehex
sectionsize=$filesize
alignment=16
# Pad the section data with NULL bytes up to a multiple of the alignment
remainder=$((sectionsize % alignment))
if [ "$remainder" -ne 0 ]; then
    padding=$((alignment - remainder))
    for _ in $(seq "$padding"); do
        sectiondata+="00"
    done
    sectionsize=$((sectionsize + padding))
fi
# The temporary yaml file is removed by the trap when the script exits
tmp=$(mktemp -t -p /tmp yaml2obj.XXXXX)
trap '{ rm -f -- "$tmp"; }' EXIT
cat <<EOF > "$tmp"
--- !COFF
header:
  Machine: IMAGE_FILE_MACHINE_AMD64
  Characteristics: [ IMAGE_FILE_RELOCS_STRIPPED, IMAGE_FILE_LINE_NUMS_STRIPPED ]
sections:
  - Name: .data
    Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]
    Alignment: 16
    SectionData: $sectiondata
    SizeOfRawData: $sectionsize
symbols:
  - Name: _binary_${3}_start
    Value: 0
    SectionNumber: 1
    SimpleType: IMAGE_SYM_TYPE_NULL
    ComplexType: IMAGE_SYM_DTYPE_NULL
    StorageClass: IMAGE_SYM_CLASS_EXTERNAL
  - Name: _binary_${3}_size
    Value: $filesize
    SectionNumber: -1
    SimpleType: IMAGE_SYM_TYPE_NULL
    ComplexType: IMAGE_SYM_DTYPE_NULL
    StorageClass: IMAGE_SYM_CLASS_EXTERNAL
  - Name: _binary_${3}_end
    Value: $filesize
    SectionNumber: 1
    SimpleType: IMAGE_SYM_TYPE_NULL
    ComplexType: IMAGE_SYM_DTYPE_NULL
    StorageClass: IMAGE_SYM_CLASS_EXTERNAL
EOF
yaml2obj -o "$2" "$tmp"
This should generate a COFF with the data from the specified file embedded into it.
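The same idea carries over to Python, which avoids the dependency on xxd and seq. The sketch below only builds the yaml text; the function name coff_yaml is hypothetical, the column alignment is cosmetic, and the result still has to be written to a file and passed to yaml2obj as in the bash script.

```python
def coff_yaml(data: bytes, symbol: str, alignment: int = 16) -> str:
    """Build a yaml2obj description that embeds `data` in a .data section."""
    padded = data + bytes(-len(data) % alignment)   # NULL pad to the alignment
    lines = [
        "--- !COFF",
        "header:",
        "  Machine:         IMAGE_FILE_MACHINE_AMD64",
        "  Characteristics: [ IMAGE_FILE_RELOCS_STRIPPED, IMAGE_FILE_LINE_NUMS_STRIPPED ]",
        "sections:",
        "  - Name:            .data",
        "    Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE ]",
        f"    Alignment:       {alignment}",
        f"    SectionData:     {padded.hex().upper()}",
        f"    SizeOfRawData:   {len(padded)}",
        "symbols:",
    ]
    # (suffix, Value, SectionNumber); -1 is IMAGE_SYM_ABSOLUTE
    for suffix, value, section in (("start", 0, 1),
                                   ("size", len(data), -1),
                                   ("end", len(data), 1)):
        lines += [
            f"  - Name:            _binary_{symbol}_{suffix}",
            f"    Value:           {value}",
            f"    SectionNumber:   {section}",
            "    SimpleType:      IMAGE_SYM_TYPE_NULL",
            "    ComplexType:     IMAGE_SYM_DTYPE_NULL",
            "    StorageClass:    IMAGE_SYM_CLASS_EXTERNAL",
        ]
    return "\n".join(lines) + "\n...\n"


if __name__ == "__main__":
    print(coff_yaml(b"Hello World\n", "hello_txt"), end="")
```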
Unfortunately, yaml2obj does not ship with Visual Studio’s clang tools and is not present in the LLVM.LLVM package from winget so it may need to be compiled from source in order to use it on Windows.
Build System Integration
It’s generally not a good idea to commit binary files into version control unless the version control system has good support for them. Git has Git LFS for this, which is available on GitHub, but it would be great to automate generating these COFFs during the main program’s build process.
Most build systems should support defining and running custom commands which makes it possible to integrate this during the build process. Here are some basic examples using make, nmake, cmake and meson with the bin2coff.py script. These examples are for a basic project with a main.c source file and a hello.txt file that needs to be embedded. The bin2coff.py script is put in a separate scripts/ directory at scripts/bin2coff.py to keep the root of the project more organized.
Makefile
CC = x86_64-w64-mingw32-gcc
PYTHON = python3
BIN2COFF = scripts/bin2coff.py
.PHONY : all clean
all : main.exe
clean:
	rm -f main.exe main.o hello.o
main.exe : main.o hello.o
	$(CC) $(LDFLAGS) $(TARGET_ARCH) $^ $(LDLIBS) -o $@
hello.o : hello.txt
	$(PYTHON) $(BIN2COFF) -m amd64 $< $@
NMake
PYTHON = python3
BIN2COFF = .\scripts\bin2coff.py
all : main.exe
clean:
	del /f main.exe main.obj hello.obj
main.exe : main.obj hello.obj
	$(CC) $(CFLAGS) /Fe:$@ $**
hello.obj : hello.txt
	$(PYTHON) $(BIN2COFF) $? $@
CMakeLists.txt
cmake_minimum_required(VERSION 3.18)
project(example LANGUAGES C)
find_package(Python REQUIRED COMPONENTS Interpreter)
add_custom_command(
OUTPUT hello.o
COMMAND
${Python_EXECUTABLE} ${CMAKE_CURRENT_SOURCE_DIR}/scripts/bin2coff.py
${CMAKE_CURRENT_SOURCE_DIR}/hello.txt
hello.o
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/hello.txt
COMMENT "Building hello.o from hello.txt with bin2coff.py"
VERBATIM
)
add_custom_target(hello_gen DEPENDS hello.o)
add_library(hello OBJECT IMPORTED)
set_target_properties(hello PROPERTIES
IMPORTED_OBJECTS
"${CMAKE_CURRENT_BINARY_DIR}/hello.o"
)
add_executable(main main.c)
# Make sure hello.o has been generated before main is linked
add_dependencies(main hello_gen)
target_link_libraries(main hello)
meson.build
project('example', 'c')
python = find_program('python3', native : true, required : true)
hello = custom_target(
'hello',
output : 'hello.o',
input : 'hello.txt',
command : [python, '@CURRENT_SOURCE_DIR@/scripts/bin2coff.py', '@INPUT@', '@OUTPUT@']
)
executable('main', 'main.c', hello)
Wrapping Up
This post provides an alternative method for embedding file data in C/C++ without needing to store it as a large byte array in the source code. GNU ld is the standard linker on most Linux systems so this process works without needing to install any extra tools. Since COFFs are a relatively straightforward file format, writing a custom tool that performs the same functionality is also fairly simple.
[1] I do not know of any other linkers aside from GNU/MinGW ld and LLVM lld that are capable of outputting an object file with arbitrary binary data embedded inside it. There may be other ones I am unfamiliar with that can.
[2] It may be possible to compile LLVM lld from source with COFF target support; however, I have not explored or tried it.