This article covers the issues I ran into while using getopt.h for a programming language bytecode interpreter pipeline I am currently working on (purple-garden), the implementation details of the replacement I wrote (6cl), examples of how to use it and how I replaced the previous getopt based implementation with it.
Tip or: TL;DR
getopt.h kinda sucks to use, because:
- You have to write the help page and the usage page yourself
- You have to keep track of all flags and their handling three times (once in your long option definitions, once in the short option string and once in their handling)
- Flags aren’t typed, so all option arguments are just C strings accessible via the global optarg
- Default values aren’t really a thing
- Option prefix is hardcoded to - for short options and -- for long options (thanks for that, UNIX and POSIX)
I solved these issues by:
- Generating help and usage pages from the list of options
- Single flag list, no short and long definition in multiple places
- Typing flag options with seven types, range validation (float.h, limits.h) and default values:
  - string
  - boolean
  - character
  - integer and long
  - float and double
- Defaults and types are just fields of the flag struct in the flag definition
- Macro for setting the prefix and single prefix for all options
I wrote 6cl to fill the gaps I encountered.
6cl is an opinionated command line option parsing library for 6wm and purple garden.
Warning
As always, I am not perfect and neither are my code and blog articles, thus if you find any issues or have any questions, feel free to email me: contact@xnacly.me or contact (at) xnacly.me.
Current state
The purple-garden (pg) interpreter accepts a number of options:
$ ./purple_garden -h
usage: purple_garden [-v | --version] [-h | --help]
                     [-d | --disassemble] [-b<size> | --block-allocator=<size>]
                     [-a | --aot-functions] [-m | --memory-usage]
                     [-V | --verbose] [-r<input> | --run=<input>] <file.garden>

Options:
    -v, --version
        display version information

    -h, --help
        extended usage information

    -d, --disassemble
        readable bytecode representation with labels,
        globals and comments

    -b=<size>, --block-allocator=<size>
        use block allocator instead of garbage collection

    -a, --aot-functions
        compile all functions to machine code

    -m, --memory-usage
        display the memory usage of parsing,
        compilation and the virtual machine

    -V, --verbose
        verbose logging

    -r=<input>, --run=<input>
        executes the argument as if an input file was given
Command line argument parsing using getopt
I handle these by first defining a struct to hold all options, for later reference:
typedef struct {
  // options - int because getopt has no bool support

  // use block allocator instead of garbage collection
  size_t block_allocator;
  // compile all functions to machine code
  int aot_functions;
  // readable bytecode representation with labels, globals and comments
  int disassemble;
  // display the memory usage of parsing, compilation and the virtual machine
  int memory_usage;

  // executes the argument as if an input file was given
  char *run;

  // verbose logging
  int verbose;

  // options in which we exit after toggle
  int version;
  int help;

  // entry point - last argument that's not an option
  char *filename;
} Args;
After that I define the list of options so I can keep track of them once getopt does its stuff:
typedef struct {
  const char *name_long;
  const char name_short;
  const char *description;
  const char *arg_name;
} cli_option;

// WARN: DO NOT REORDER THIS - will result in option handling issues
static const cli_option options[] = {
    {"version", 'v', "display version information", ""},
    {"help", 'h', "extended usage information", ""},
    {"disassemble", 'd', "readable bytecode representation with labels, globals and comments", ""},
    {"block-allocator", 'b', "use block allocator instead of garbage collection", "<size>"},
    {"aot-functions", 'a', "compile all functions to machine code", ""},
    {"memory-usage", 'm', "display the memory usage of parsing, compilation and the virtual machine", ""},
    {"verbose", 'V', "verbose logging", ""},
    {"run", 'r', "executes the argument as if an input file was given", "<input>"},
};
The heavy lifting is of course done in the Args_parse function:
Args Args_parse(int argc, char **argv) {
  // [...] 1.-6.
}
1. Convert the array of cli_option’s to getopt’s option:
   Args a = (Args){0};
   // MUST be in sync with options, otherwise this will not work as intended
   struct option long_options[] = {
       {options[0].name_long, no_argument, &a.version, 1},
       {options[1].name_long, no_argument, &a.help, 1},
       {options[2].name_long, no_argument, &a.disassemble, 1},
       {options[3].name_long, required_argument, 0, 'b'},
       {options[4].name_long, no_argument, &a.aot_functions, 1},
       {options[5].name_long, no_argument, &a.memory_usage, 1},
       {options[6].name_long, no_argument, &a.verbose, 1},
       {options[7].name_long, required_argument, 0, 'r'},
       {0, 0, 0, 0},
   };
2. Pass the array to getopt_long with the matching short flag definition vhdb:amVr: (third location to define flags):
   int opt;
   while ((opt = getopt_long(argc, argv, "vhdb:amVr:", long_options, NULL)) !=
          -1) {
     // [...]
   }
3. Handle short options separately from long option “automatic” handling (touching every flag twice):
   switch (opt) {
   case 'v':
     a.version = 1;
     break;
   case 'V':
     a.verbose = 1;
     break;
   case 'h':
     a.help = 1;
     break;
   case 'd':
     a.disassemble = 1;
     break;
   case 'r':
     a.run = optarg;
     break;
   case 'b':
     char *endptr;
     size_t block_size = strtol(optarg, &endptr, 10);
     ASSERT(endptr != optarg, "args: Failed to parse number from: %s", optarg);
     a.block_allocator = block_size;
     break;
   case 'a':
     a.aot_functions = 1;
     break;
   case 'm':
     a.memory_usage = 1;
     break;
   case 0:
     break;
   default:
     usage();
     exit(EXIT_FAILURE);
   }
4. Store all non flags as rest, representing the entry file (filename):
   if (optind < argc) {
     a.filename = argv[optind];
   }
5. Act on commands, like --version, --help and their short variants:
   if (a.version) {
     // [...]
   } else if (a.help) {
     usage();
     // [...]
   }
6. Error if no input to the interpreter is detected:
   if (a.filename == NULL && a.run == NULL) {
     usage();
     fprintf(stderr, "error: Missing a file? try `-h/--help`\n");
     exit(EXIT_FAILURE);
   };
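The strtol call in the 'b' case only rejects input without any digits; trailing garbage such as 12abc, negative numbers and overflow all slip through. A minimal sketch of a stricter conversion, assuming a hypothetical helper parse_size that is not part of purple garden:

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// parse_size is a hypothetical helper, not part of purple garden: it converts
// a decimal string to size_t and reports failure instead of guessing.
static bool parse_size(const char *s, size_t *out) {
  assert(out != NULL);
  // strtoull accepts leading whitespace and a sign, so insist on a digit first
  if (s == NULL || !isdigit((unsigned char)s[0])) {
    return false;
  }
  char *end = NULL;
  errno = 0;
  unsigned long long v = strtoull(s, &end, 10);
  if (*end != '\0') {
    return false; // trailing garbage, for instance "12abc"
  }
  if (errno == ERANGE || v > SIZE_MAX) {
    return false; // value does not fit
  }
  *out = (size_t)v;
  return true;
}
```

With getopt this would be called as parse_size(optarg, &a.block_allocator) and followed by the usual usage and exit path on failure.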
6cl Design, API and Example Usage
The API design is inspired by Go’s flag package, Google’s gflags, my general experience with programming languages (Go, Rust, C, etc.) and my attempt to create an ergonomic interface around the constraints of the C programming language.
By ergonomic I mean:
- single location for defining flags (no setting and handling them multiple times)
- option format
  - boolean options have no argument
  - merged short and long options (no -s, --save, but -s and -save)
  - no combined options (no -xvf or -lah, but -x -v -f and -l -a -h)
  - no name and option merges, such as +DCONSTANT=12 or +n128, but rather +D CONSTANT=12 and +n 128
- type safe and early errors if types can’t be parsed or over-/underflows occur
- type safe default values if flags aren’t specified
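The format rules above mean a single look at the first character and the length is enough to sort an argument into one of three buckets. A small sketch of that classification, assuming + as the prefix and using a hypothetical helper classify that is not part of 6cl:

```c
#include <assert.h>
#include <string.h>

#define PREFIX '+' // assumption: stands in for the configurable option prefix

typedef enum { ARG_REST, ARG_SHORT, ARG_LONG } ArgKind;

// classify is a hypothetical helper illustrating the format rules:
// no prefix means rest argument, prefix plus one character means short
// option, anything else with a prefix is treated as a long option name.
static ArgKind classify(const char *arg) {
  assert(arg != NULL);
  if (arg[0] != PREFIX) {
    return ARG_REST;
  }
  return strlen(arg) == 2 ? ARG_SHORT : ARG_LONG;
}
```

Because there are no combined options, +xvf is looked up as one long option called xvf and never expanded into three short ones.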
Defining Flags
A flag consists of a long name, a short name, a type, a description and a default value that matches its type.
The type is defined as an enum:
typedef enum {
  SIX_STR,
  SIX_BOOL,
  SIX_CHAR,
  SIX_INT,
  SIX_LONG,
  SIX_FLOAT,
  SIX_DOUBLE,
} SixFlagType;
The flag itself holds all aforementioned fields:
typedef struct {
  // name of the flag, for instance +<name>; +help
  const char *name;
  // short name, like +<short_name>; +h
  char short_name;
  // Defines the datatype
  SixFlagType type;
  // used in the help page
  const char *description;

  // typed result values, will be filled with the value if any is found
  // for the option, or with the default value that's already set.
  union {
    // string value
    char *s;
    // boolean value
    bool b;
    // char value
    char c;
    // int value
    int i;
    // long value
    long l;
    // float value
    float f;
    // double value
    double d;
  };
} SixFlag;
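Since the value lives in a union, the type field is the only thing that says which member is valid, and every consumer has to switch on it before reading. A reduced sketch of that pattern; the types are cut-down, renamed copies of the ones above and flag_format is a hypothetical helper, not part of 6cl:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// Reduced, renamed copies of the article's types to keep the sketch short
typedef enum { T_STR, T_BOOL, T_INT, T_DOUBLE } Type;

typedef struct {
  const char *name;
  Type type;
  union { // anonymous union, needs C11
    const char *s;
    bool b;
    int i;
    double d;
  };
} Flag;

// flag_format (hypothetical) writes "name=value" into buf, reading only the
// union member that matches the type tag
static int flag_format(const Flag *f, char *buf, size_t n) {
  assert(f != NULL && buf != NULL);
  switch (f->type) {
  case T_STR:
    return snprintf(buf, n, "%s=%s", f->name, f->s);
  case T_BOOL:
    return snprintf(buf, n, "%s=%s", f->name, f->b ? "true" : "false");
  case T_INT:
    return snprintf(buf, n, "%s=%d", f->name, f->i);
  case T_DOUBLE:
    return snprintf(buf, n, "%s=%g", f->name, f->d);
  }
  return -1; // unknown type tag
}
```

Setting .d on a flag tagged as an integer compiles without complaint, so keeping the default value and the type next to each other in the definition is what keeps the two in sync.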
So a flag +pi <double> / +p <double> would be defined as:
SixFlag pi = {
    .name = "pi",
    .short_name = 'p',
    .d = 3.1415,
    .type = SIX_DOUBLE,
    .description = "define pi",
};
This has to be passed to the Six struct, holding the available flags:
typedef struct Six {
  SixFlag *flags;
  size_t flag_count;
  // usage will be postfixed with this
  const char *name_for_rest_arguments;
  // rest holds all arguments not matching any defined options
  char *rest[SIX_MAX_REST];
  size_t rest_count;
} Six;
The fields flags and flag_count must be set before calling SixParse:
typedef enum { UNKNOWN = -1, PI } Option;

SixFlag options[] = {
    [PI] = {
        .name = "pi",
        .short_name = 'p',
        .d = 3.1415,
        .type = SIX_DOUBLE,
        .description = "define pi",
    },
};
Six s = {0};
s.flags = options;
s.flag_count = sizeof(options) / sizeof(SixFlag);
SixParse(&s, argc, argv);
Accessing Flags
The flags can be accessed by indexing into the options array:
double pi = s.flags[PI].d;
printf("%f\n", pi);
Fusing into an Example
I use an example to test the pipeline, so I’ll just dump this one here:
/*
 * A dice roller that simulates rolling N dice with M sides, optionally
 * labeled, and with verbose output to print each roll result.
 *
 * $ gcc ./dice.c ../6cl.c -o dice
 * $ ./dice +n 4 +m 6
 * => 14
 * $ ./dice +rolls 2 +sides 20 +label "STR"
 * STR: => 29
 * $ ./dice +n 3 +m 10 +v
 * Rolled: 3 + 7 + 5 =15
 */
#include "../6cl.h"

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ERR(FMT, ...) fprintf(stderr, "dice: " FMT "\n", ##__VA_ARGS__);

void dice(int *throws, unsigned int n, unsigned int m) {
  for (size_t i = 0; i < n; i++) {
    throws[i] = (rand() % m) + 1;
  }
}

typedef enum { UNKNOWN = -1, ROLLS, SIDES, LABEL, VERBOSE } Option;

int main(int argc, char **argv) {
  srand((unsigned int)time(NULL));

  SixFlag options[] = {
      [ROLLS] = {.name = "rolls",
                 .short_name = 'n',
                 .i = 2,
                 .type = SIX_INT,
                 .description = "times to roll"},
      [SIDES] = {.name = "sides",
                 .short_name = 'm',
                 .i = 6,
                 .type = SIX_INT,
                 .description = "sides the dice has"},
      [LABEL] =
          {
              .name = "label",
              .short_name = 'l',
              .s = "=> ",
              .type = SIX_STR,
              .description = "prefix for the dice roll result",
          },
      [VERBOSE] =
          {
              .name = "verbose",
              .short_name = 'v',
              .type = SIX_BOOL,
              .description = "print all rolls, not only the result",
          },
  };
  Six s = {0};
  s.flags = options;
  s.flag_count = sizeof(options) / sizeof(SixFlag);

  SixParse(&s, argc, argv);
  if (s.flags[VERBOSE].b) {
    printf("Config{rolls=%d, sides=%d, label=`%s`}\n", s.flags[ROLLS].i,
           s.flags[SIDES].i, s.flags[LABEL].s);
  }

  if (options[ROLLS].i < 1) {
    ERR("Rolls can't be < 1");
    return EXIT_FAILURE;
  }

  int throws[options[ROLLS].i];
  dice(throws, options[ROLLS].i, options[SIDES].i);

  int cum = 0;
  for (int i = 0; i < options[ROLLS].i; i++) {
    int roll = throws[i];
    cum += roll;
    if (options[VERBOSE].b) {
      printf("[roll=%02d]::[%02d/%02d]\n", i + 1, roll, options[SIDES].i);
    }
  }

  printf("%s%d\n", options[LABEL].s, cum);

  return EXIT_SUCCESS;
}
Generating Documentation
From here on out, I’ll show how I implemented the command line parser and the API surface.
If the user passes a malformed input to a well written application, it should provide a good error message, a usage overview and a note on how to get in depth help. Since each option has a short name, a long name, a type, a default value and a description - I want to display all of the aforementioned in the help and a subset in the usage page.
Usage
The usage page is displayed if the application is invoked with either +h or +help or the 6cl parser hits an error (for the former two just as a prefix to the help page):
$ ./examples/dice.out +k
Unknown short option 'k'
usage ./examples/dice.out: [ +n / +rolls <int=2>] [ +m / +sides <int=6>]
                           [ +l / +label <string=`=> `>] [ +v / +verbose]
                           [ +h / +help]
I created a helper for printing a flag and all its options - print_flag:
void print_flag(SixFlag *f, bool long_option) {
  char *pre_and_postfix = "[]";
  if (long_option) {
    putc('\t', stdout);
    pre_and_postfix = "  ";
  }

  printf("%c %c%c / %c%s", pre_and_postfix[0], SIX_OPTION_PREFIX, f->short_name,
         SIX_OPTION_PREFIX, f->name);
  if (f->type != SIX_BOOL) {
    printf(" <%s=", SIX_FLAG_TYPE_TO_MAP[f->type]);
    switch (f->type) {
    case SIX_STR:
      printf("`%s`", f->s);
      break;
    case SIX_CHAR:
      putc(f->c, stdout);
      break;
    case SIX_INT:
      printf("%d", f->i);
      break;
    case SIX_LONG:
      printf("%ld", f->l);
      break;
    case SIX_FLOAT:
      printf("%g", f->f);
      break;
    case SIX_DOUBLE:
      printf("%g", f->d);
      break;
    default:
      // a label directly before the closing brace is only valid since C23
      break;
    }
    putc('>', stdout);
  }
  putc(pre_and_postfix[1], stdout);
  putc(' ', stdout);

  if (long_option) {
    if (f->description) {
      printf("\n\t\t%s\n", f->description);
    }
    putc('\n', stdout);
  }
}
After every two options there is a newline inserted to make the output more readable.
static SixFlag HELP_FLAG = {
    .name = "help",
    .short_name = 'h',
    .description = "help page and usage",
    .type = SIX_BOOL,
};

// part of -h, --help, +h, +help and any unknown option
static void usage(const char *pname, const Six *h) {
  // should i put this to stdout or stderr
  printf("usage %s: ", pname);
  size_t len = strlen(pname) + 7;
  for (size_t i = 0; i < h->flag_count; i++) {
    print_flag(&h->flags[i], false);
    if ((i + 1) % 2 == 0 && i + 1 < h->flag_count) {
      printf("\n%*.s ", (int)len, "");
    }
  }

  printf("\n%*.s ", (int)len, "");
  print_flag(&HELP_FLAG, false);

  if (h->name_for_rest_arguments) {
    puts(h->name_for_rest_arguments);
  } else {
    puts("");
  }
}
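The continuation lines line up with the first flag because of the "%*.s" conversion: the * takes the field width from the argument list and the empty precision after the dot means zero characters of the string are printed, so the result is just padding. The width is strlen(pname) + 7, which together with the trailing space in the format covers the "usage ", the program name and the ": ". A tiny sketch of the trick in isolation, using a hypothetical helper indent that writes into a buffer so the result can be checked:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// indent (hypothetical helper) writes `width` spaces into buf using the same
// "%*.s" conversion as usage(): width from the argument list, precision zero,
// so no character of the string argument is ever printed.
static int indent(char *buf, size_t n, int width) {
  assert(buf != NULL && width >= 0);
  return snprintf(buf, n, "%*.s", width, "ignored");
}
```

Calling indent(buf, sizeof buf, 5) leaves five spaces in buf, regardless of what the string argument contains.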
Examples
To generate two examples (one with long names and one with short names), the default values are used:
Examples:
    ./examples/dice.out +n 2 +m 6 \
                        +l "=> " +v

    ./examples/dice.out +rolls 2 +sides 6 \
                        +label "=> " +verbose
As with the usage, after every two options there is a newline inserted.
static void help(const char *pname, const Six *h) {
  size_t len = strlen(pname);
  // [...]

  printf("Examples: ");
  for (size_t i = 0; i < 2; i++) {
    printf("\n\t%s ", pname);
    for (size_t j = 0; j < h->flag_count; j++) {
      SixFlag *s = &h->flags[j];
      if (i) {
        printf("%c%s", SIX_OPTION_PREFIX, s->name);
      } else {
        printf("%c%c", SIX_OPTION_PREFIX, s->short_name);
      }
      switch (s->type) {
      case SIX_STR:
        printf(" \"%s\"", s->s);
        break;
      case SIX_CHAR:
        printf(" %c", s->c);
        break;
      case SIX_INT:
        printf(" %d", s->i);
        break;
      case SIX_LONG:
        // %ld, since the member is a long and not a size_t
        printf(" %ld", s->l);
        break;
      case SIX_FLOAT:
        printf(" %g", s->f);
        break;
      case SIX_DOUBLE:
        // read the double member, not the float one
        printf(" %g", s->d);
        break;
      case SIX_BOOL:
      default:
        break;
      }
      putc(' ', stdout);
      if ((j + 1) % 2 == 0 && j + 1 < h->flag_count) {
        printf("\\\n\t %*.s", (int)len, "");
      }
    }
    puts("");
  }
}
Help Page
The help page merges the usage, the extended option display (with description) and the example sections:
$ ./examples/dice.out +help
usage ./examples/dice.out: [ +n / +rolls <int=2>] [ +m / +sides <int=6>]
                           [ +l / +label <string=`=> `>] [ +v / +verbose]
                           [ +h / +help]

Option:
      +n / +rolls <int=2>
        times to roll

      +m / +sides <int=6>
        sides the dice has

      +l / +label <string=`=> `>
        prefix for the dice roll result

      +v / +verbose
        print all rolls, not only the result

      +h / +help
        help page and usage

Examples:
    ./examples/dice.out +n 2 +m 6 \
                        +l "=> " +v

    ./examples/dice.out +rolls 2 +sides 6 \
                        +label "=> " +verbose
With usage, options and examples:
static void help(const char *pname, const Six *h) {
  usage(pname, h);
  size_t len = strlen(pname);
  printf("\nOption:\n");
  for (size_t j = 0; j < h->flag_count; j++) {
    print_flag(&h->flags[j], true);
  }
  print_flag(&HELP_FLAG, true);

  printf("Examples: ");
  for (size_t i = 0; i < 2; i++) {
    printf("\n\t%s ", pname);
    for (size_t j = 0; j < h->flag_count; j++) {
      SixFlag *s = &h->flags[j];
      if (i) {
        printf("%c%s", SIX_OPTION_PREFIX, s->name);
      } else {
        printf("%c%c", SIX_OPTION_PREFIX, s->short_name);
      }
      switch (s->type) {
      case SIX_STR:
        printf(" \"%s\"", s->s);
        break;
      case SIX_CHAR:
        printf(" %c", s->c);
        break;
      case SIX_INT:
        printf(" %d", s->i);
        break;
      case SIX_LONG:
        // %ld, since the member is a long and not a size_t
        printf(" %ld", s->l);
        break;
      case SIX_FLOAT:
        printf(" %g", s->f);
        break;
      case SIX_DOUBLE:
        // read the double member, not the float one
        printf(" %g", s->d);
        break;
      case SIX_BOOL:
      default:
        break;
      }
      putc(' ', stdout);
      if ((j + 1) % 2 == 0 && j + 1 < h->flag_count) {
        printf("\\\n\t %*.s", (int)len, "");
      }
    }
    puts("");
  }
}
Detecting Short Flags
Since there are fewer than 256 ASCII values that are valid for a short option,
specifically the character following the prefix, I use a table lookup for
checking both if there is a flag registered for that character and at what
location the option is in Six.flags.
short table_short[256] = {0};

// registering all options
for (size_t i = 0; i < six->flag_count; i++) {
  SixFlag *f = &six->flags[i];

  // [...]
  if (f->short_name) {
    table_short[(int)f->short_name] = i + 1;
  }
}
Zeroing the array means that all characters without an associated option
resolve to 0, which makes error handling straightforward.
However, this requires incrementing all indices by one to differentiate from
the zero value (I could’ve abstracted this via a custom Option struct or
something, but I couldn’t be bothered).
For our +p example the table holds the value 1 (option index 0) at table index 112, the ASCII code of p, since we increment the index by one, as explained above.
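To make the off-by-one bookkeeping concrete, here is a self-contained sketch of the registration and lookup. MiniFlag, register_short and lookup_short are hypothetical stand-ins and not 6cl’s API; unlike the listing above, the sketch casts to unsigned char so a negative char can never index in front of the table:

```c
#include <assert.h>
#include <stddef.h>

#define NOT_FOUND (-1)

// reduced stand-in for SixFlag, only the field needed here
typedef struct {
  char short_name;
} MiniFlag;

// file scope, therefore zero initialised: 0 means "no flag registered"
static short table_short[256];

static void register_short(const MiniFlag *flags, size_t count) {
  assert(flags != NULL);
  for (size_t i = 0; i < count; i++) {
    if (flags[i].short_name) {
      // store index + 1 so index 0 stays distinguishable from "empty"
      table_short[(unsigned char)flags[i].short_name] = (short)(i + 1);
    }
  }
}

// returns the index into the flag array or NOT_FOUND
static int lookup_short(char c) {
  short slot = table_short[(unsigned char)c];
  return slot ? slot - 1 : NOT_FOUND;
}
```

Registering p as the first flag stores 1 at table_short[112], and the lookup undoes the increment before handing the index back.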
Detecting short options is a matter of indexing a character into an array, so we do exactly that while processing the arguments:
for (size_t i = 1; i < argc; i++) {
  SixStr arg_cur = (SixStr){.p = (argv[i]), .len = strnlen(argv[i], 256)};

  // not starting with PREFIX means: no option, thus rest
  if (arg_cur.p[0] != SIX_OPTION_PREFIX) {
    if (six->rest_count + 1 >= SIX_MAX_REST) {
      fprintf(stderr, "Not enough space left for more rest arguments\n");
      goto err;
    }
    six->rest[six->rest_count++] = argv[i];
    continue;
  }

  // check if short option
  if (arg_cur.len == 2) {
    int cc = arg_cur.p[1];
    // table_short has 256 entries, valid indices are 0..255
    if (cc >= 256 || cc < 0) {
      fprintf(stderr, "Unknown short option '%c'\n", arg_cur.p[1]);
      goto err;
    }

    // single char option usage/help page
    if (cc == 'h') {
      help(argv[0], six);
      exit(EXIT_SUCCESS);
    }

    // check if short option is a registered one
    short option_idx = table_short[cc];
    if (!option_idx) {
      fprintf(stderr, "Unknown short option '%c'\n", arg_cur.p[1]);
      goto err;
    }

    // we decrement option_idx, since we zero the lookup table, thus an
    // empty value is 0 and the index of the first option is 1, we correct
    // this here
    option_idx--;

    int offset = process_argument(&six->flags[option_idx], i, argc, argv);
    if (offset == -1) {
      goto err;
    }
    i += offset;
  } else {
    // [...]
  }
}
The tricky parts have comments, the rest should be obvious:
- check if the current argv member is a short option
  - if h: print the help page, end
- check if option is in the registered table
- process the argument for said option
- modify index with offset returned from parsing arguments (because an option together with its argument spans more than one argv member)
Detecting Long Flags
Long flags are a whole other story: we can’t match on a single character and we can’t hardcode a switch or an if, since that would defeat the dynamic nature of 6cl. The solution:
Hashing
If we hash all flags at register time, use that hash to index into a table,
store the index of the corresponding option at said hash/index, hash the flags
we encounter while parsing the command arguments and use this hash to index
into the table - we have implemented a lookup table that allows us to keep
things dynamic without the consumer having to do any work with strncmp or
macro code gen.
Hash Algorithm
Good old fnv1a. I used it in HashMap in 25 lines of C and I use it in purple garden for interning strings and identifiers - it’s fast, has a good distribution and is easy to implement:
static size_t fnv1a(const char *str, size_t len) {
#define FNV_OFFSET_BASIS 0x811c9dc5
#define FNV_PRIME 0x01000193

  size_t hash = FNV_OFFSET_BASIS;
  for (size_t i = 0; i < len; i++) {
    hash ^= str[i];
    hash *= FNV_PRIME;
  }

  return hash;
}
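One detail worth knowing: the two constants are the 32 bit FNV parameters, while size_t is 64 bits wide on most current desktop platforms, so the function above produces different values than the reference 32 bit FNV-1a there. For indexing a small table that makes no difference. A sketch of the reference 32 bit variant, assuming a hypothetical name fnv1a32, which can be checked against the published test vectors:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// fnv1a32 is a sketch of reference 32 bit FNV-1a, the name is hypothetical
static uint32_t fnv1a32(const char *str, size_t len) {
  assert(str != NULL || len == 0);
  uint32_t hash = 0x811c9dc5u; // 32 bit offset basis
  for (size_t i = 0; i < len; i++) {
    // go through unsigned char so bytes above 127 are not sign extended
    hash ^= (unsigned char)str[i];
    // 32 bit FNV prime, the multiplication wraps modulo 2^32
    hash *= 0x01000193u;
  }
  return hash;
}
```

The empty string hashes to the offset basis itself, which makes for an easy first sanity check.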
Registering Options
As with the short options, we must first register the long options by their name, or rather, by their hash:
// maps a string's hash to its index into the option array
short hash_table_long[__HASH_TABLE_SIZE] = {0};

for (size_t i = 0; i < six->flag_count; i++) {
  SixFlag *f = &six->flags[i];

  // we increment the index by one here, since we use all tables and arrays
  // zero indexed, distinguishing between a not found and the option at index
  // 0 is therefore clear
  hash_table_long[fnv1a(f->name, strnlen(f->name, 256)) & __HASH_TABLE_MASK] = i + 1;

  // [...]
}
The & __HASH_TABLE_MASK makes sure we truncate our hashes to the table size, which works because the size is a power of two:
#define __HASH_TABLE_SIZE 512
#define __HASH_TABLE_MASK (__HASH_TABLE_SIZE - 1)
We now use this to compute the hash for each long option we encounter and check if the table contains an option index.
Detecting Options by Their Names
As introduced before, we enter this path if the current argument isn’t two
characters (<PREFIX><char>), but longer:
- Modify the string window to skip the char prefix:
  - from: +string (start=0,length=7)
  - to: string (start=1,length=6)
- Check if the window matches help, print help and exit if so
- Compute the hash for the window
- Check if hash is in registered option table
- Process arguments
for (size_t i = 1; i < argc; i++) {
  SixStr arg_cur = (SixStr){.p = (argv[i]), .len = strnlen(argv[i], 256)};

  // check if short option
  if (arg_cur.len == 2) {
    // [..]
  } else {
    // strip first char by moving the start of the window one to the right
    arg_cur.p++;
    arg_cur.len--;

    // long help page with option description and stuff, the length check
    // keeps options merely starting with "help" from matching
    if (arg_cur.len == help_str.len &&
        strncmp(arg_cur.p, help_str.p, help_str.len) == 0) {
      help(argv[0], six);
      exit(EXIT_SUCCESS);
    }

    size_t idx = hash_table_long[fnv1a(arg_cur.p, arg_cur.len) & __HASH_TABLE_MASK];
    if (!idx) {
      // %.*s limits the output to len characters, %*s would only pad
      fprintf(stderr, "Unknown option '%.*s'\n", (int)arg_cur.len, arg_cur.p);
      goto err;
    }

    // decrement idx since we use 0 as the no option value
    idx--;

    SixFlag *f = &six->flags[idx];
    int offset = process_argument(f, i, argc, argv);
    if (offset == -1) {
      goto err;
    }
    i += offset;
  }
}
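The lookup trusts whatever index sits in the slot: two names landing in the same slot overwrite each other at register time, and an unknown name hitting an occupied slot is treated as that option. This is not how 6cl does it, but a common way to close both gaps is linear probing plus a name comparison. A sketch under that assumption, with all names hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 512 // must stay a power of two for the mask to work
#define TABLE_MASK (TABLE_SIZE - 1)

// 0 means "empty", everything else is option index + 1
static short table_long[TABLE_SIZE];

static uint32_t hash_name(const char *s, size_t len) { // FNV-1a, 32 bit
  uint32_t h = 0x811c9dc5u;
  for (size_t i = 0; i < len; i++) {
    h ^= (unsigned char)s[i];
    h *= 0x01000193u;
  }
  return h;
}

// register_long assumes far fewer options than TABLE_SIZE, otherwise the
// probing loop would never find a free slot
static void register_long(const char *const *names, size_t idx) {
  size_t slot = hash_name(names[idx], strlen(names[idx])) & TABLE_MASK;
  while (table_long[slot]) {
    slot = (slot + 1) & TABLE_MASK; // occupied, try the next slot
  }
  table_long[slot] = (short)(idx + 1);
}

// returns the index into names or -1 if arg is not a registered name
static int lookup_long(const char *const *names, const char *arg, size_t len) {
  size_t slot = hash_name(arg, len) & TABLE_MASK;
  while (table_long[slot]) {
    const char *name = names[table_long[slot] - 1];
    if (strlen(name) == len && memcmp(name, arg, len) == 0) {
      return table_long[slot] - 1;
    }
    slot = (slot + 1) & TABLE_MASK;
  }
  return -1;
}
```

The price is one string comparison per lookup, which is negligible for the handful of arguments a command line has.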
Handling Option Arguments
Handling arguments is fairly easy, it’s just a big switch, a lot of parsing values from strings to other things and validating the results of said parsing:
// needs <errno.h>, <float.h>, <limits.h>, <stdio.h> and <stdlib.h>
static int process_argument(SixFlag *f, size_t cur, size_t argc, char **argv) {
  int offset = 1;
  switch (f->type) {
  case SIX_STR: {
    if (cur + 1 >= argc) {
      fprintf(stderr, "No STRING value for option '%s'\n", f->name);
      return -1;
    }
    f->s = argv[cur + 1];
    break;
  }
  case SIX_BOOL:
    f->b = true;
    offset = 0;
    break;
  case SIX_CHAR:
    if (cur + 1 >= argc) {
      fprintf(stderr, "No char value found for option '%s/%c'\n", f->name,
              f->short_name);
      return -1;
    } else if (argv[cur + 1][0] == '\0') {
      fprintf(stderr, "No char found for option '%s/%c', empty argument\n",
              f->name, f->short_name);
      return -1;
    } else if (argv[cur + 1][1] != '\0') {
      fprintf(stderr,
              "'%s/%c' value has too many characters, want one for type CHAR\n",
              f->name, f->short_name);
      return -1;
    }
    f->c = argv[cur + 1][0];
    break;
  case SIX_INT: {
    if (cur + 1 >= argc) {
      fprintf(stderr, "No INT value for option '%s/%c'\n", f->name,
              f->short_name);
      return -1;
    }
    char *tmp = argv[cur + 1];
    char *endptr = NULL;
    // reset the global errno, strtol only sets it on failure
    errno = 0;
    long val = strtol(tmp, &endptr, 10);

    if (endptr == tmp || *endptr != '\0') {
      fprintf(stderr, "Invalid integer for option '%s/%c': '%s'\n", f->name,
              f->short_name, tmp);
      return -1;
    }

    if (errno == ERANGE || val < INT_MIN || val > INT_MAX) {
      fprintf(stderr, "Integer out of range for option '%s/%c': '%s'\n",
              f->name, f->short_name, tmp);
      return -1;
    }

    f->i = (int)val;
    break;
  }
  case SIX_LONG: {
    if (cur + 1 >= argc) {
      fprintf(stderr, "No LONG value for option '%s/%c'\n", f->name,
              f->short_name);
      return -1;
    }
    char *tmp = argv[cur + 1];
    char *endptr = NULL;
    errno = 0;
    long val = strtol(tmp, &endptr, 10);

    if (endptr == tmp || *endptr != '\0') {
      fprintf(stderr, "Invalid LONG integer for option '%s/%c': '%s'\n",
              f->name, f->short_name, tmp);
      return -1;
    }

    // a long can't exceed LONG_MIN/LONG_MAX, strtol reports overflow via errno
    if (errno == ERANGE) {
      fprintf(stderr, "LONG integer out of range for option '%s/%c': '%s'\n",
              f->name, f->short_name, tmp);
      return -1;
    }

    f->l = val;
    break;
  }
  case SIX_FLOAT: {
    if (cur + 1 >= argc) {
      fprintf(stderr, "No FLOAT value for option '%s/%c'\n", f->name,
              f->short_name);
      return -1;
    }
    char *tmp = argv[cur + 1];
    char *endptr = NULL;
    errno = 0;
    float val = strtof(tmp, &endptr);

    if (endptr == tmp || *endptr != '\0') {
      fprintf(stderr, "Invalid FLOAT for option '%s/%c': '%s'\n", f->name,
              f->short_name, tmp);
      return -1;
    }

    // FLT_MIN is the smallest positive float, the lower bound is -FLT_MAX
    if (errno == ERANGE || val < -FLT_MAX || val > FLT_MAX) {
      fprintf(stderr, "FLOAT out of range for option '%s/%c': '%s'\n", f->name,
              f->short_name, tmp);
      return -1;
    }

    f->f = val;
    break;
  }
  case SIX_DOUBLE: {
    if (cur + 1 >= argc) {
      fprintf(stderr, "No DOUBLE value for option '%s/%c'\n", f->name,
              f->short_name);
      return -1;
    }
    char *tmp = argv[cur + 1];
    char *endptr = NULL;
    errno = 0;
    double val = strtod(tmp, &endptr);

    if (endptr == tmp || *endptr != '\0') {
      fprintf(stderr, "Invalid DOUBLE for option '%s/%c': '%s'\n", f->name,
              f->short_name, tmp);
      return -1;
    }

    // doubles are checked against the double limits, not the float ones
    if (errno == ERANGE || val < -DBL_MAX || val > DBL_MAX) {
      fprintf(stderr, "DOUBLE out of range for option '%s/%c': '%s'\n",
              f->name, f->short_name, tmp);
      return -1;
    }

    f->d = val;
    break;
  }
  default:
    fprintf(stderr, "Unknown type for option '%s/%c'\n", f->name,
            f->short_name);
    return -1;
  }

  return offset;
}
By default the returned offset is one, since we handle one argument per
option. The exception is SixFlag::type=SIX_BOOL, because I decided not to
allow arguments for boolean options.
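All numeric cases follow the same pattern: reset errno, convert, make sure the whole string was consumed and then check the range. A condensed sketch of that pattern with two hypothetical helpers, parse_int and parse_float, which are not part of 6cl:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

// parse_int (hypothetical) accepts only a complete decimal integer that
// fits into an int
static bool parse_int(const char *s, int *out) {
  assert(s != NULL && out != NULL);
  char *end = NULL;
  errno = 0; // the global errno from <errno.h>
  long v = strtol(s, &end, 10);
  if (end == s || *end != '\0') {
    return false; // no digits at all or trailing garbage
  }
  if (errno == ERANGE || v < INT_MIN || v > INT_MAX) {
    return false; // does not fit into long or int
  }
  *out = (int)v;
  return true;
}

// parse_float (hypothetical) relies on strtof reporting overflow via ERANGE
static bool parse_float(const char *s, float *out) {
  assert(s != NULL && out != NULL);
  char *end = NULL;
  errno = 0;
  float v = strtof(s, &end);
  if (end == s || *end != '\0') {
    return false;
  }
  if (errno == ERANGE) {
    return false;
  }
  *out = v;
  return true;
}
```

Returning a bool and writing through an out parameter keeps the error reporting with the caller, which knows the option name needed for a useful message.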
Porting Purple Garden from Getopt to 6cl
Since I wrote this library to solve my issues with getopt and used the
interpreter as the motivating example, I ought to show you how I used 6cl to
fix these issues:
diff --git a/Makefile b/Makefile
index 5b75a9c..3c333fc 100644
--- a/Makefile
+++ b/Makefile
@@ -44,14 +44,14 @@ run:

 verbose:
     $(CC) -g3 $(FLAGS) $(RELEASE_FLAGS) $(FILES) ./main.c -o purple_garden_verbose
-    ./purple_garden_verbose -V $(PG)
+    ./purple_garden_verbose +V $(PG)

 release:
     $(CC) -g3 $(FLAGS) $(RELEASE_FLAGS) -DCOMMIT='"$(COMMIT)"' -DCOMMIT_MSG='"$(COMMIT_MSG)"' $(FILES) ./main.c -o purple_garden

 bench:
     $(CC) $(FLAGS) $(RELEASE_FLAGS) -DCOMMIT='"BENCH"' $(FILES) ./main.c -o bench
-    ./bench -V $(PG)
+    ./bench +V $(PG)

 test:
     $(CC) $(FLAGS) -g3 -fsanitize=address,undefined -DDEBUG=0 $(TEST_FILES) $(FILES) -o ./tests/test
diff --git a/main.c b/main.c
index b372404..dedb6c2 100644
--- a/main.c
+++ b/main.c
@@ -1,9 +1,10 @@
-#include <getopt.h>
+// TODO: split this up into a DEBUG and a performance entry point
 #include <stdio.h>
 #include <stdlib.h>
 #include <sys/mman.h>
 #include <sys/time.h>

+#include "6cl/6cl.h"
 #include "cc.h"
 #include "common.h"
 #include "io.h"
@@ -36,158 +37,99 @@
   } while (0)

 typedef struct {
-  // options - int because getopt has no bool support
-
-  // use block allocator instead of garbage collection
   size_t block_allocator;
-  // compile all functions to machine code
-  int aot_functions;
-  // readable bytecode representation with labels, globals and comments
-  int disassemble;
-  // display the memory usage of parsing, compilation and the virtual machine
-  int memory_usage;
-
-  // executes the argument as if an input file was given
+  bool aot_functions;
+  bool disassemble;
+  bool memory_usage;
   char *run;
-
-  // verbose logging
-  int verbose;
-
-  // options in which we exit after toggle
+  bool verbose;
   int version;
-  int help;
-
-  // entry point - last argument thats not an option
   char *filename;
 } Args;

-typedef struct {
-  const char *name_long;
-  const char name_short;
-  const char *description;
-  const char *arg_name;
-} cli_option;
-
-// WARN: DO NOT REORDER THIS - will result in option handling issues
-static const cli_option options[] = {
-    {"version", 'v', "display version information", ""},
-    {"help", 'h', "extended usage information", ""},
-    {"disassemble", 'd',
-     "readable bytecode representation with labels, globals and comments", ""},
-    {"block-allocator", 'b',
-     "use block allocator with size instead of garbage collection",
-     "<size in Kb>"},
-    {"aot-functions", 'a', "compile all functions to machine code", ""},
-    {"memory-usage", 'm',
-     "display the memory usage of parsing, compilation and the virtual "
-     "machine",
-     ""},
-    {"verbose", 'V', "verbose logging", ""},
-    {"run", 'r', "executes the argument as if an input file was given",
-     "<input>"},
-};
-
-void usage() {
-  Str prefix = STRING("usage: purple_garden");
-  printf("%.*s ", (int)prefix.len, prefix.p);
-  size_t len = sizeof(options) / sizeof(cli_option);
-  for (size_t i = 0; i < len; i++) {
-    const char *equal_or_not = options[i].arg_name[0] == 0 ? "" : "=";
-    const char *name_or_not =
-        options[i].arg_name[0] == 0 ? "" : options[i].arg_name;
-    printf("[-%c%s | --%s%s%s] ", options[i].name_short, name_or_not,
-           options[i].name_long, equal_or_not, name_or_not);
-    if ((i + 1) % 2 == 0 && i + 1 < len) {
-      printf("\n%*.s ", (int)prefix.len, "");
-    }
-  }
-  printf("<file.garden>\n");
-}
-
-// TODO: replace this shit with `6cl` - the purple garden and 6wm arguments
-// parser
 Args Args_parse(int argc, char **argv) {
-  Args a = (Args){0};
-  // MUST be in sync with options, otherwise this will not work as intended
-  struct option long_options[] = {
-      {options[0].name_long, no_argument, &a.version, 1},
-      {options[1].name_long, no_argument, &a.help, 1},
-      {options[2].name_long, no_argument, &a.disassemble, 1},
-      {options[3].name_long, required_argument, 0, 'b'},
-      {options[4].name_long, no_argument, &a.aot_functions, 1},
-      {options[5].name_long, no_argument, &a.memory_usage, 1},
-      {options[6].name_long, no_argument, &a.verbose, 1},
-      {options[7].name_long, required_argument, 0, 'r'},
-      {0, 0, 0, 0},
+  enum {
+    __VERSION,
+    __DISASSEMBLE,
+    __BLOCK_ALLOC,
+    __AOT,
+    __MEMORY_USAGE,
+    __VERBOSE,
+    __RUN,
   };

-  int opt;
-  while ((opt = getopt_long(argc, argv, "vhdb:amVr:", long_options, NULL)) !=
-         -1) {
-    switch (opt) {
-    case 'v':
-      a.version = 1;
-      break;
-    case 'V':
-      a.verbose = 1;
-      break;
-    case 'h':
-      a.help = 1;
-      break;
-    case 'd':
-      a.disassemble = 1;
-      break;
-    case 'r':
-      a.run = optarg;
-      break;
-    case 'b':
-      char *endptr;
-      size_t block_size = strtol(optarg, &endptr, 10);
-      ASSERT(endptr != optarg, "args: Failed to parse number from: %s", optarg);
-      a.block_allocator = block_size;
-      break;
-    case 'a':
-      a.aot_functions = 1;
-      break;
-    case 'm':
-      a.memory_usage = 1;
-      break;
-    case 0:
-      break;
-    default:
-      usage();
-      exit(EXIT_FAILURE);
-    }
-  }
-
-  if (optind < argc) {
-    a.filename = argv[optind];
+  SixFlag options[] = {
+      [__VERSION] = {.name = "version",
+                     .type = SIX_BOOL,
+                     .b = false,
+                     .short_name = 'v',
+                     .description = "display version information"},
+      [__DISASSEMBLE] =
+          {.name = "disassemble",
+           .short_name = 'd',
+           .type = SIX_BOOL,
+           .b = false,
+           .description =
+               "readable bytecode representation with labels, globals "
+               "and comments"},
+      [__BLOCK_ALLOC] =
+          {.name = "block-allocator",
+           .short_name = 'b',
+           .type = SIX_LONG,
+           .description =
+               "use block allocator with size instead of garbage collection"},
+      [__AOT] = {.name = "aot-functions",
+                 .short_name = 'a',
+                 .b = false,
+                 .type = SIX_BOOL,
+                 .description = "compile all functions to machine code"},
+      [__MEMORY_USAGE] = {.name = "memory-usage",
+                          .short_name = 'm',
+                          .b = false,
+                          .type = SIX_BOOL,
+                          .description = "display the memory usage of parsing, "
+                                         "compilation and the virtual "
+                                         "machine"},
+      [__VERBOSE] = {.name = "verbose",
+                     .short_name = 'V',
+                     .b = false,
+                     .type = SIX_BOOL,
+                     .description = "verbose logging"},
+      [__RUN] = {.name = "run",
+                 .short_name = 'r',
+                 .s = "",
+                 .type = SIX_STR,
+                 .description =
+                     "executes the argument as if an input file was given"},
+  };
+  Args a = (Args){0};
+  Six s = {
+      .flags = options,
+      .flag_count = sizeof(options) / sizeof(options[0]),
+      .name_for_rest_arguments = "<file.garden>",
+  };
+  SixParse(&s, argc, argv);
231+ if (s.rest_count) {
232+ a.filename = s.rest[0];
233 }
234+ a.block_allocator = s.flags[__BLOCK_ALLOC].l;
235+ a.aot_functions = s.flags[__AOT].b;
236+ a.disassemble = s.flags[__DISASSEMBLE].b;
237+ a.memory_usage = s.flags[__MEMORY_USAGE].b;
238+ a.run = s.flags[__RUN].s;
239+ a.verbose = s.flags[__VERBOSE].b;
240+ a.version = s.flags[__VERSION].b;
241
242 // command handling
243- if (UNLIKELY(a.version)) {
244+ if (a.version) {
245 printf("purple_garden: %s-%s-%s\n", CTX, VERSION, COMMIT);
246 if (UNLIKELY(a.verbose)) {
247 puts(COMMIT_MSG);
248 }
249 exit(EXIT_SUCCESS);
250- } else if (UNLIKELY(a.help)) {
251- usage();
252- size_t len = sizeof(options) / sizeof(cli_option);
253- printf("\nOptions:\n");
254- for (size_t i = 0; i < len; i++) {
255- const char *equal_or_not = options[i].arg_name[0] == 0 ? "" : "=";
256- const char *name_or_not =
257- options[i].arg_name[0] == 0 ? "" : options[i].arg_name;
258- printf("\t-%c%s%s, --%s%s%s\n\t\t%s\n\n", options[i].name_short,
259- equal_or_not, name_or_not, options[i].name_long, equal_or_not,
260- name_or_not, options[i].description);
261- }
262- exit(EXIT_SUCCESS);
263 }
264
265- if (UNLIKELY(a.filename == NULL && a.run == NULL)) {
266- usage();
267+ if (a.filename == NULL && (a.run == NULL || a.run[0] == 0)) {
268 fprintf(stderr, "error: Missing a file? try `-h/--help`\n");
269 exit(EXIT_FAILURE);
270 };
271@@ -198,13 +140,14 @@ Args Args_parse(int argc, char **argv) {
272 int main(int argc, char **argv) {
273 struct timeval start_time, end_time;
274 Args a = Args_parse(argc, argv);
275+
276 if (UNLIKELY(a.verbose)) {
277 gettimeofday(&start_time, NULL);
278 }
279 VERBOSE_PUTS("main::Args_parse: Parsed arguments");
280
281 Str input;
282- if (a.run != NULL) {
283+ if (a.run != NULL && a.run[0] != 0) {
284 input = (Str){.p = a.run, .len = strlen(a.run)};
285 } else {
286 input = IO_read_file_to_string(a.filename);
I am going to keep it real at this point: this doesn’t feel as ergonomic as I hoped. I still have to define an enum to make the option array order agnostic, I have to define fields on a struct, and I have to fill these fields on my own.
A better way would be a macro that generates the struct and the implementation from a single source, but I’ll keep it like this for now. The rest works great and feels even better to maintain, especially the nice help page:
$ ./purple_garden +h
usage ./purple_garden: [ +v / +version] [ +d / +disassemble]
    [ +b / +block-allocator <long=0>] [ +a / +aot-functions]
    [ +m / +memory-usage] [ +V / +verbose]
    [ +r / +run <string=``>]
    [ +h / +help] <file.garden>

Option:
    +v / +version
        display version information

    +d / +disassemble
        readable bytecode representation with labels, globals and comments

    +b / +block-allocator <long=0>
        use block allocator with size instead of garbage collection

    +a / +aot-functions
        compile all functions to machine code

    +m / +memory-usage
        display the memory usage of parsing, compilation and the virtual machine

    +V / +verbose
        verbose logging

    +r / +run <string=``>
        executes the argument as if an input file was given

    +h / +help
        help page and usage

Examples:
    ./purple_garden +v +d \
        +b 0 +a \
        +m +V \
        +r ""

    ./purple_garden +version +disassemble \
        +block-allocator 0 +aot-functions \
        +memory-usage +verbose \
        +run ""
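The single-source macro idea mentioned before the help page could look roughly like the X-macro sketch below. This is not how 6cl or purple-garden do it today. Every name in it (FLAG_LIST, AS_ENUM, AS_FIELD, AS_META, FlagMeta, FLAG_META, flag_index_by_name) is made up for illustration, and the flag list is shortened to three entries. One list generates the index enum, the Args struct and the metadata table, so they cannot drift out of sync.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical single source of truth, one X(...) row per flag:
// X(enum id, Args field, C type, long name, short name, description)
#define FLAG_LIST(X)                                                   \
  X(FLAG_VERSION, version, bool, "version", 'v',                       \
    "display version information")                                     \
  X(FLAG_BLOCK_ALLOC, block_allocator, long, "block-allocator", 'b',   \
    "use block allocator with size instead of garbage collection")     \
  X(FLAG_VERBOSE, verbose, bool, "verbose", 'V', "verbose logging")

// 1. Index enum: one enumerator per row, FLAG_COUNT ends up as the row count.
#define AS_ENUM(id, field, type, lname, sname, desc) id,
enum { FLAG_LIST(AS_ENUM) FLAG_COUNT };

// 2. Result struct: one typed field per row.
#define AS_FIELD(id, field, type, lname, sname, desc) type field;
typedef struct {
  FLAG_LIST(AS_FIELD)
} Args;

// 3. Metadata table, indexed by the enum, so row order does not matter.
typedef struct {
  const char *name;
  char short_name;
  const char *description;
} FlagMeta;

#define AS_META(id, field, type, lname, sname, desc) \
  [id] = {.name = lname, .short_name = sname, .description = desc},
const FlagMeta FLAG_META[FLAG_COUNT] = {FLAG_LIST(AS_META)};

// Returns the enum index for a long flag name, or -1 if it is unknown.
int flag_index_by_name(const char *name) {
  assert(name != NULL);
  for (int i = 0; i < FLAG_COUNT; i++) {
    if (strcmp(FLAG_META[i].name, name) == 0) {
      return i;
    }
  }
  return -1;
}
```

Adding a flag then means adding one row to FLAG_LIST. Copying parsed values into Args would still need a fourth expansion, which I left out to keep the sketch short.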
Extra: Ultra complicated error handling
You probably noticed a lot of goto err; statements for all non-happy-path endings.
err:
    usage(argv[0], six);
    exit(EXIT_FAILURE);
    return;
Since we do not do any heap allocations, we don’t have to clean anything up. The return after exit() is never reached and is just there for good measure.
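To show the shape of that pattern in isolation, here is a minimal sketch of a value parser where every failure jumps to one err label. parse_int_flag is a hypothetical helper and not part of 6cl. Unlike the snippet above, its err label returns false instead of printing the usage and exiting, so the caller decides what to do and the function stays testable.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper (not 6cl API): parses `text` as a base-10 int for the
// flag `flag_name`. On success it stores the value in `*out` and returns true.
// All failure paths share the single `err` label at the bottom.
bool parse_int_flag(const char *flag_name, const char *text, int *out) {
  char *end = NULL;
  long value = 0;
  assert(out != NULL);

  if (text == NULL || text[0] == '\0') {
    goto err; // flag given without a value
  }

  errno = 0;
  value = strtol(text, &end, 10);
  if (end == text || *end != '\0') {
    goto err; // no digits at all, or trailing garbage like "12x"
  }
  if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
    goto err; // does not fit into an int
  }

  *out = (int)value;
  return true;

err:
  // Nothing was allocated, so there is nothing to free here.
  fprintf(stderr, "error: invalid value for flag '%s'\n", flag_name);
  return false;
}
```

A caller that wants the behaviour from the article would print the usage and call exit(EXIT_FAILURE) when this returns false.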