Rethinking How I Deal With CLI Arguments (replacing getopt)

This article covers the issues I ran into while using getopt.h for the bytecode interpreter pipeline of a programming language I am currently working on (purple-garden), the implementation details of the replacement I wrote, examples of how to use it, and how I replaced the previous implementation with the new one, 6cl.

Tip: TL;DR

getopt.h kinda sucks to use, because:

  • You have to write the help page and the usage page yourself
  • You have to keep track of all flags and their handling in three places (once in your flag definitions, once for short option names and once in their handling)
  • Flags aren’t typed, so all option arguments are just C strings accessible via global state optarg
  • Default values aren’t really a thing
  • Option prefix is hardcoded to - for short options and -- for long options (thanks for that UNIX and POSIX)

I solved these issues by:

  • Generating help and usage pages from the list of options
  • Single flag list, no short and long definition in multiple places
  • Typing flag options with seven types, range validation (float.h, limits.h) and default values:
    • string
    • boolean
    • character
    • integer and long
    • float and double
  • Defaults and types are just fields of the flag struct in the flag definition
  • A macro sets the option prefix, and all options share that single prefix

I wrote 6cl to fill the gaps I encountered. 6cl is an opinionated command line option parsing library for 6wm and purple garden.

Warning

As always, I am not perfect and neither are my code and blog articles, thus if you find any issues or have any questions, feel free to email me: contact@xnacly.me or contact (at) xnacly.me.

Current state

The purple-garden (pg) interpreter accepts a number of options:

TEXT
$ ./purple_garden -h
usage: purple_garden [-v | --version] [-h | --help]
                     [-d | --disassemble] [-b<size> | --block-allocator=<size>]
                     [-a | --aot-functions] [-m | --memory-usage]
                     [-V | --verbose] [-r<input> | --run=<input>] <file.garden>

Options:
        -v, --version
                display version information

        -h, --help
                extended usage information

        -d, --disassemble
                readable bytecode representation with labels,
                globals and comments

        -b=<size>, --block-allocator=<size>
                use block allocator instead of garbage collection

        -a, --aot-functions
                compile all functions to machine code

        -m, --memory-usage
                display the memory usage of parsing,
                compilation and the virtual machine

        -V, --verbose
                verbose logging

        -r=<input>, --run=<input>
                executes the argument as if an input file was given

Command line argument parsing using getopt

I handle these by first defining a struct to hold all options, for later reference:

C
typedef struct {
  // options - int because getopt has no bool support

  // use block allocator instead of garbage collection
  size_t block_allocator;
  // compile all functions to machine code
  int aot_functions;
  // readable bytecode representation with labels, globals and comments
  int disassemble;
  // display the memory usage of parsing, compilation and the virtual machine
  int memory_usage;

  // executes the argument as if an input file was given
  char *run;

  // verbose logging
  int verbose;

  // options in which we exit after toggle
  int version;
  int help;

  // entry point - last argument that's not an option
  char *filename;
} Args;

After that I define the list of options so I can keep track of them once getopt does its stuff:

C
typedef struct {
  const char *name_long;
  const char name_short;
  const char *description;
  const char *arg_name;
} cli_option;

// WARN: DO NOT REORDER THIS - will result in option handling issues
static const cli_option options[] = {
    {"version", 'v', "display version information", ""},
    {"help", 'h', "extended usage information", ""},
    {"disassemble", 'd', "readable bytecode representation with labels, globals and comments", ""},
    {"block-allocator", 'b', "use block allocator instead of garbage collection", "<size>"},
    {"aot-functions", 'a', "compile all functions to machine code", ""},
    {"memory-usage", 'm', "display the memory usage of parsing, compilation and the virtual machine", ""},
    {"verbose", 'V', "verbose logging", ""},
    {"run", 'r', "executes the argument as if an input file was given", "<input>"},
};

The heavy lifting is of course done in the Args_parse function:

C
Args Args_parse(int argc, char **argv) {
    // [...] 1.-6.
}
  1. Convert the array of cli_option’s to getopt’s option

    C
        Args a = (Args){0};
        // MUST be in sync with options, otherwise this will not work as intended
        struct option long_options[] = {
          {options[0].name_long, no_argument, &a.version, 1},
          {options[1].name_long, no_argument, &a.help, 1},
          {options[2].name_long, no_argument, &a.disassemble, 1},
          {options[3].name_long, required_argument, 0, 'b'},
          {options[4].name_long, no_argument, &a.aot_functions, 1},
          {options[5].name_long, no_argument, &a.memory_usage, 1},
          {options[6].name_long, no_argument, &a.verbose, 1},
          {options[7].name_long, required_argument, 0, 'r'},
          {0, 0, 0, 0},
        };
  2. Pass the array to getopt_long with the matching short flag definition vhdb:amVr: (third location to define flags)

    C
        int opt;
        while ((opt = getopt_long(argc, argv, "vhdb:amVr:", long_options, NULL)) !=
             -1) {
            //  [...]
        }
  3. Handle short options separately from long option “automatic” handling (touching every flag twice)

    C
            switch (opt) {
            case 'v':
              a.version = 1;
              break;
            case 'V':
              a.verbose = 1;
              break;
            case 'h':
              a.help = 1;
              break;
            case 'd':
              a.disassemble = 1;
              break;
            case 'r':
              a.run = optarg;
              break;
            case 'b': {
              // braces: a declaration directly after a case label is only
              // valid since C23
              char *endptr;
              size_t block_size = strtol(optarg, &endptr, 10);
              ASSERT(endptr != optarg, "args: Failed to parse number from: %s", optarg);
              a.block_allocator = block_size;
              break;
            }
            case 'a':
              a.aot_functions = 1;
              break;
            case 'm':
              a.memory_usage = 1;
              break;
            case 0:
              // long option that already set its flag via long_options
              break;
            default:
              usage();
              exit(EXIT_FAILURE);
            }
  4. Store all non flags as rest, representing the entry file (filename)

    C
        if (optind < argc) {
            a.filename = argv[optind];
        }
  5. Act on commands, like --version, --help and their short variants

    C
        if (a.version) {
            // [...]
        } else if (a.help) {
            usage();
            // [...]
        }
  6. Error if no input to the interpreter is detected

    C
      if (a.filename == NULL && a.run == NULL) {
        usage();
        fprintf(stderr, "error: Missing a file? try `-h/--help`\n");
        exit(EXIT_FAILURE);
      }

6cl Design, API and Exemplary Usage

The API design is inspired by Go’s flag package, Google’s gflags, my general experience with programming languages (Go, Rust, C, etc.) and my attempt to create an ergonomic interface around the constraints of the C programming language.

By ergonomic I mean:

  • single location for defining flags (no setting and handling them multiple times)
  • option format
    • boolean options have no argument
    • merged short and long options (no -s, --save, but -s and -save)
    • no combined options (no -xvf or -lah, but -x -v -f and -l -a -h)
    • no name and option merges, such as +DCONSTANT=12 or +n128, but rather +D CONSTANT=12 and +n 128
  • type safe and early errors if values can’t be parsed or over-/underflows occur
  • type safe default values if flags aren’t specified
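
The option format rules above boil down to a small classification step per argument. The following is a minimal sketch of that idea, not 6cl's actual code: `ArgKind`, `classify_arg` and the `PREFIX` macro are hypothetical names, and the prefix is assumed to be `+` as in the examples of this article.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical stand-in for the prefix macro the article calls
// SIX_OPTION_PREFIX
#define PREFIX '+'

typedef enum { ARG_REST, ARG_SHORT, ARG_LONG } ArgKind;

// Classifies a single argv entry:
//  - no prefix              -> rest argument (for example the input file)
//  - prefix + one character -> short option (+v)
//  - prefix + anything else -> long option (+verbose)
// Since options can't be combined, +xvf is read as the long option "xvf".
static ArgKind classify_arg(const char *arg) {
  assert(arg != NULL);
  if (arg[0] != PREFIX) {
    return ARG_REST;
  }
  return strlen(arg) == 2 ? ARG_SHORT : ARG_LONG;
}
```

With this rule `+n` and `+rolls` can refer to the same flag, while a value such as `128` always has to be its own argument.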

Defining Flags

A flag consists of a long name, a short name, a type, a description and a default value that matches its type.

The type is defined as an enum:

C
typedef enum {
  SIX_STR,
  SIX_BOOL,
  SIX_CHAR,
  SIX_INT,
  SIX_LONG,
  SIX_FLOAT,
  SIX_DOUBLE,
} SixFlagType;

The flag itself holds all aforementioned fields:

C
typedef struct {
  // name of the flag, for instance +<name>; +help
  const char *name;
  // short name, like +<short_name>; +h
  char short_name;
  // Defines the datatype
  SixFlagType type;
  // used in the help page
  const char *description;

  // typed result values, will be filled with the value if any is found
  // for the option, or with the default value that's already set.
  union {
    // string value
    char *s;
    // boolean value
    bool b;
    // char value
    char c;
    // int value
    int i;
    // long value
    long l;
    // float value
    float f;
    // double value
    double d;
  };
} SixFlag;

So a flag +pi <double> / +p <double> would be defined as:

C
SixFlag pi = {
    .name = "pi",
    .short_name = 'p',
    .d = 3.1415,
    .type = SIX_DOUBLE,
    .description = "define pi",
};

This has to be passed to the Six struct, holding the available flags:

C
typedef struct Six {
  SixFlag *flags;
  size_t flag_count;
  // usage will be postfixed with this
  const char *name_for_rest_arguments;
  // rest holds all arguments not matching any defined options
  char *rest[SIX_MAX_REST];
  size_t rest_count;
} Six;

The fields flags and flag_count must be set before calling SixParse:

C
typedef enum { UNKNOWN = -1, PI} Option;

SixFlag options[] = {
    [PI] = {
        .name = "pi",
        .short_name = 'p',
        .d = 3.1415,
        .type = SIX_DOUBLE,
        .description = "define pi",
    },
};
Six s = {0};
s.flags = options;
s.flag_count = sizeof(options) / sizeof(SixFlag);
SixParse(&s, argc, argv);

Accessing Flags

The flags can be accessed by indexing into the options array:

C
double pi = s.flags[PI].d;
printf("%f\n", pi);

Fusing into an Example

I use an example to test the pipeline, so I’ll just dump this one here:

C
/*
 * A dice roller that simulates rolling N dice with M sides, optionally
 * labeled, and with verbose output to print each roll result.
 *
 * $ gcc ./dice.c ../6cl.c -o dice
 * $ ./dice +n 4 +m 6
 * => 14
 * $ ./dice +rolls 2 +sides 20 +label "STR"
 * STR: => 29
 * $ ./dice +n 3 +m 10 +v
 * Rolled: 3 + 7 + 5 =15
 */
#include "../6cl.h"

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ERR(FMT, ...) fprintf(stderr, "dice: " FMT "\n", ##__VA_ARGS__)

void dice(int *throws, unsigned int n, unsigned int m) {
  for (size_t i = 0; i < n; i++) {
    throws[i] = (rand() % m) + 1;
  }
}

typedef enum { UNKNOWN = -1, ROLLS, SIDES, LABEL, VERBOSE } Option;

int main(int argc, char **argv) {
  srand((unsigned int)time(NULL));

  SixFlag options[] = {
      [ROLLS] = {.name = "rolls",
                 .short_name = 'n',
                 .i = 2,
                 .type = SIX_INT,
                 .description = "times to roll"},
      [SIDES] = {.name = "sides",
                 .short_name = 'm',
                 .i = 6,
                 .type = SIX_INT,
                 .description = "sides the dice has"},
      [LABEL] =
          {
              .name = "label",
              .short_name = 'l',
              .s = "=> ",
              .type = SIX_STR,
              .description = "prefix for the dice roll result",
          },
      [VERBOSE] =
          {
              .name = "verbose",
              .short_name = 'v',
              .type = SIX_BOOL,
              .description = "print all rolls, not only the result",
          },
  };
  Six s = {0};
  s.flags = options;
  s.flag_count = sizeof(options) / sizeof(SixFlag);

  SixParse(&s, argc, argv);
  if (s.flags[VERBOSE].b) {
    printf("Config{rolls=%d, sides=%d, label=`%s`}\n", s.flags[ROLLS].i,
           s.flags[SIDES].i, s.flags[LABEL].s);
  }

  if (options[ROLLS].i < 1) {
    ERR("Rolls can't be < 1");
    return EXIT_FAILURE;
  }

  // guards the modulo in dice() against a division by zero
  if (options[SIDES].i < 1) {
    ERR("Sides can't be < 1");
    return EXIT_FAILURE;
  }

  int throws[options[ROLLS].i];
  dice(throws, options[ROLLS].i, options[SIDES].i);

  int cum = 0;
  for (int i = 0; i < options[ROLLS].i; i++) {
    int roll = throws[i];
    cum += roll;
    if (options[VERBOSE].b) {
      printf("[roll=%02d]::[%02d/%02d]\n", i + 1, roll, options[SIDES].i);
    }
  }

  printf("%s%d\n", options[LABEL].s, cum);

  return EXIT_SUCCESS;
}

Generating Documentation

From here on out, I’ll show how I implemented the command line parser and the API surface.

If the user passes a malformed input to a well written application, it should provide a good error message, a usage overview and a note on how to get in depth help. Since each option has a short name, a long name, a type, a default value and a description - I want to display all of the aforementioned in the help and a subset in the usage page.

Usage

The usage page is displayed if the application is invoked with either +h or +help or the 6cl parser hits an error (for the former two just as a prefix to the help page):

TEXT
$ ./examples/dice.out +k
Unknown short option 'k'
usage ./examples/dice.out: [ +n / +rolls <int=2>] [ +m / +sides <int=6>]
                           [ +l / +label <string=`=> `>] [ +v / +verbose]
                           [ +h / +help]

I created a helper for printing a flag and all its options - print_flag:

C
void print_flag(SixFlag *f, bool long_option) {
  char *pre_and_postfix = "[]";
  if (long_option) {
    putc('\t', stdout);
    pre_and_postfix = "  ";
  }

  printf("%c %c%c / %c%s", pre_and_postfix[0], SIX_OPTION_PREFIX, f->short_name,
         SIX_OPTION_PREFIX, f->name);
  if (f->type != SIX_BOOL) {
    printf(" <%s=", SIX_FLAG_TYPE_TO_MAP[f->type]);
    switch (f->type) {
    case SIX_STR:
      printf("`%s`", f->s);
      break;
    case SIX_CHAR:
      putc(f->c, stdout);
      break;
    case SIX_INT:
      printf("%d", f->i);
      break;
    case SIX_LONG:
      printf("%ld", f->l);
      break;
    case SIX_FLOAT:
      printf("%g", f->f);
      break;
    case SIX_DOUBLE:
      printf("%g", f->d);
      break;
    default:
      // a label needs a statement after it before C23
      break;
    }
    putc('>', stdout);
  }
  putc(pre_and_postfix[1], stdout);
  putc(' ', stdout);

  if (long_option) {
    if (f->description) {
      printf("\n\t\t%s\n", f->description);
    }
    putc('\n', stdout);
  }
}

After every two options there is a newline inserted to make the output more readable.

C
static SixFlag HELP_FLAG = {
    .name = "help",
    .short_name = 'h',
    .description = "help page and usage",
    .type = SIX_BOOL,
};

// part of -h, --help, +h, +help and any unknown option
static void usage(const char *pname, const Six *h) {
  // should i put this to stdout or stderr
  printf("usage %s: ", pname);
  size_t len = strlen(pname) + 7;
  for (size_t i = 0; i < h->flag_count; i++) {
    print_flag(&h->flags[i], false);
    if ((i + 1) % 2 == 0 && i + 1 < h->flag_count) {
      printf("\n%*.s ", (int)len, "");
    }
  }

  printf("\n%*.s ", (int)len, "");
  print_flag(&HELP_FLAG, false);

  if (h->name_for_rest_arguments) {
    puts(h->name_for_rest_arguments);
  } else {
    puts("");
  }
}
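
The alignment of the continuation lines relies on the `%*.s` format: the width comes from the argument list, and a precision given as a bare `.` counts as zero, so no characters of the empty string are printed and the field is padded with spaces. A minimal sketch of just that trick follows; `pad_to` is a hypothetical helper for illustration, not part of 6cl.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Writes `width` spaces into buf using the same "%*.s" format as usage().
// Returns the number of characters written, excluding the terminator.
static int pad_to(char *buf, size_t size, int width) {
  assert(buf != NULL && width >= 0 && (size_t)width < size);
  return snprintf(buf, size, "%*.s", width, "");
}
```

For `./examples/dice.out` (19 characters) `len` is 26; together with the trailing space in the format string this yields 27 columns, which is the width of `usage ./examples/dice.out: `.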

Examples

To generate two examples (one with long names and one with short names), the default values are used:

TEXT
Examples:
        ./examples/dice.out +n 2 +m 6 \
                            +l "=> " +v

        ./examples/dice.out +rolls 2 +sides 6 \
                            +label "=> " +verbose

As with the usage, after every two options there is a newline inserted.

C
static void help(const char *pname, const Six *h) {
  size_t len = strlen(pname);
  // [...]

  printf("Examples: ");
  for (size_t i = 0; i < 2; i++) {
    printf("\n\t%s ", pname);
    for (size_t j = 0; j < h->flag_count; j++) {
      SixFlag *s = &h->flags[j];
      if (i) {
        printf("%c%s", SIX_OPTION_PREFIX, s->name);
      } else {
        printf("%c%c", SIX_OPTION_PREFIX, s->short_name);
      }
      switch (s->type) {
      case SIX_STR:
        printf(" \"%s\"", s->s);
        break;
      case SIX_CHAR:
        printf(" %c", s->c);
        break;
      case SIX_INT:
        printf(" %d", s->i);
        break;
      case SIX_LONG:
        // %ld, since l is a long and not a size_t
        printf(" %ld", s->l);
        break;
      case SIX_FLOAT:
        printf(" %g", s->f);
        break;
      case SIX_DOUBLE:
        // read the double member, not the float one
        printf(" %g", s->d);
        break;
      case SIX_BOOL:
      default:
        break;
      }
      putc(' ', stdout);
      if ((j + 1) % 2 == 0 && j + 1 < h->flag_count) {
        printf("\\\n\t %*.s", (int)len, "");
      }
    }
    puts("");
  }
}

Help Page

The help page merges the usage, the extended option display (with description) and the example sections:

TEXT
$ ./examples/dice.out +help
usage ./examples/dice.out: [ +n / +rolls <int=2>] [ +m / +sides <int=6>]
                           [ +l / +label <string=`=> `>] [ +v / +verbose]
                           [ +h / +help]

Option:
          +n / +rolls <int=2>
                times to roll

          +m / +sides <int=6>
                sides the dice has

          +l / +label <string=`=> `>
                prefix for the dice roll result

          +v / +verbose
                print all rolls, not only the result

          +h / +help
                help page and usage

Examples:
        ./examples/dice.out +n 2 +m 6 \
                            +l "=> " +v

        ./examples/dice.out +rolls 2 +sides 6 \
                            +label "=> " +verbose

With usage, options and examples:

C
static void help(const char *pname, const Six *h) {
  usage(pname, h);
  size_t len = strlen(pname);
  printf("\nOption:\n");
  for (size_t j = 0; j < h->flag_count; j++) {
    print_flag(&h->flags[j], true);
  }
  print_flag(&HELP_FLAG, true);

  printf("Examples: ");
  for (size_t i = 0; i < 2; i++) {
    printf("\n\t%s ", pname);
    for (size_t j = 0; j < h->flag_count; j++) {
      SixFlag *s = &h->flags[j];
      if (i) {
        printf("%c%s", SIX_OPTION_PREFIX, s->name);
      } else {
        printf("%c%c", SIX_OPTION_PREFIX, s->short_name);
      }
      switch (s->type) {
      case SIX_STR:
        printf(" \"%s\"", s->s);
        break;
      case SIX_CHAR:
        printf(" %c", s->c);
        break;
      case SIX_INT:
        printf(" %d", s->i);
        break;
      case SIX_LONG:
        // %ld, since l is a long and not a size_t
        printf(" %ld", s->l);
        break;
      case SIX_FLOAT:
        printf(" %g", s->f);
        break;
      case SIX_DOUBLE:
        // read the double member, not the float one
        printf(" %g", s->d);
        break;
      case SIX_BOOL:
      default:
        break;
      }
      putc(' ', stdout);
      if ((j + 1) % 2 == 0 && j + 1 < h->flag_count) {
        printf("\\\n\t %*.s", (int)len, "");
      }
    }
    puts("");
  }
}

Detecting Short Flags

Since a short option, that is, the single character following the prefix, can only take one of fewer than 256 ASCII values, I use a lookup table to check both whether a flag is registered for that character and at which index the option sits in Six.flags.

C
short table_short[256] = {0};

// registering all options
for (size_t i = 0; i < six->flag_count; i++) {
    SixFlag *f = &six->flags[i];

    // [...]
    if (f->short_name) {
      // unsigned char, so a negative char can't index out of bounds
      table_short[(unsigned char)f->short_name] = i + 1;
    }
}

Zeroing the array means every character without an associated option resolves to 0, which makes for pretty good error handling. However, this requires incrementing all stored indices by one to differentiate them from the zero value (I could’ve abstracted this via a custom Option struct or something, but I couldn’t be bothered).

For our +p example, the table stores 1 at table index 112 (the ASCII value of p), which refers to index 0 in the option array, since we increment the stored index by one, as explained above.
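
The off-by-one bookkeeping can be isolated into two tiny helpers. This is a sketch under my own naming (`register_short` and `lookup_short` are hypothetical and not part of 6cl); it only demonstrates the "0 means empty, stored value is index + 1" convention.

```c
#include <assert.h>
#include <stddef.h>

// 0 means "no option registered", so stored values are index + 1
static short table_short[256] = {0};

// Registers the short name `c` for the option at `index` in the flag array
static void register_short(char c, size_t index) {
  assert(index < 32767);
  table_short[(unsigned char)c] = (short)(index + 1);
}

// Returns the option index registered for `c`, or -1 if there is none
static int lookup_short(char c) {
  return table_short[(unsigned char)c] - 1;
}
```

After `register_short('p', 0)` the slot `table_short[112]` holds 1, `lookup_short('p')` yields 0 and an unregistered character such as `k` yields -1.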

Detecting short options is a matter of indexing into an array with a character, so we do exactly that while processing the arguments:

C
for (size_t i = 1; i < argc; i++) {
    SixStr arg_cur = (SixStr){.p = (argv[i]), .len = strnlen(argv[i], 256)};

    // not starting with PREFIX means: no option, thus rest
    if (arg_cur.p[0] != SIX_OPTION_PREFIX) {
      if (six->rest_count + 1 >= SIX_MAX_REST) {
        fprintf(stderr, "Not enough space left for more rest arguments\n");
        goto err;
      }
      six->rest[six->rest_count++] = argv[i];
      continue;
    }

    // check if short option
    if (arg_cur.len == 2) {
      int cc = arg_cur.p[1];
      // the table has 256 entries, so valid indices are 0..255
      if (cc >= 256 || cc < 0) {
        fprintf(stderr, "Unknown short option '%c'\n", arg_cur.p[1]);
        goto err;
      }

      // single char option usage/help page
      if (cc == 'h') {
        help(argv[0], six);
        exit(EXIT_SUCCESS);
      }

      // check if short option is a registered one
      short option_idx = table_short[cc];
      if (!option_idx) {
        fprintf(stderr, "Unknown short option '%c'\n", arg_cur.p[1]);
        goto err;
      }

      // we decrement option_idx, since we zero the lookup table, thus an
      // empty value is 0 and the index of the first option is 1, we correct
      // this here
      option_idx--;

      int offset = process_argument(&six->flags[option_idx], i, argc, argv);
      if (offset == -1) {
        goto err;
      }
      i += offset;
    } else {
        // [...]
    }
}

The tricky parts have comments, the rest should be obvious:

  1. check if the current argv member is a short option
    • if h: print the help page, end
  2. check if option is in the registered table
  3. process the argument for said option
  4. advance the index by the offset returned from processing the argument (because an option can consume the argv entry that follows it)

Detecting Long Flags

Long flags are a whole other story: we can’t match on a single character, and we can’t hardcode a switch or an if, since that would defeat the dynamic nature of 6cl. The solution:

Hashing

If we hash all flags at register time, use that hash to index into a table, store a pointer to the corresponding option at said hash/index, hash the flags we encounter while parsing the command arguments and use this hash to index into the table - we have implemented a lookup table that allows us to keep things dynamic without the consumer having to do any work with strncmp or macro code gen.

Hash Algorithm

Good old fnv1a. I used it in HashMap in 25 lines of C and I use it in purple garden for interning strings and identifiers - it’s fast, has a good distribution and is easy to implement:

C
static size_t fnv1a(const char *str, size_t len) {
#define FNV_OFFSET_BASIS 0x811c9dc5
#define FNV_PRIME 0x01000193

  size_t hash = FNV_OFFSET_BASIS;
  for (size_t i = 0; i < len; i++) {
    // cast, so bytes >= 0x80 aren't sign extended on signed char platforms
    hash ^= (unsigned char)str[i];
    hash *= FNV_PRIME;
  }

  return hash;
}
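
The function above combines the 32-bit FNV constants with a size_t accumulator. That is perfectly fine for a lookup table, but on 64-bit platforms the multiplication is not truncated to 32 bits, so the results differ from the published FNV-1a values. For comparison, here is a sketch of the textbook 32-bit variant; `fnv1a32` is a name I made up for illustration and is not part of 6cl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Textbook 32-bit FNV-1a: xor the byte in, then multiply by the prime.
// uint32_t arithmetic wraps modulo 2^32, which the algorithm relies on.
static uint32_t fnv1a32(const char *str, size_t len) {
  assert(str != NULL || len == 0);
  uint32_t hash = 0x811c9dc5u; // offset basis
  for (size_t i = 0; i < len; i++) {
    hash ^= (unsigned char)str[i];
    hash *= 0x01000193u; // FNV prime
  }
  return hash;
}
```

The empty input returns the offset basis unchanged, and `fnv1a32("a", 1)` is `0xe40c292c`, which matches the commonly published test vector.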

Registering Options

As with the short options, we must first register the long options by their name, or rather, by their hash:

C
 1// maps a strings hash to its index into the option array
 2short hash_table_long[__HASH_TABLE_SIZE] = {0};
 3
 4for (size_t i = 0; i < six->flag_count; i++) {
 5    SixFlag *f = &six->flags[i];
 6
 7    // we increment the index by one here, since we use all tables and arrays
 8    // zero indexed, distinguishing between a not found and the option at index
 9    // 0 is therefore clear
10    hash_table_long[fnv1a(f->name, strnlen(f->name, 256)) & __HASH_TABLE_MASK] = i + 1;
11
12    // [...]
13}

The & __HASH_TABLE_MASK makes sure we truncate our hashes to the table size:

C
#define __HASH_TABLE_SIZE 512
#define __HASH_TABLE_MASK (__HASH_TABLE_SIZE - 1)
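
Masking with size - 1 is only equivalent to a modulo when the table size is a power of two, which 512 is. The following sketch makes that assumption explicit; `TABLE_SIZE`, `TABLE_MASK` and `slot_for` are hypothetical stand-ins I chose for illustration.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-ins for __HASH_TABLE_SIZE and __HASH_TABLE_MASK
#define TABLE_SIZE 512
#define TABLE_MASK (TABLE_SIZE - 1)

// hash & (size - 1) == hash % size only holds for power-of-two sizes
static_assert((TABLE_SIZE & (TABLE_SIZE - 1)) == 0,
              "table size must be a power of two");

// Truncates a hash to a valid index into a table with TABLE_SIZE slots
static size_t slot_for(size_t hash) { return hash & TABLE_MASK; }
```

Changing the size to something like 500 would trip the static assertion instead of silently leaving some slots unreachable.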

We use this to now compute the hash for each long option we encounter and check if the table contains an option index:

Detecting Options by Their Names

As introduced before, we enter this path if the current argument isn’t two characters: <PREFIX><char>, but longer:

  1. Modify the string window to skip the char prefix:
    • from: +string (start=0,length=7)
    • to: string (start=1,length=6)
  2. Check if the window matches help, print help and exit if so
  3. Compute the hash for the window
  4. Check if hash is in registered option table
  5. Process arguments
C
for (size_t i = 1; i < argc; i++) {
    SixStr arg_cur = (SixStr){.p = (argv[i]), .len = strnlen(argv[i], 256)};

    // check if short option
    if (arg_cur.len == 2) {
        // [..]
    } else {
        // strip first char by moving the start of the window one to the right
        arg_cur.p++;
        arg_cur.len--;

        // long help page with option description and stuff; comparing the
        // lengths first keeps names like +helpme from matching
        if (arg_cur.len == help_str.len &&
            strncmp(arg_cur.p, help_str.p, help_str.len) == 0) {
            help(argv[0], six);
            exit(EXIT_SUCCESS);
        }

        size_t idx = hash_table_long[fnv1a(arg_cur.p, arg_cur.len) & __HASH_TABLE_MASK];
        if (!idx) {
            // %.*s limits the output to len characters, %*s would only pad
            fprintf(stderr, "Unknown option '%.*s'\n", (int)arg_cur.len, arg_cur.p);
            goto err;
        }

        // decrement idx since we use 0 as the no option value
        idx--;

        SixFlag *f = &six->flags[idx];
        int offset = process_argument(f, i, argc, argv);
        if (offset == -1) {
            goto err;
        }
        i += offset;
    }
}
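
One thing to keep in mind with the code as shown here: the lookup trusts the truncated hash alone, so an unknown name whose hash happens to land in an occupied slot would be treated as the option registered there. A possible guard is to compare the name once after the table hit. The sketch below is my own suggestion, not something 6cl is confirmed to do; `verify_candidate` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// `candidate` is the index the hash table produced (after the -1 correction).
// Returns it unchanged if the registered name really equals `name`, or -1 if
// the table hit was only a hash collision or the index is out of range.
static int verify_candidate(const char *const *names, size_t count,
                            size_t candidate, const char *name, size_t len) {
  assert(names != NULL && name != NULL);
  if (candidate >= count) {
    return -1;
  }
  const char *registered = names[candidate];
  if (strlen(registered) != len || strncmp(registered, name, len) != 0) {
    return -1;
  }
  return (int)candidate;
}
```

This costs a single string comparison per long option and keeps the table lookup itself unchanged.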

Handling Option Arguments

Handling arguments is fairly easy, it’s just a big switch, a lot of parsing values from strings to other things and validating the results of said parsing:

C
// needs <errno.h>, <float.h>, <limits.h> and <stdlib.h>
static int process_argument(SixFlag *f, size_t cur, size_t argc, char **argv) {
  int offset = 1;
  switch (f->type) {
  case SIX_STR: {
    if (cur + 1 >= argc) {
      fprintf(stderr, "No STRING value for option '%s'\n", f->name);
      return -1;
    }
    f->s = argv[cur + 1];
    break;
  }
  case SIX_BOOL:
    f->b = true;
    offset = 0;
    break;
  case SIX_CHAR:
    if (cur + 1 >= argc) {
      fprintf(stderr, "No char value found for option '%s/%c'\n", f->name,
              f->short_name);
      return -1;
    } else if (argv[cur + 1][0] == '\0') {
      fprintf(stderr, "No char found for option '%s/%c', empty argument\n",
              f->name, f->short_name);
      return -1;
    } else if (argv[cur + 1][1] != '\0') {
      fprintf(stderr,
              "'%s/%c' value has too many characters, want one for type CHAR\n",
              f->name, f->short_name);
      return -1;
    }
    f->c = argv[cur + 1][0];
    break;
  case SIX_INT: {
    if (cur + 1 >= argc) {
      fprintf(stderr, "No INT value for option '%s/%c'\n", f->name,
              f->short_name);
      return -1;
    }
    char *tmp = argv[cur + 1];
    char *endptr = NULL;
    // reset the global errno, a local variable named errno would never be
    // written by strtol
    errno = 0;
    long val = strtol(tmp, &endptr, 10);

    if (endptr == tmp || *endptr != '\0') {
      fprintf(stderr, "Invalid integer for option '%s/%c': '%s'\n", f->name,
              f->short_name, tmp);
      return -1;
    }

    // ERANGE covers platforms where long and int have the same width
    if (errno == ERANGE || val < INT_MIN || val > INT_MAX) {
      fprintf(stderr, "Integer out of range for option '%s/%c': '%s'\n",
              f->name, f->short_name, tmp);
      return -1;
    }

    f->i = (int)val;
    break;
  }
  case SIX_LONG: {
    if (cur + 1 >= argc) {
      fprintf(stderr, "No LONG value for option '%s/%c'\n", f->name,
              f->short_name);
      return -1;
    }
    char *tmp = argv[cur + 1];
    char *endptr = NULL;
    errno = 0;
    long val = strtol(tmp, &endptr, 10);

    if (endptr == tmp || *endptr != '\0') {
      fprintf(stderr, "Invalid LONG integer for option '%s/%c': '%s'\n",
              f->name, f->short_name, tmp);
      return -1;
    }

    // a long can never compare outside [LONG_MIN, LONG_MAX]; strtol clamps
    // the value and reports the overflow via errno instead
    if (errno == ERANGE) {
      fprintf(stderr, "LONG integer out of range for option '%s/%c': '%s'\n",
              f->name, f->short_name, tmp);
      return -1;
    }

    f->l = val;
    break;
  }
  case SIX_FLOAT: {
    if (cur + 1 >= argc) {
      fprintf(stderr, "No FLOAT value for option '%s/%c'\n", f->name,
              f->short_name);
      return -1;
    }
    char *tmp = argv[cur + 1];
    char *endptr = NULL;
    float val = strtof(tmp, &endptr);

    if (endptr == tmp || *endptr != '\0') {
      fprintf(stderr, "Invalid FLOAT for option '%s/%c': '%s'\n", f->name,
              f->short_name, tmp);
      return -1;
    }

    // FLT_MIN is the smallest positive float, so the lower bound is -FLT_MAX;
    // on overflow strtof returns +-HUGE_VALF, which fails this check
    if (val < -FLT_MAX || val > FLT_MAX) {
      fprintf(stderr, "FLOAT out of range for option '%s/%c': '%s'\n", f->name,
              f->short_name, tmp);
      return -1;
    }

    f->f = val;
    break;
  }
  case SIX_DOUBLE: {
    if (cur + 1 >= argc) {
      fprintf(stderr, "No DOUBLE value for option '%s/%c'\n", f->name,
              f->short_name);
      return -1;
    }
    char *tmp = argv[cur + 1];
    char *endptr = NULL;
    double val = strtod(tmp, &endptr);

    if (endptr == tmp || *endptr != '\0') {
      fprintf(stderr, "Invalid DOUBLE for option '%s/%c': '%s'\n", f->name,
              f->short_name, tmp);
      return -1;
    }

    // doubles are bounded by DBL_MAX, not by the float limits
    if (val < -DBL_MAX || val > DBL_MAX) {
      fprintf(stderr, "DOUBLE out of range for option '%s/%c': '%s'\n", f->name,
              f->short_name, tmp);
      return -1;
    }

    f->d = val;
    break;
  }
  default:
    fprintf(stderr, "Unknown type for option '%s/%c'\n", f->name,
            f->short_name);
    return -1;
  }

  return offset;
}

By default the returned offset is one, since we handle one argument per option. The exception is SixFlag::type=SIX_BOOL, because I decided not to allow arguments for boolean options.
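
The parse-then-validate pattern of the numeric cases can also be shown in isolation. The helpers below are a sketch under my own naming (`parse_int_strict` and `parse_float_strict` are hypothetical, not 6cl functions); they rely on the standard behaviour that strtol and strtof set errno to ERANGE when the value does not fit.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

// Parses `s` as a base-10 int. Returns false on empty input, trailing
// garbage or values outside [INT_MIN, INT_MAX]; *out is untouched then.
static bool parse_int_strict(const char *s, int *out) {
  assert(s != NULL && out != NULL);
  char *end = NULL;
  errno = 0;
  long val = strtol(s, &end, 10);
  if (end == s || *end != '\0') {
    return false;
  }
  if (errno == ERANGE || val < INT_MIN || val > INT_MAX) {
    return false;
  }
  *out = (int)val;
  return true;
}

// Same idea for float: strtof reports overflow (and underflow) via ERANGE.
static bool parse_float_strict(const char *s, float *out) {
  assert(s != NULL && out != NULL);
  char *end = NULL;
  errno = 0;
  float val = strtof(s, &end);
  if (end == s || *end != '\0' || errno == ERANGE) {
    return false;
  }
  *out = val;
  return true;
}
```

Both helpers accept negative values and zero, and reject inputs such as `12abc` or `1e50`. Note that strtof also accepts spellings like `inf` and `nan`, which this sketch does not filter out.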

Porting Purple Garden from Getopt to 6cl

I wrote this library to solve my issues with getopt and used the interpreter as the motivating example, so I ought to show how I used 6cl to fix these issues in purple garden:

DIFF
diff --git a/Makefile b/Makefile
index 5b75a9c..3c333fc 100644
--- a/Makefile
+++ b/Makefile
@@ -44,14 +44,14 @@ run:
 
 verbose:
 	$(CC) -g3 $(FLAGS) $(RELEASE_FLAGS) $(FILES) ./main.c -o purple_garden_verbose
-	./purple_garden_verbose -V $(PG)
+	./purple_garden_verbose +V $(PG)
 
 release:
 	$(CC) -g3 $(FLAGS) $(RELEASE_FLAGS) -DCOMMIT='"$(COMMIT)"' -DCOMMIT_MSG='"$(COMMIT_MSG)"' $(FILES) ./main.c -o purple_garden
 
 bench:
 	$(CC) $(FLAGS) $(RELEASE_FLAGS) -DCOMMIT='"BENCH"' $(FILES) ./main.c -o bench
-	./bench -V $(PG)
+	./bench +V $(PG)
 
 test:
 	$(CC) $(FLAGS) -g3 -fsanitize=address,undefined -DDEBUG=0 $(TEST_FILES) $(FILES) -o ./tests/test
diff --git a/main.c b/main.c
index b372404..dedb6c2 100644
--- a/main.c
+++ b/main.c
@@ -1,9 +1,10 @@
-#include <getopt.h>
+// TODO: split this up into a DEBUG and a performance entry point
 #include <stdio.h>
 #include <stdlib.h>
 #include <sys/mman.h>
 #include <sys/time.h>
 
+#include "6cl/6cl.h"
 #include "cc.h"
 #include "common.h"
 #include "io.h"
@@ -36,158 +37,99 @@
   } while (0)
 
 typedef struct {
-  // options - int because getopt has no bool support
-
-  // use block allocator instead of garbage collection
   size_t block_allocator;
-  // compile all functions to machine code
-  int aot_functions;
-  // readable bytecode representation with labels, globals and comments
-  int disassemble;
-  // display the memory usage of parsing, compilation and the virtual machine
-  int memory_usage;
-
-  // executes the argument as if an input file was given
+  bool aot_functions;
+  bool disassemble;
+  bool memory_usage;
   char *run;
-
-  // verbose logging
-  int verbose;
-
-  // options in which we exit after toggle
+  bool verbose;
   int version;
-  int help;
-
-  // entry point - last argument thats not an option
   char *filename;
 } Args;
 
-typedef struct {
-  const char *name_long;
-  const char name_short;
-  const char *description;
-  const char *arg_name;
-} cli_option;
-
-// WARN: DO NOT REORDER THIS - will result in option handling issues
-static const cli_option options[] = {
-    {"version", 'v', "display version information", ""},
-    {"help", 'h', "extended usage information", ""},
-    {"disassemble", 'd',
-     "readable bytecode representation with labels, globals and comments", ""},
-    {"block-allocator", 'b',
-     "use block allocator with size instead of garbage collection",
-     "<size in Kb>"},
-    {"aot-functions", 'a', "compile all functions to machine code", ""},
-    {"memory-usage", 'm',
-     "display the memory usage of parsing, compilation and the virtual "
-     "machine",
-     ""},
-    {"verbose", 'V', "verbose logging", ""},
-    {"run", 'r', "executes the argument as if an input file was given",
-     "<input>"},
-};
-
-void usage() {
-  Str prefix = STRING("usage: purple_garden");
-  printf("%.*s ", (int)prefix.len, prefix.p);
-  size_t len = sizeof(options) / sizeof(cli_option);
-  for (size_t i = 0; i < len; i++) {
-    const char *equal_or_not = options[i].arg_name[0] == 0 ? "" : "=";
-    const char *name_or_not =
-        options[i].arg_name[0] == 0 ? "" : options[i].arg_name;
-    printf("[-%c%s | --%s%s%s] ", options[i].name_short, name_or_not,
-           options[i].name_long, equal_or_not, name_or_not);
-    if ((i + 1) % 2 == 0 && i + 1 < len) {
-      printf("\n%*.s ", (int)prefix.len, "");
-    }
-  }
-  printf("<file.garden>\n");
-}
-
-// TODO: replace this shit with `6cl` - the purple garden and 6wm arguments
-// parser
 Args Args_parse(int argc, char **argv) {
-  Args a = (Args){0};
-  // MUST be in sync with options, otherwise this will not work as intended
-  struct option long_options[] = {
-      {options[0].name_long, no_argument, &a.version, 1},
-      {options[1].name_long, no_argument, &a.help, 1},
-      {options[2].name_long, no_argument, &a.disassemble, 1},
-      {options[3].name_long, required_argument, 0, 'b'},
-      {options[4].name_long, no_argument, &a.aot_functions, 1},
-      {options[5].name_long, no_argument, &a.memory_usage, 1},
-      {options[6].name_long, no_argument, &a.verbose, 1},
-      {options[7].name_long, required_argument, 0, 'r'},
-      {0, 0, 0, 0},
+  enum {
+    __VERSION,
+    __DISASSEMBLE,
+    __BLOCK_ALLOC,
+    __AOT,
+    __MEMORY_USAGE,
+    __VERBOSE,
+    __RUN,
   };
 
-  int opt;
-  while ((opt = getopt_long(argc, argv, "vhdb:amVr:", long_options, NULL)) !=
-         -1) {
-    switch (opt) {
-    case 'v':
-      a.version = 1;
-      break;
-    case 'V':
-      a.verbose = 1;
-      break;
-    case 'h':
-      a.help = 1;
-      break;
-    case 'd':
-      a.disassemble = 1;
-      break;
-    case 'r':
-      a.run = optarg;
-      break;
-    case 'b':
-      char *endptr;
-      size_t block_size = strtol(optarg, &endptr, 10);
-      ASSERT(endptr != optarg, "args: Failed to parse number from: %s", optarg);
-      a.block_allocator = block_size;
-      break;
-    case 'a':
-      a.aot_functions = 1;
-      break;
-    case 'm':
-      a.memory_usage = 1;
-      break;
-    case 0:
-      break;
-    default:
-      usage();
-      exit(EXIT_FAILURE);
-    }
-  }
-
-  if (optind < argc) {
-    a.filename = argv[optind];
+  SixFlag options[] = {
+      [__VERSION] = {.name = "version",
+                     .type = SIX_BOOL,
+                     .b = false,
+                     .short_name = 'v',
+                     .description = "display version information"},
+      [__DISASSEMBLE] =
+          {.name = "disassemble",
+           .short_name = 'd',
+           .type = SIX_BOOL,
+           .b = false,
+           .description =
+               "readable bytecode representation with labels, globals "
+               "and comments"},
+      [__BLOCK_ALLOC] =
+          {.name = "block-allocator",
+           .short_name = 'b',
+           .type = SIX_LONG,
+           .description =
+               "use block allocator with size instead of garbage collection"},
+      [__AOT] = {.name = "aot-functions",
+                 .short_name = 'a',
+                 .b = false,
+                 .type = SIX_BOOL,
+                 .description = "compile all functions to machine code"},
+      [__MEMORY_USAGE] = {.name = "memory-usage",
+                          .short_name = 'm',
+                          .b = false,
+                          .type = SIX_BOOL,
+                          .description = "display the memory usage of parsing, "
+                                         "compilation and the virtual "
+                                         "machine"},
+      [__VERBOSE] = {.name = "verbose",
+                     .short_name = 'V',
+                     .b = false,
+                     .type = SIX_BOOL,
+                     .description = "verbose logging"},
+      [__RUN] = {.name = "run",
+                 .short_name = 'r',
+                 .s = "",
+                 .type = SIX_STR,
+                 .description =
+                     "executes the argument as if an input file was given"},
+  };
+  Args a = (Args){0};
+  Six s = {
+      .flags = options,
+      .flag_count = sizeof(options) / sizeof(options[0]),
+      .name_for_rest_arguments = "<file.garden>",
+  };
+  SixParse(&s, argc, argv);
+  if (s.rest_count) {
+    a.filename = s.rest[0];
   }
+  a.block_allocator = s.flags[__BLOCK_ALLOC].l;
+  a.aot_functions = s.flags[__AOT].b;
+  a.disassemble = s.flags[__DISASSEMBLE].b;
+  a.memory_usage = s.flags[__MEMORY_USAGE].b;
+  a.run = s.flags[__RUN].s;
+  a.verbose = s.flags[__VERBOSE].b;
+  a.version = s.flags[__VERSION].b;
 
   // command handling
-  if (UNLIKELY(a.version)) {
+  if (a.version) {
     printf("purple_garden: %s-%s-%s\n", CTX, VERSION, COMMIT);
     if (UNLIKELY(a.verbose)) {
       puts(COMMIT_MSG);
     }
     exit(EXIT_SUCCESS);
-  } else if (UNLIKELY(a.help)) {
-    usage();
-    size_t len = sizeof(options) / sizeof(cli_option);
-    printf("\nOptions:\n");
-    for (size_t i = 0; i < len; i++) {
-      const char *equal_or_not = options[i].arg_name[0] == 0 ? "" : "=";
-      const char *name_or_not =
-          options[i].arg_name[0] == 0 ? "" : options[i].arg_name;
-      printf("\t-%c%s%s, --%s%s%s\n\t\t%s\n\n", options[i].name_short,
-             equal_or_not, name_or_not, options[i].name_long, equal_or_not,
-             name_or_not, options[i].description);
-    }
-    exit(EXIT_SUCCESS);
   }
 
-  if (UNLIKELY(a.filename == NULL && a.run == NULL)) {
-    usage();
+  if (a.filename == NULL && (a.run == NULL || a.run[0] == 0)) {
     fprintf(stderr, "error: Missing a file? try `-h/--help`\n");
     exit(EXIT_FAILURE);
   };
@@ -198,13 +140,14 @@ Args Args_parse(int argc, char **argv) {
 int main(int argc, char **argv) {
   struct timeval start_time, end_time;
   Args a = Args_parse(argc, argv);
+
   if (UNLIKELY(a.verbose)) {
     gettimeofday(&start_time, NULL);
   }
   VERBOSE_PUTS("main::Args_parse: Parsed arguments");
 
   Str input;
-  if (a.run != NULL) {
+  if (a.run != NULL && a.run[0] != 0) {
     input = (Str){.p = a.run, .len = strlen(a.run)};
   } else {
     input = IO_read_file_to_string(a.filename);

I am going to keep it real at this point: this doesn’t feel as ergonomic as I hoped. I still have to define an enum to make the option array order-agnostic, I have to define fields on a struct, and I have to fill these fields on my own.

A better way would be a macro that generates the struct and the implementation from a single source, but I’ll keep it like this for now. The rest works great and feels even better to maintain, especially the nice help page:

TEXT
$ ./purple_garden +h
usage ./purple_garden: [ +v / +version] [ +d / +disassemble]
                       [ +b / +block-allocator <long=0>] [ +a / +aot-functions]
                       [ +m / +memory-usage] [ +V / +verbose]
                       [ +r / +run <string=``>]
                       [ +h / +help] <file.garden>

Option:
          +v / +version
                display version information

          +d / +disassemble
                readable bytecode representation with labels, globals and comments

          +b / +block-allocator <long=0>
                use block allocator with size instead of garbage collection

          +a / +aot-functions
                compile all functions to machine code

          +m / +memory-usage
                display the memory usage of parsing, compilation and the virtual machine

          +V / +verbose
                verbose logging

          +r / +run <string=``>
                executes the argument as if an input file was given

          +h / +help
                help page and usage

Examples:
        ./purple_garden +v +d \
                        +b 0 +a \
                        +m +V \
                        +r ""

        ./purple_garden +version +disassemble \
                        +block-allocator 0 +aot-functions \
                        +memory-usage +verbose \
                        +run ""
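The single-source macro idea mentioned above could be approached with an X-macro. The following is only a sketch under assumptions: `DEMO_OPTIONS`, `DemoFlag`, `DemoArgs`, `demo_flags` and `demo_fill` are hypothetical names, `DemoFlag` is a cut-down stand-in rather than the real `SixFlag`, and only boolean options are covered. One list generates the enum, the argument struct, the flag table and the copy step, so they cannot drift apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Single source of truth: X(enum id, struct field, long name, short name)
#define DEMO_OPTIONS(X)                                                        \
  X(DEMO_VERSION, version, "version", 'v')                                     \
  X(DEMO_DISASSEMBLE, disassemble, "disassemble", 'd')                         \
  X(DEMO_VERBOSE, verbose, "verbose", 'V')

// Cut-down stand-in for a flag definition (boolean options only).
typedef struct {
  const char *name;
  char short_name;
  bool b;
} DemoFlag;

// 1. the enum that indexes the flag table
#define X(id, field, lname, sname) id,
enum { DEMO_OPTIONS(X) DEMO_COUNT };
#undef X

// 2. the struct the rest of the program reads
#define X(id, field, lname, sname) bool field;
typedef struct {
  DEMO_OPTIONS(X)
} DemoArgs;
#undef X

// 3. the flag table, order-agnostic thanks to designated initializers
#define X(id, field, lname, sname) [id] = {.name = lname, .short_name = sname},
DemoFlag demo_flags[DEMO_COUNT] = {DEMO_OPTIONS(X)};
#undef X

// 4. copy the parsed values from the table into the struct
DemoArgs demo_fill(const DemoFlag *flags) {
  assert(flags != NULL);
  DemoArgs a = {0};
#define X(id, field, lname, sname) a.field = flags[id].b;
  DEMO_OPTIONS(X)
#undef X
  return a;
}
```

Adding an option then means adding one `X(...)` line. The trade-off is that macro-generated definitions are harder to read and to step through in a debugger, and typed options would need an extra column for the type.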

Extra: Ultra complicated error handling

You probably noticed a lot of goto err; statements, one for every non-happy-path exit.

C
err:
  usage(argv[0], six);
  exit(EXIT_FAILURE);
  return;

Since we do not do any heap allocations, we don’t have to clean anything up. The return after exit is never reached and is just there for good measure.
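Stripped of everything else, the pattern is a single failure label that every error path jumps to. Here is a minimal sketch: `demo_parse` and its two options `+v` and `+n <value>` are hypothetical, and it returns -1 where the real code prints the usage and calls exit, purely so the snippet can be tested.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical parser: accepts "+v" and "+n <value>" after the program name.
// Returns the number of accepted options, or -1 on the first error.
int demo_parse(int argc, char **argv) {
  assert(argv != NULL);
  int accepted = 0;
  for (int i = 1; i < argc; i++) {
    if (strcmp(argv[i], "+v") == 0) {
      accepted++;
    } else if (strcmp(argv[i], "+n") == 0) {
      if (i + 1 >= argc) {
        fprintf(stderr, "No value for option '+n'\n");
        goto err;
      }
      i++; // skip the value that belongs to +n
      accepted++;
    } else {
      fprintf(stderr, "Unknown option '%s'\n", argv[i]);
      goto err;
    }
  }
  return accepted;

err:
  // single exit point for every failure; nothing was heap allocated,
  // so there is nothing to free before leaving
  return -1;
}
```

Each failure site only has to print its own message; what happens afterwards is decided in exactly one place.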


  1. purple-garden is a WIP Lisp-like, high-performance scripting language

  2. named after the 6 looking like the G of garden, combined with cl, which is short for command line

  3. a planned fork of dwm to replace the config.h driven configuration with a purple garden script