The Developer's Library for D

Arguments Package Design / Redesign

This page collects thoughts on an extended Arguments package for Tango, and should also serve as the specification for that package. The previous ArgParser package, while quite useful, was limited in scope. Because command line argument parsing follows well-established conventions, additional features can be provided that would fit in a standard library.

Concerns

  • Requiring an overly complex setup for simple usage. The end product should still support the same or similar usage as the previous ArgParser, without increasing (and ideally while decreasing) usage complexity.
  • ...

Goals

  • To provide a package which matches with conventional parsing and usage of command line arguments.
  • Provide an argument parsing module that requires minimal effort from the user to set up and use in a simple fashion. More complex needs can require more complex usage, but the package should still be able to fulfill a minimalist use case without extensive prerequisite setup.
  • Provide a storage area for parsed arguments, and allow the use of D-style array access and 'in' syntax.
  • Be able to provide argument validation delegates in a flexible manner.
  • Be able to support implicit arguments (à la 'myprog file1 file2' instead of 'myprog --files file1 file2').
  • Provide a method to access the name the command was called with.
  • (tentative) Provide support for help-text generation.

Solution / Design

  • A preliminary reference design is included in bug #748.

References

Issues

  • The current reference design has confusing or possibly inconsistent handling of implicit arguments. This issue is discussed in the comments on bug #748.
  • The reference design has multiple and possibly redundant validation mechanisms. One can apply simple validations using booleans to specify whether an argument must be given, or whether an argument requires a parameter. However, one can also specify a validation delegate for an argument. It may be better to provide simple delegates for the same validations and use a common mechanism for adding validations.
  • Currently there is no way to avoid using Arguments as the storage for parsed arguments. Perhaps there should be a way to let the user store arguments externally and use only the parsing mechanisms (as the previous ArgParser did)?

Sample Usage

Assume cmdlArgs is the array as passed to main() for all usage examples.

Basic simplistic usage: Given cmdlArgs "myProg -y -z=1 --other:7 --another six"

auto myArgs = new Arguments(cmdlArgs);
assert("y" in myArgs);
assert("z" in myArgs);
assert("other" in myArgs);
assert("another" in myArgs);
assert(myArgs["y"] is null);
assert(myArgs["z"] == "1");
assert(myArgs["other"] == "7");
assert(myArgs["another"] == "six");
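As a rough illustration of the default behavior in the example above, the following sketch parses the same command line using plain associative arrays. It is written in present-day D2 rather than D1/Tango. The function name `parseSimple` is hypothetical, and so is the rule that a bare token right after an argument becomes its parameter. Neither is the reference design from bug #748. Short-argument grouping is deliberately left out.

```d
/// Hypothetical default-mode parser: "--" introduces a long argument, "-" a
/// short one, and ':' or '=' separate an argument from its parameter. A bare
/// token directly after an argument becomes that argument's parameter.
/// Simplification: everything after the prefix is one name (no grouping).
string[string] parseSimple(string[] cmdlArgs)
{
    string[string] result;
    string last;            // most recent argument still awaiting a parameter
    bool awaiting = false;

    foreach (tok; cmdlArgs[1 .. $])     // cmdlArgs[0] is the program name
    {
        size_t skip = 0;
        if (tok.length > 2 && tok[0 .. 2] == "--")
            skip = 2;
        else if (tok.length > 1 && tok[0] == '-')
            skip = 1;

        if (skip == 0)
        {
            // bare token: parameter of the pending argument, ignored otherwise
            if (awaiting)
            {
                result[last] = tok;
                awaiting = false;
            }
            continue;
        }

        string rest = tok[skip .. $];
        size_t cut = rest.length;       // position of ':' or '=', if any
        foreach (i, c; rest)
        {
            if (c == ':' || c == '=')
            {
                cut = i;
                break;
            }
        }

        if (cut < rest.length)
        {
            result[rest[0 .. cut]] = rest[cut + 1 .. $];
            awaiting = false;
        }
        else
        {
            result[rest] = null;        // present, no parameter (yet)
            last = rest;
            awaiting = true;
        }
    }
    return result;
}
```

Storing `null` for a parameterless argument is what lets `myArgs["y"] is null` mean "present without a value", while the `in` test covers "absent".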

Usage with validated arguments:

Usage with aliased arguments:

Usage with implicit arguments:

Usage with pre-determined arguments:

Discussion thread

Comments

Posted: 01/18/08 19:35:22

Case 1)

DSSS:

dsss can do lots of things, but among them you have

dsss install

and

dsss net install something

This shows that install means rather different things depending on its position on the command line. (One could check whether net was set, but then dsss install net would also be allowed, which is quite different.) In effect, I think this shows that ordinals should be supported: not only whether an argument was set, but also the position it had on the command line.
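Here is a minimal sketch of the ordinal idea, assuming hypothetical `ordinals` and `follows` helpers that record where each token appeared (D2, no Tango dependency). With positions available, "net install" and "install net" can be told apart without a chain of ifs on presence flags.

```d
/// Hypothetical helper: maps each token to the 0-based positions (counted after
/// the program name) at which it appeared on the command line.
size_t[][string] ordinals(string[] cmdlArgs)
{
    size_t[][string] pos;
    foreach (i, tok; cmdlArgs[1 .. $])
    {
        if (auto p = tok in pos)
            *p ~= i;
        else
            pos[tok] = [i];
    }
    return pos;
}

/// True if `first` appears somewhere immediately before `second`.
bool follows(size_t[][string] pos, string first, string second)
{
    auto a = first in pos;
    auto b = second in pos;
    if (a is null || b is null)
        return false;
    foreach (i; *a)
        foreach (j; *b)
            if (j == i + 1)
                return true;
    return false;
}
```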

Case 2)

svn has many subcommands/options which again can take arguments. Depending on the command, the same arguments may apply to several of them, leading to a situation similar to dsss.

Both cases can be implemented using ArgParser (even without ordinals, because the delegates can be used to implement them in the app; ordinals would just be a useful service). In general, I believe this makes the case for being able to register delegates that are called as each argument is parsed, not after the fact. Alternatively, the order of the arguments could be kept internally so that delegates can be executed in the correct order. I don't think ordinals could be used in if statements in a non-kludgy manner.

Posted: 02/13/08 18:49:08

There should be a way to specify the list of valid arguments. For example, if I do

myProg --foo

and foo is not an anticipated argument, the application currently ignores it silently (which could be confusing to the user). In fact, as far as I can tell, the current implementation offers no way to process unknown arguments: I have to know the argument key first.

Along those lines, there should be a way to limit the number of arguments that are associated with a switch. For example, I may want the following to be equivalent:

myProg -o file blahblah.txt
myProg blahblah.txt -o file

Currently, the first case would return blahblah.txt as a parameter to -o, and the second case would need an implicit argument assignment. It would be fine if I could tell the argument parser that -o takes only one parameter, and that the rest should go to the implicit arguments (until another switch occurs).
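Here is one possible shape for a per-switch parameter limit, sketched in D2. `parseWithLimits`, `Parsed` and the `maxParams` table are hypothetical names. Switches without an entry take an unlimited number of parameters. Tokens that no switch can claim become implicit arguments, so both command lines above yield the same result.

```d
/// Hypothetical result: parameters per switch plus leftover implicit arguments.
struct Parsed
{
    string[][string] params;   // switch name -> its parameters, in order
    string[] implicit;         // tokens not claimed by any switch
}

Parsed parseWithLimits(string[] cmdlArgs, size_t[string] maxParams)
{
    Parsed r;
    string current;            // switch currently collecting parameters
    size_t room = 0;           // how many more parameters it may take

    foreach (tok; cmdlArgs[1 .. $])
    {
        if (tok.length > 1 && tok[0] == '-')
        {
            // strip one or two leading dashes to get the switch name
            current = (tok.length > 2 && tok[1] == '-') ? tok[2 .. $] : tok[1 .. $];
            if (current !in r.params)
                r.params[current] = null;
            auto limit = current in maxParams;
            room = limit is null ? size_t.max : *limit;   // no entry: unlimited
        }
        else if (room > 0)
        {
            r.params[current] ~= tok;
            --room;
        }
        else
        {
            r.implicit ~= tok;
        }
    }
    return r;
}
```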

Finally, the fact that a standalone dash isn't accepted can be troublesome for porting. Many applications use a single dash argument either to signify that the input/output should be stdin/stdout, or to signify that the program should stop parsing arguments.
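This small sketch follows the two conventions in the POSIX utility syntax guidelines. A lone `-` is an ordinary operand (typically standard input or output), and `--` ends option parsing. `Split` and `splitDashes` are hypothetical names. Using `--` rather than a single dash as the terminator is one design choice among those mentioned above.

```d
/// Hypothetical split of a command line into options and operands.
struct Split
{
    string[] options;
    string[] operands;
}

Split splitDashes(string[] cmdlArgs)
{
    Split s;
    bool optionsDone = false;
    foreach (tok; cmdlArgs[1 .. $])
    {
        if (optionsDone)
            s.operands ~= tok;          // after "--", everything is an operand
        else if (tok == "--")
            optionsDone = true;         // the terminator itself is dropped
        else if (tok == "-")
            s.operands ~= tok;          // lone dash: a value, not an option
        else if (tok.length > 1 && tok[0] == '-')
            s.options ~= tok;
        else
            s.operands ~= tok;
    }
    return s;
}
```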

That being said, I like this version of the argument parsing much better than ArgParser.

-Steve

Posted: 02/14/08 00:03:03

Still working out the details, but just wanted to write some thoughts here as to where I'm going:

-There should be a basic mode that requires minimal setup and provides basic access to what was given on the command line.
-short and long prefixes as well as their delimiters should be configurable.
-Should be able to support short/long prefix of 'null' (as in, no prefix to denote a long/short argument).
-When defining arguments, you should only need to define basics (minParams, maxParams, required, <optional: text definition>).
-If you define any arguments, it will be assumed that you have defined all supported arguments, and if any others are encountered, an exception will be thrown.
-Defining argument validation or delegation is done separately.
-If you define arguments, usage text can be generated automatically.
-Calling .parse performs validations and delegate calls automatically without having to call those specifically.

Random idea:
-Allow short-circuiting automagically if arguments are defined. For example, if one defines 'files' as an argument and the command line contains 'f', 'fi', 'fil', or 'file', it can be inferred that this argument matches 'files'. In the case of ambiguous matches, an exception could be thrown.
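The abbreviation idea can be sketched as follows (D2; `matchAbbrev` is a hypothetical name). An exact match wins even when it is also a prefix of another defined name. A unique prefix selects its argument, and anything else throws.

```d
/// Hypothetical abbreviation matching against the defined argument names.
string matchAbbrev(string given, string[] defined)
{
    string found;
    size_t hits = 0;
    foreach (name; defined)
    {
        if (name == given)
            return name;                       // exact match wins outright
        if (name.length > given.length && name[0 .. given.length] == given)
        {
            found = name;
            ++hits;
        }
    }
    if (hits == 0)
        throw new Exception("unknown argument: " ~ given);
    if (hits > 1)
        throw new Exception("ambiguous argument: " ~ given);
    return found;
}
```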

Some sample command lines to be able to parse and handle:

tar zxvf blah.tar blah2.tar
dsss net install tango
svn co https://blah/blah/blah
svn switch --relocate https://blah https://blah2
svn switch https://blah
ls -al blah.txt
ls blah.txt -al

Posted: 02/14/08 22:27:13

First, some terminology just to make sure we keep concepts straight:

myprog -a b --cde fg

myprog = run path
a = argument (short)
b = parameter
cde = argument (long)
fg = parameter
also:
Short arguments: single-character arguments, possibly grouped together under a single short prefix.
Long arguments: The sequence of all characters following the long prefix represent a single argument.
Argument delimiter: Character used to separate arguments from parameters.

To handle the 'default ez-mode', you simply create a new instance by passing the constructor an argument string:

Arguments args = new Arguments(argString);

This invokes the 'default' handling mode, which provides you with some basic 'was option X in the command line' abilities.
For example, given the command line:

myprog blah -b -c cde

You would have the following:

args.runPath = "myprog"
("b" in args) = true
args["c"] = "cde"

This allows for quick, simple argument line parsing with minimal overhead and effort.

What makes this possible is that, by default, the prefixes and delimiters are preset to:

args.prefixShort = ["-"]
args.prefixLong = ["--"]
args.infixDelimiter = [":", "="]

However, you can set these to anything you'd like. If you'd like "@" to be a prefix, you could do:

args.prefixShort += ["@"];

or replace them entirely:

args.prefixShort = ["#", "@"];

You can also use 'null' as a prefix, just by:

args.prefixLong = ["--", null];

This lets you do things like:

myprog net install db --verbose

Which results in:

("net" in args) = true
("install" in args) = true
("db" in args) = true
("verbose" in args) = true

One could do the same with prefixShort as well, to do things like:

args.prefixShort = [null];
myprog abc

Resulting in:

("a" in args) = true
("b" in args) = true
("c" in args) = true
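The sketch below shows how a `null` prefix might be interpreted. `classify`, `Token` and `shorts` are hypothetical names. The sketch assumes that long prefixes are tried before short ones (so `--` beats `-`), that explicit prefixes are tried before a `null` one, and that short names are single ASCII characters.

```d
/// Hypothetical classification of one command-line token.
struct Token
{
    bool isLong;
    string[] names;   // one entry for a long argument, one per char for shorts
}

Token classify(string tok, string[] prefixLong, string[] prefixShort)
{
    // 1. explicit (non-null) prefixes, long before short
    foreach (p; prefixLong)
        if (p.length && tok.length > p.length && tok[0 .. p.length] == p)
            return Token(true, [tok[p.length .. $]]);
    foreach (p; prefixShort)
        if (p.length && tok.length > p.length && tok[0 .. p.length] == p)
            return Token(false, shorts(tok[p.length .. $]));
    // 2. a null prefix: the whole bare word is the argument text
    foreach (p; prefixLong)
        if (p is null)
            return Token(true, [tok]);
    foreach (p; prefixShort)
        if (p is null)
            return Token(false, shorts(tok));
    return Token(false, null);   // not an argument (e.g. a parameter)
}

/// Splits a group such as "abc" into ["a", "b", "c"] (ASCII assumed).
string[] shorts(string group)
{
    string[] names;
    foreach (i; 0 .. group.length)
        names ~= group[i .. i + 1];
    return names;
}
```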

One can also pre-define acceptable parameters:

args.define("name", minParms, maxParms, required)

This allows exceptions to be thrown in various circumstances: a missing required parameter, an invalid parameter, etc. Another variant that includes a char[] 'description' could also be introduced, to allow automatic help text generation.
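This sketch shows the checks that `define(name, minParms, maxParms, required)` could enable, assuming parsed results are available as a name-to-parameters table. `Def` and `validate` are hypothetical. Undefined arguments are rejected, following the earlier rule that defining any arguments means all supported arguments are defined.

```d
/// Hypothetical record mirroring args.define("name", minParms, maxParms, required).
struct Def
{
    size_t minParms;
    size_t maxParms;
    bool required;
}

/// Checks parsed results (name -> parameters) and throws on the first violation.
void validate(string[][string] parsed, Def[string] defs)
{
    foreach (name, params; parsed)
    {
        auto d = name in defs;
        if (d is null)
            throw new Exception("undefined argument: " ~ name);
        if (params.length < d.minParms)
            throw new Exception("too few parameters for " ~ name);
        if (params.length > d.maxParms)
            throw new Exception("too many parameters for " ~ name);
    }
    foreach (name, d; defs)
        if (d.required && name !in parsed)
            throw new Exception("missing required argument: " ~ name);
}
```

Because `maxParms` is a `size_t` here, `size_t.max` stands in for the `-1` ("unlimited") used in the tar example.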

You can also still access argument parameters via .asArray, so if you set it up:

args.prefixShort = ["-", null];

and give:

myprog zxvf file1 file2

You get:

("z" in args) = true
("x" in args) = true
("v" in args) = true
args["f"] = "file1 file2"
args.asArray("f") = ["file1", "file2"]

This is also useful for situations when parameter order matters, for example:

args.define("net", 0, 3, true);
myprog net install db

results:

args.asArray("net") = ["install", "db"]

So that if

myprog net db install

means something different, the parameter order has been preserved so your program can behave appropriately.
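The relationship between `args["f"]` and `args.asArray("f")` can be sketched by storing the parameters once, in command-line order, and building the joined string on demand. `Store` is a hypothetical name. Joining with a single space is an assumption based on the examples.

```d
import std.array : join;

/// Hypothetical storage that keeps every parameter in command-line order.
struct Store
{
    string[][string] params;

    /// Like args.asArray("f") in the text: the parameters as given.
    string[] asArray(string name)
    {
        if (auto p = name in params)
            return *p;
        return null;
    }

    /// Like args["f"] in the text: the same parameters joined by spaces.
    string opIndex(string name)
    {
        return asArray(name).join(" ");
    }
}
```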
One can also still define validation routines:

args.validation("name", delegate)

In addition, one can also specify a callback routine to be run whenever a given argument is found:

args.callback("name", delegate)

And of course, args.parse() performs validations and delegate calls automagically.
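This sketch registers validation and callback delegates and lets `parse` drive them. `Registry`, `Found` and the two-pass order are assumptions: all validations run first, then callbacks run in command-line order. That pass order also addresses the earlier request that delegates run in the order the arguments appeared.

```d
alias Validator = bool delegate(string[] params);
alias Callback = void delegate(string[] params);

/// One argument as found on the command line, with its parameters.
struct Found
{
    string name;
    string[] params;
}

/// Hypothetical registry of per-argument delegates.
struct Registry
{
    Validator[string] validators;
    Callback[string] callbacks;

    void validation(string name, Validator dg) { validators[name] = dg; }
    void callback(string name, Callback dg) { callbacks[name] = dg; }

    /// `found` lists the arguments in command-line order.
    void parse(Found[] found)
    {
        foreach (f; found)
        {
            if (auto v = f.name in validators)
            {
                if (!(*v)(f.params))
                    throw new Exception("invalid parameters for " ~ f.name);
            }
        }
        foreach (f; found)
        {
            if (auto c = f.name in callbacks)
                (*c)(f.params);
        }
    }
}
```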
So, some sample command lines and how one would set up to handle them:

Command Line / Setup:

tar zxvf blah.tar blah2.tar

args.prefixShort = [null];
args.define("f", 0, -1);

Result:

("z" in args) = true
("x" in args) = true
("v" in args) = true
args["f"] = "blah.tar blah2.tar"

Command Line / Setup:

dsss net install tango

args.prefixLong = [null];
args.define("net", 0, 2);

Result:

args.asArray("net") = ["install", "tango"]

Command Line / Setup:

svn co https://blah/blah/blah

args.prefixLong = [null];
args.define("co", 1, 1);

Result:

args["co"] = "https://blah/blah/blah"

Command Line / Setup:

svn switch --relocate https://blah https://blah2
svn switch https://blah

args.prefixLong = ["--", null];
args.define("switch", 1, 2);

Results:

args.asArray("switch") = ["https://blah", "https://blah2"]
("relocate" in args) = true

args["switch"] = "https://blah"
("relocate" in args) = false

Command Line / Setup:

ls -al blah.txt
ls blah.txt -al

args.define("a", 0, 0);
args.define("l", 0, 0);

Both examples result in:

("a" in args) = true
("l" in args) = true
args[null] = "blah.txt"

Posted: 02/21/08 14:34:33

One thing I just realized: opIn_r returns a bool, which is inconsistent with D's definition of in. The precedent set by built-in associative arrays is that in returns either null if the key does not exist, or a pointer to the value if it does.

Should Arguments follow this design, or should it just return bool? I can see that it would be hard to implement returning a pointer to an array, especially if the exact array isn't stored anywhere.
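If parameters are kept in an associative array, returning a pointer from `in` is straightforward, because the built-in AA `in` already does so. The sketch below is in D2, where operator `in` is overloaded with `opBinaryRight!"in"` (`opIn_r` is the D1 spelling). `ArgStore` is a hypothetical name, not Tango's `Arguments` class.

```d
/// Hypothetical container whose `in` mirrors the built-in AA behaviour.
struct ArgStore
{
    private string[][string] store;   // name -> stored parameter array

    void add(string name, string[] params) { store[name] = params; }

    /// Returns a pointer to the stored parameters, or null if absent.
    string[]* opBinaryRight(string op)(string name) if (op == "in")
    {
        return name in store;
    }
}
```

If parameter arrays are instead built on demand and never stored, there is nothing stable to point at, and returning `bool` is the simpler choice.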

Posted: 03/01/08 00:08:08

A second reference design has been added to ticket #748.

I believe it resolves the majority of the issues raised with the initial design.

No documentation has been written for this design yet; I want to wait and see whether any other objections are raised before I get into that.

There are some 'what about..' questions that I had while rewriting the module, these can be found in the initial comment block.

This design also provides for tango-style method chains, because I'm cool that way.

Posted: 03/05/08 17:12:47

Comments on new design:

In the documentation for aliases:

You can also define argument aliases.

	args.define("a").aliases(["b"]);

Given that, the following would be equivalent.

	- "-a"
	- "-b"

What would be the method to access this argument, args["a"] or args["b"] or both? That is, which direction is the alias?
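One way to answer the alias question is to resolve every alias to its canonical name, so that lookups work under either name. The sketch below also shows the method-chain style. `Definition`, `ArgSet`, `seen` and `has` are hypothetical names, not the API in ticket #748.

```d
/// Hypothetical fluent definition; each setter returns `this` for chaining.
class Definition
{
    string name;
    string[] aliasNames;

    this(string name) { this.name = name; }

    Definition aliases(string[] names)
    {
        aliasNames ~= names;
        return this;
    }
}

class ArgSet
{
    private Definition[string] byName;   // keyed by canonical name
    private bool[string] present;        // keyed by canonical name

    Definition define(string name)
    {
        auto d = new Definition(name);
        byName[name] = d;
        return d;
    }

    /// Records a name (canonical or alias) seen on the command line.
    void seen(string given)
    {
        auto d = lookup(given);
        if (d !is null)
            present[d.name] = true;
    }

    /// True if the argument was seen, asked for by either name.
    bool has(string given)
    {
        auto d = lookup(given);
        return d !is null && (d.name in present) !is null;
    }

    private Definition lookup(string given)
    {
        if (auto p = given in byName)
            return *p;
        foreach (d; byName)
            foreach (a; d.aliasNames)
                if (a == given)
                    return d;
        return null;
    }
}
```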

On ideas:

	-Forcing arguments to lowercase. (define("x").lowercase, -X gives "x" in args)
	-User defined callback when encountering an undefined argument.
	-Standard help text generation, following output control with formatting as tango.log.
	-should .reset also reset the prefix/delimiters?

Forcing arguments to lowercase is already achievable through aliasing, I don't think it's a common enough requirement.

User-defined callback: I can't see a use case for it, but maybe someone else can. I can't imagine the developer doing anything but throwing an error.

Standard help text generation: this would be good, but it is tricky to implement. I did this in my argument parsing class that I wrote, and the tough parts are how to organize the options and how to format according to screen width.

Having .reset also reset the prefix/delimiters does not seem like a good use case. If you want to do that, build a new Arguments object. To me, the prefix/delimiter/argument definitions are like an algorithm, and the arguments are the data provided to that algorithm.

One other idea that I just had, but it would be tough :) Lots of times, usage for a program is printed in forms such as:

myprog -a|-b [--optional]

It would be cool to just specify this string ("-a|-b [--optional]") and have the appropriate definitions added. We'd have to come up with a syntax to allow specification of all the constraints, but I think it would be really easy to specify lots of common constraints quickly, and in a readable way.

Other notes:

.parameters() doc needs another slash

Definitely needs more doc for some functions, but other than that, it looks really good. I really like how the definition is chained, it's like typing English almost :)

The one thing that may be useful that you have not included is the order the arguments appeared.

For example, in many scripts, default arguments are provided by an environment variable external to the script. However, if the script wants to force the opposite behavior, it just appends the mutually exclusive argument to the end of the command line. Let's say -a means "show output" and -b means "don't show output". How can I say "last one wins", so that:

myprog -a -b

doesn't show output

myprog -b -a

shows output?
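If the parser preserves argument order, "last one wins" is a simple left-to-right scan. The sketch below scans the raw array directly. `showOutput` is a hypothetical helper, and the default used when neither switch is given is an assumption.

```d
/// Hypothetical: a later switch overrides an earlier one.
/// "-a" turns output on, "-b" turns it off.
bool showOutput(string[] cmdlArgs, bool defaultValue = false)
{
    bool show = defaultValue;
    foreach (tok; cmdlArgs[1 .. $])
    {
        if (tok == "-a")
            show = true;
        else if (tok == "-b")
            show = false;
    }
    return show;
}
```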

-Steve

Posted: 03/05/08 18:00:35

schveiguy wrote:

Standard help text generation: this would be good, but it is tricky to implement. I did this in my argument parsing class that I wrote, and the tough parts are how to organize the options and how to format according to screen width.

Note that this would be required to only create the text, not to actually print it somewhere - the module should not have any IO dependencies. An app may decide to put the text in a MsgBox for instance (instead of console).

Posted: 03/06/08 06:06:15

Right, my idea is to follow the same pattern as the Tango.Log Appenders: essentially, an Appender would receive a particular Definition and decide for itself how to print it. I could provide a default Appender for Stdout output. If a user wanted a MsgBoxAppender, they could create their own (the same as they can now with Log).

This can be a future enhancement however, and in fact could be developed entirely decoupled from this module itself.

Posted: 03/07/08 06:55:06

please don't start loading up with appenders and such ... invoking a delegate is surely sufficient for such things?

Posted: 03/07/08 20:24:37

Wow that avatar is freaky.

A delegate would work fine, I was trying to think of something that would follow other existing Tango models for outputting text from a module that has customized generated fields, in the interest of being consistent with the rest of the lib. The logger was the first thing that came to mind, seems to be a similar sort of use-case, though, you probably aren't quite as concerned with performance here. :)
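This sketch shows delegate-based help output. The module formats each line and hands it to a caller-supplied sink, so it needs no I/O of its own. `HelpDef` and `generateHelp` are hypothetical. The fixed column width of 10 sidesteps the screen-width question raised earlier.

```d
import std.format : format;

/// Hypothetical help entry for one defined argument.
struct HelpDef
{
    string name;
    string text;
}

/// Formats one line per definition and passes it to `sink`; the caller
/// decides whether it goes to a console, a message box, or a buffer.
void generateHelp(HelpDef[] defs, void delegate(string line) sink)
{
    foreach (d; defs)
        sink(format("  --%-10s %s", d.name, d.text));   // name left-aligned in 10 columns
}
```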

Posted: 03/25/08 04:46:21

Current iteration on ticket 748 seems to fulfill most of the hopes and dreams, imo.

There's one other item to consider, after discussing with Kris on irc, and that's whether to continue with the AA syntax or not. I personally don't have a particular attraction one way or the other so I thought I'd bring it up here and see what anyone else had to say.

-So, is there anything gained by allowing ("arg" in args) versus something like (args.contains("arg"))? Is there anything lost? Does using opIn potentially confuse the user?

-Same questions, but for opIndex: (args["arg"]) versus something like (args.parameters("arg")).