Acumen Coding Style Guidelines

This page documents coding style guidelines originally developed for Acumen. It is meant to be applicable to any other free/open source project using OCaml.

About
Licensing The Code
Formatting The Code
Expressions
Clean Use of Constructs
Commenting The Code
- General Remarks
  - Identifiers in A Comment
  - Algorithmic Code
- Function Definitions
  - Commented Out Code
Variables
Miscellany
References

About

The conventions outlined in this document are meant to:

rule out certain bad coding practices;

ensure the code is easy to read, understand, and modify, and convenient to grep over; and

reduce the variability in formatting (e.g. spacing around operators, when & when not to insert newlines) and make the look of the code consistent.

This guideline is definitely NOT a comprehensive or exhaustive characterization of "good code", and is NOT meant to be a substitute for thoughtful and careful programming. In particular, one should violate the conventions in some parts of the program if need be; however, every violation should be an isolated incident affecting limited portions of the codebase, and must be supported by a good reason.

Anything not noted in this guideline is left to the discretion of each programmer. Acumen is still in its early stage of development, and so is this guideline. Everyone is welcome to contest the usefulness of specific conventions and to suggest alternatives (including abolishment of that particular convention). In order to facilitate such discussions, rationales are provided for most rules.

While applicable to other projects, this document assumes in places that the software is meant to be free (as in speech) and/or open source.

Licensing The Code

EVERY source file and every non-trivial piece of supporting material included in the source tree, including user documentation, should begin with a notice about the license which covers the contents of that file. This license notice should

identify the copyright holders, preferably in the form "Copyright (C) year(s) name(s)" or "Copyright year(s) name(s)". It's not as important to note who the "authors" or "supervisors" are, although mentioning them doesn't hurt.

Rationale. The names of the authors or supervisors of the project are, insofar as most people from outside the project team are concerned, not very useful. OTOH, someone who has a license-related problem or question may want to know the copyright holders, because they are the ones entitled to settle legal issues with the code, not authors or supervisors. (Note that they can be different; for example, the authors of most GNU software are hundreds if not thousands of volunteers who wrote the code, but the copyright holder is the FSF.)

include or refer to the license which covers that file, and explicitly say that such license covers the file. If the license is long, it may be placed in a separate file, most often named LICENSE or COPYING. In that case, refer the readers to the license file, and if the license has a well-known name (e.g. "GNU General Public License version 2"), give it. The license header and the license must be written in precise, unambiguous language that will be able to withstand challenges in a court of law, including attempts to give the wording a significantly different interpretation than intended.

Rationale. Putting the license in every file allows somebody who wandered into the source file on svnweb from google to easily find out the ways in which s/he can use the code. Giving the name of the license serves as a convenient overview/summary, saving the reader the time to go see the LICENSE/COPYING file if s/he knows the license. As for the use of language, a license written in vague language is useless in an actual court case and defeats the purpose of having a license. When in doubt, consult a lawyer.

Remark. Be advised that in the free/open source world, it's most definitely a REALLY BAD IDEA to come up with your own license, unless you have a dedicated committee consisting of a band of lawyers that specialize in software licensing (even then, history shows, it's usually not a good idea).

Most often you end up with a license that's incompatible with existing free/open source licenses (e.g. QPL or APLv1 vs GPL) or otherwise deficient, which will slow down adoption. Before people from the free/open source community use your software, they will want to scrutinize your license for potential loopholes and assess various risks. That's a task that no one wants to take on, especially if your software is not already as dominant as Windows. You will also reduce the likelihood of getting outside contributions simply by choosing a "freaky" license.

Note also that "I'll just have a short and simple license" doesn't tend to work. People doing that have very often ended up with uninterpretably vague licenses (e.g. LHa), ones that don't make sense outside a particular jurisdiction (e.g. qmail), or ones that have restrictions that turn out to be problematic in practice (e.g. original 4-clause BSD).

It is almost always the case that choosing one of the "tried and true" free/open source licenses listed on opensource.org or gnu.org is cost effective and trouble-free, especially if you want to see your program widely used. You can pick up license header templates to use in individual source files at gnu.org or in source files from existing open source projects.

Formatting The Code

This section discusses detailed formatting rules. If you are using Tuareg mode on emacs, it should mostly suffice to skim the examples and notice the pattern in the placement of newlines---insert newlines to bring the "important" (control flow-wise) stuff to the left. Emacs will take care of the rest (to an extent).

Character Set

Avoid the use of non-US-ASCII characters in the source code when possible.

Rationale. Just to avoid unnecessary problems. Note also that some characters (such as Cyrillic) may show up in different widths in different environments, which can make it hard to estimate column width (cf. Column Width).

Column Width

Each line should contain 79 characters at maximum, not counting the newline character (ASCII decimal code 10, possibly preceded by and including carriage return = ASCII decimal code 13). Exceptions include lines with long string literals (cf. Literal Constants). So-called "wide" characters (mostly East Asian) should be counted as 2 characters. (Remark. There are no "wide" characters in US-ASCII.)

Rationale. For better or worse, this is THE convention for the limit on the width of program source code. It also seems to be a reasonable bound on the lengths of lines in a cleanly written program. If indentation gets too deep to comply with this requirement, you're usually better off trying to reorganize your code.

Indentation

Indentation should be increased or decreased 2 columns at a time.

Rationale. We need to pick some convention, and this is the one used for popular emacs major modes for OCaml, namely ocaml-mode, bundled with INRIA's OCaml package, and Tuareg-mode.

Tabs (ASCII decimal code 9) should not be used in source code. In emacs, you can conveniently prevent tabs from creeping into the file by adding

(setq indent-tabs-mode nil)

to your .emacs file.

General Program Structure

As with any programming, program structure should be expressed by indentation, not parenthesization or other machine-friendly but human-unfriendly means. "Program structure" here means, e.g. which part of the code belongs to the body of which let, which part is in the consequent of an if-then-else construct, which part denotes an argument to which function, etc. (cf. Clean Use of Constructs for detailed indentation rules.)

For example, avoid:

(* bad *)
let rec foo x y : int =
  if (x < (!y)) then ((!y) / x) else let x = 1 in y := bar ((!y)/(2*(x+(!y))));
  if (x <> 1) then bar 1 else raise Exit
;;

and instead write:

(* good *)
let rec foo x y : int =
  if x < !y then
    !y / x
  else
    let x = 1
    in
      y := bar (!y / (2 * (x + !y)));
      if x <> 1 then
        bar 1
      else
        raise Exit
;;

Rationale. You should find it much easier to discover something suspicious in the above code when it's presented in the latter format than the former, namely, the condition under which Exit is raised makes no sense in the second branch of the first if statement. It should also be evident that one would most likely not make this mistake when writing in the latter format, whereas such a slip more easily goes unnoticed in the former format.

Expressions

Operators

All infix operators, i.e. arithmetic and boolean operators, comparisons, the assignment operator, string concatenator (^), and by abuse of terminology, the -> (in a fun expression) should be separated from their operands on both sides by exactly one space. As an exception, if the operator is the first thing on a line, there should be no extra spaces (other than the indentation) on the left. Also, if the operator is the last thing on a line, don't put a space on the right. It may sometimes be OK to not put spaces around an arithmetic operator inside a complex expression.

Thus avoid writing

(* bad *)
y  :=a +b-c * d

and instead write

(* good *)
y := a + b - c * d

A similar rule applies to the = in a let expression. cf. Variable Bindings.
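
The two exceptions above (an operator that is the first or the last thing on a line) can be sketched as follows. This is only an illustration; the function and variable names are made up and are not taken from Acumen.

```ocaml
(* Operators that start a continuation line carry no extra space on
 * the left beyond the indentation. *)
let total_cost base_price shipping_fee handling_fee discount =
  base_price + shipping_fee
  + handling_fee
  - discount
;;

let counter = ref 0
;;

(* An operator that ends a line has no space on its right. *)
let bump () =
  counter :=
    !counter + 1
;;
```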

Function Invocation

A function's parameters should be separated by exactly 1 space, from each other and from the function. If the parameter list is too long to fit in 1 line, break the line wherever it's convenient and continue on the next line, incrementing the depth of indentation. Thus write

(* good *)
function_name parameter1 parameter2 parameter3
  parameter4 parameter5

If the function invocation is on the right hand side of a let binding and it doesn't fit in 1 line, then first break the line before the function name (cf. Variable Bindings).

Comma and Unary Operators

The unary operator ref and data constructors (such as Cons below) should be separated from their arguments by exactly 1 space, whether or not the argument is enclosed in parentheses.

Rationale. Consistency with function application.

Every comma should be followed by exactly 1 space, unless it's at the end of a line. Commas should not be preceded by spaces unless there's a compelling reason to do so. This applies to ANY comma, regardless of where it appears.

Rationale. Readability and consistency with existing conventions for other software projects and plain-text typography.

Thus avoid writing

(* bad *)
Cons(a,b)

and instead write

(* good *)
Cons (a, b)
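
The same spacing applied to ref and to constructors of a user-defined type might look like the sketch below. The shape type and the values are hypothetical and serve only to show the spacing.

```ocaml
type shape =
  | Circle of float
  | Rect of float * float
;;

(* bad:  ref(0), Circle(1.0), Rect(2.0,3.0) *)

(* good *)
let count = ref 0
;;

let shapes = [Circle 1.0; Rect (2.0, 3.0)]
;;
```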

Parenthesization

Avoid redundant parentheses. Use parentheses only when inserting them has a clear advantage in readability, or if required by OCaml's grammar. Note that in many cases, indentation and spacing are better tools for clarifying code than parentheses.

Thus avoid writing

(* bad *)
((sp.Lexing.pos_lnum)-1,
  (sp.Lexing.pos_cnum)-(sp.Lexing.pos_bol)-1)

and instead write

(* good *)
(sp.Lexing.pos_lnum - 1,
 sp.Lexing.pos_cnum - sp.Lexing.pos_bol - 1)

Literal Constants

This section handles large literals (in terms of number of characters in source code) such as string literals, list literals, etc.

String Literals

It is not advisable to break string literals up into multiple lines using the ^ operator, since it is easy to lose a space that way. For example,

(* bad *)
let s = "some long"
        ^ "string"

is probably not what you want: s is set to "some longstring", not "some long string". However, it is desirable to cut it down to 79 characters per line. Sometimes shortening it to 79 columns using ^ helps, sometimes it's not worth the effort. Use your discretion to decide what to do.
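
A small sketch of the pitfall, together with one possible way to keep a broken literal safe. Putting the separating space at the start of the continuation is an assumption of this example, not something the guideline prescribes; the names lost and kept are made up.

```ocaml
(* The space is lost at the line break. *)
let lost = "some long"
           ^ "string"
;;

(* Keeping the separating space at the start of the continuation
 * makes it harder to overlook. *)
let kept =
  "some long"
  ^ " string"
;;
```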

List/Array/Tuple Literals

Large list, array, or tuple constants should be indented to show their nesting.

Thus avoid writing

(* bad *)
((PsAst.SProg(([||], [||]), ([||], [||]), ([||], [||], [||]),
 ([||], [||]))), PsAst.SExt([||],[||],[], [||]))

and instead write

(* good *)
((PsAst.SProg (([||], [||]),
               ([||], [||]),
               ([||], [||], [||]),
               ([||], [||]))),
 PsAst.SExt ([||], [||], [], [||]))

Clean Use of Constructs

This section is primarily concerned with detailed indentation conventions for OCaml constructs.

`let`

This section discusses the formatting rules for let. let rec is subject to the same rules.

Variable Bindings

Whenever you use the let or and keyword, always make it the first thing on that line, and avoid putting multiple bindings on the same line. All and keywords should be at the same indentation depth as the corresponding let keyword. If the body of a let is again a let, then the let in the body should be indented to the same level as the enclosing let. Thus, avoid

(* bad *)
let x = foo and y = bar in
  let
  z = baz
 and w = raz

and instead write

(* good *)
let x = foo
and y = bar in
let z = baz
and w = raz

Rationale. By lining up the keywords, it becomes easier to identify the set of variables bound at that level. Simply by running your eyes down the leftmost column, you can easily tell which of the bindings are and and thus outside the scope of the preceding binding(s), and which are let that are in the scope of preceding binding(s) (in Scheme-speak, it's easy to tell apart let and let*).
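
The scoping difference mentioned in the rationale can be seen in the following sketch (all names are made up for illustration). With and, the right hand side of y still refers to the outer x; with a second let, it refers to the x bound just before.

```ocaml
let x = 1
;;

(* and: both right hand sides see the outer x. *)
let parallel =
  let x = 10
  and y = x + 1 in
    (x, y)
;;

(* let after let: the second binding sees the inner x. *)
let sequential =
  let x = 10 in
  let y = x + 1 in
    (x, y)
;;
```

Here parallel evaluates to (10, 2) and sequential to (10, 11).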

The = token should have exactly 1 space on each side. As an exception, if it's at the end of a line, only the left side of the = needs to have a space. There should be exactly 1 space between the let or and and the name of the variable it binds. In particular, do NOT align the = tokens with each other. A possible exception is if there is a very good reason that the bindings must be compared and contrasted.

Thus avoid writing

(* bad *)
let x  = 0
and long_name=1
and ysads=          2

and avoid writing

(* bad *)
let x         = 0
and long_name = 1
and ysads     = 2

but instead write

(* good *)
let x = 0
and long_name = 1
and ysads = 2

Rationale. If a programmer reading the code wants to find the closest binding occurrence of a variable, say x, an easy way to do it is to simply search toward the beginning of the file for " x =". If the number of spaces around the = is not standardized, then s/he would have to type " x *=" (in regular expression), which is easy to forget to do. As for aligning, it has little readability benefit and is a nuisance to maintain.

If the right hand side of a binding is a function invocation that is too long to fit in 1 line, then first break the line before the function name. Thus avoid writing

(* bad *)
let x = function_name parameter1 parameter2 parameter3
  parameter4 parameter5

and instead write

(* good *)
let x =
  function_name parameter1 parameter2 parameter3
    parameter4 parameter5

Rationale. The latter format makes it easier to determine which part is the function and which ones are parameters, just by looking at the indentation.

In a function definition, use the "short-hand" notation and avoid using the fun keyword, unless you want to stress that the function is meant to be passed around as a value.

Thus avoid writing

(* bad *)
let f = (fun () -> 0)

and instead write

(* good *)
let f () = 0
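
As a sketch of the exception, the second definition below spells out fun because its result is a function value handed to callers. The names square, scale_by and doubled are hypothetical.

```ocaml
(* Ordinary definition: short-hand notation. *)
let square x = x * x
;;

(* The result is itself a function meant to be passed around, so
 * fun is written out to stress that. *)
let scale_by factor =
  fun x -> factor * x
;;

let doubled = List.map (scale_by 2) [1; 2; 3]
;;
```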

The `in` Keyword

The in keyword should by default be placed on a line by itself. However, it can be placed at the end of the variable binding (i.e. on the same line as the last and or let keyword) if the last binding fits in 1 line, or if the body is another let. Thus avoid

(* bad *)
let x = foo
and y =
  bar parameter1 parameter2 parameter3 parameter4
    parameter5 parameter6 in
  body

and instead write the following.

(* good *)
let x = foo
and y =
  bar parameter1 parameter2 parameter3 parameter4
    parameter5 parameter6
in
  body

The following is OK because the last binding fits in 1 line.

(* good *)
let x = foo
and y = bar baz in

The following is also OK because the binding list is immediately followed by another let.

(* good *)
let x =
  bar parameter1 parameter2 parameter3 parameter4
    parameter5 parameter6 in
let y = bar baz

The following is NOT OK, because the second let is not at the beginning of a line.

(* bad *)
let x = foo
in let y = bar

Rationale. The in keyword helps the reader find the end of the variable binding construct. This is usually not necessary if the binding construct fits in 1 line.

Body

The body of the let should be indented 1 level (2 spaces) deeper than the variable bindings. Thus write

(* good *)
let x = foo in
  x y z

Rationale. This rule is IMPORTANT and should be strictly followed, as the indentation is crucial to identifying the scope of variables. Consistent indentation to show variable scope is about the bare minimum of formatting that a sane piece of code needs to have.

Global `let`

Global let constructs should end with a ;; (double-semicolon).

Rationale. When you accidentally introduce unbalanced parentheses or other syntax errors and try to compile the code, the parser will at least find there's something wrong at the ;;, so it won't go off all the way to the end of the file before reporting a syntax error. This helps locate the syntax error.

For a global let, do NOT put a comment near the ;; to say which function definition it closes.

Rationale. It's not very useful since the reader can just "search" for the regexp "^let" to find the corresponding let. It's in fact harmful if you change function names, since you are bound to forget to update the name beside the ;; (unless you write a script to do it for you, which seems like overkill).

`if-then-else`

Basic Structure

Every if or else keyword should appear at the beginning of a line. An if and its corresponding else should appear at the same indentation depth. The consequent and alternate (the if branch and else branch, resp.) should be indented 1 level deeper than the if or else keywords.

Do NOT put parentheses around the test expression. If the test expression does NOT fit in 1 line, place the then on a line by itself.

Thus avoid writing

(* bad *)
if (some very long complicated test expression
      thats longer than one line) then
  consequent
  else alternate

and instead write

(* good *)
if some very long complicated test expression
     thats longer than one line
then
  consequent
else
  alternate

Rationale. The then keyword helps the reader find the end of the test expression.

Cascaded `if-then-else` (aka `else-if`)

If the alternate (the else branch) is again an if-then-else construct, then place the inner if (and the corresponding test expression) on the same line as the outer else. The inner if-then-else should otherwise be indented at the same level as the outer if-then-else. This applies no matter how many if-then-else are cascaded.

In a cascaded if-then-else, you should avoid putting the consequent or alternate on the same line as the if or else keywords. All then keywords in a cascaded if-then-else should be placed on the same line as the corresponding if, or if the test expression is too long to fit in 1 line, the then should be placed on a line by itself.

Thus avoid writing

(* bad *)
if a = b
then conseq1
else
if b > c then
  conseq2
  else
    alternate

and instead write

(* good *)
if a = b then
  conseq1
else if b > c then
  conseq2
else
  alternate

Rationale. In a cascaded if-then-else, readers will mostly rely on if and else to determine where a branch begins and ends. It is therefore more beneficial to show that same information by indentation; giving the then keyword a line of its own would only be a distraction. else followed by if is put on one line because else if is a phrase that is seen in many popular languages such as C or Java, and many programmers find it highly recognizable.

Do not use cascaded if-then-else if match can naturally do the job. Thus avoid writing

(* bad *)
if x = "some str" then
  foo
else if x = "another str" then
  bar
else if x = "yet another str" then
  baz
else
  boo

but instead write

(* good *)
match x with
  | "some str" -> foo
  | "another str" -> bar
  | "yet another str" -> baz
  | _ -> boo

However, avoid writing the following. Cascaded if-then-else is a better construct in this case, since there's no "matching" going on in the code.

(* bad *)
match 0 with
  | 0 when x < 0 -> foo
  | 0 when x <= 100 -> bar
  | 0 -> baz

Rationale. match can warn you if your cases aren't exhaustive, whereas if-then-else can't.
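
To illustrate the rationale, consider the sketch below; the color type and the name function are hypothetical. If a fourth constructor were added to color without updating name, the compiler would report the match as non-exhaustive, whereas an equivalent if-then-else chain would silently fall through to its final else.

```ocaml
type color =
  | Red
  | Green
  | Blue
;;

let name c =
  match c with
    | Red -> "red"
    | Green -> "green"
    | Blue -> "blue"
;;
```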

Test Expressions

When a test expression is too long, always bring the boolean operators (&& and ||) to the beginning of each line. Show the structure of the test expression by indentation. Do not rely on the precedence of boolean operators; use parentheses and/or indentation instead. If applicable, write a comment on the high-level interpretation of the condition for a complex test expression.

Thus avoid writing

(* bad *)
if (0.0 <= x && x <= 1.0 || 0.0 <= y && y <= 1.0) &&
  x *. x +. y *. y <= 1.21
then

and instead write

(* good *)
(* If x or y is in [0..1] and point (x, y) lies inside the
 * disk of radius 1.1 centered at the origin.  *)
if ((0.0 <= x && x <= 1.0) || (0.0 <= y && y <= 1.0))
  && x *. x +. y *. y <= 1.21
then

(* good *)
(* If x or y is in [0..1] and point (x, y) lies inside the
 * disk of radius 1.1 centered at the origin.  *)
if ((0.0 <= x && x <= 1.0)
    || (0.0 <= y && y <= 1.0))
  && x *. x +. y *. y <= 1.21
then

Rationale. This makes it possible to understand the overall structure of the test expression by simply looking at the very left. Also, many people don't remember that && has a higher precedence than ||.

Do not use the boolean operators & and or; use && and || instead.

Rationale. They are deprecated.

Body of An `if-then-else` Branch

If the consequent or alternate expression has a ; in it, then use begin-end to delimit it, rather than parentheses. begin and end should each occupy a line by itself, and be indented 1 level deeper than the corresponding if keyword. The body contained in the begin and end should be indented 1 level further, thus a total of 2 levels deeper than the if keyword. Thus avoid writing

(* bad *)
if !x then
  (y := foo;
   x := false)
else
  bar

but instead write

(* good *)
if !x then
  begin
    y := foo;
    x := false
  end
else
  bar

Rationale. begin-end blocks are easier to indent or edit than parenthesized code.

`while` and `for`

Format while and for loops like if-then-else, treating while or for as the if keyword, do as then, and done as else (cf. `if-then-else`). Unlike if-then-else, loops do not cascade.
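
One reading of this rule is sketched below. The functions sum_to and halvings are made up for illustration, and the alignment of the continued test in the while loop is this example's own choice.

```ocaml
(* Sum of the integers 1..n. *)
let sum_to n =
  let total = ref 0 in
    for i = 1 to n do
      total := !total + i
    done;
    !total
;;

(* Number of integer halvings needed to bring n down to 1, capped at
 * 64 steps.  The test spans two lines, so do gets a line by itself,
 * just as then would. *)
let halvings n =
  let m = ref n
  and steps = ref 0 in
    while !m > 1
            && !steps < 64
    do
      m := !m / 2;
      steps := !steps + 1
    done;
    !steps
;;
```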

Sequencing

You should break a line after every ; token. Sequenced statements (i.e. expressions separated by ; and by nothing else) should be indented at the same level.

Thus avoid writing

(* bad *)
let _ = foo (* This is just to introduce indentation.  *)
in
  a := 3; print_int (a * b);
    c := 4

and instead write

(* good *)
let _ = foo (* This is just to introduce indentation.  *)
in
  a := 3;
  print_int (a * b);
  c := 4

Pattern Matching (`match` , `function` and `try`)

`|` Before Branches

Every branch in a match or try (with) construct should have a vertical bar (|), including the first. The | should lie 1 level deeper than the match keyword.

Thus avoid writing

(* bad *)
match x with
    foo -> bar
  | baz -> x

and instead write

(* good *)
match x with
  | foo -> bar
  | baz -> x

Rationale. The latter is more convenient for adding a new branch at the top, and always writing a | won't hurt.

Always put exactly 1 space after each |.

Rationale. Consistency of the indentation of the patterns.
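
The examples above use match only; a sketch of the same convention for try and function follows. The function names are hypothetical, and placing with at the same depth as try is this example's reading of the rule.

```ocaml
(* Parse s as an integer, falling back to default on bad input. *)
let int_or_default default s =
  try
    int_of_string s
  with
    | Failure _ -> default
;;

let describe = function
  | 0 -> "zero"
  | _ -> "nonzero"
;;
```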

Pattern in A Branch

A pattern in a branch should be formatted as a literal constant (cf. Literal Constants). If the pattern spans multiple lines, then, if there's no guard on this pattern, put the -> on a line by itself, indented to the same level as the pattern, not counting the | and the space thereafter. If the pattern has a guard, follow the format in Guards.

Thus avoid writing

(* bad *)
match dat with
  | ((PsAst.SProg(x, ([||], [||]), ([||], [||]), ([||], [||], [||]),
                  ([||], [||]))), PsAst.SExt([||],[||],[], [||])) ->
      body

and instead write

(* good *)
match dat with
  | (PsAst.SProg (x,
                  ([||], [||]),
                  ([||], [||]),
                  ([||], [||], [||]),
                  ([||], [||])),
     PsAst.SExt ([||], [||], [], [||]))
    ->
      body

Guards

The when keyword should appear on the same line as the last | that it corresponds to, if the pattern(s) and the guard expression altogether fit in one line. Otherwise, a newline should be inserted between the end of the pattern and the when keyword, and the when should be indented to the same level as the pattern not including the | and the space following it. If the pattern is especially long, it can sometimes be helpful to insert an empty line before the when keyword.

Thus avoid writing

(* bad *)
match dat with
  | ((PsAst.SProg(x, ([||], [||]), ([||], [||]), ([||], [||], [||]),
                  ([||], [||]))), PsAst.SExt([||],[||],[], [||])) when
      x = 0 -> body

and instead write

(* good *)
match dat with
  | (PsAst.SProg (x,
                  ([||], [||]),
                  ([||], [||]),
                  ([||], [||], [||]),
                  ([||], [||])),
     PsAst.SExt ([||], [||], [], [||]))

    when x = 0 ->
      body

(or write the above "good" code without the empty line).

Rationale. With this convention, the when delimits the body (and the guard) clearly apart from the pattern.

Nested Pattern Matching

When a pattern matching construct is placed inside another pattern matching construct, begin-end should be used to delimit the inner one. The begin and the matching end keywords should be indented to the same depth. However, be advised that if the inner pattern matching construct is short, it is sometimes better to use parentheses instead.

Thus avoid writing

(* bad *)
match x with
  | foo ->
      (match y with
         | A z -> z
         | B f -> (fun t -> f (t * (t - 1))))
  | baz -> zab

but instead write

(* good *)
match x with
  | foo ->
      begin match y with
        | A z -> z
        | B f -> (fun t -> f (t * (t - 1)))
      end
  | baz -> zab

(* good *)
match x with
  | foo ->
      begin
        match y with
          | A z -> z
          | B f -> (fun t -> f (t * (t - 1)))
      end
  | baz -> zab

Rationale. begin-end blocks are easier to indent or edit than parenthesized code.

Commenting The Code

General Remarks

The usual rants you hear from people about commenting code apply.

Insert comments around every piece of code that might be confusing or hard to comprehend. In doing that, always remember to UNDERESTIMATE people's capabilities, particularly the ability to keep everything in their heads.

A comment should be inserted if and only if it adds some information that is not obvious from the code. If you can summarize a chunk of code or if you can clarify some code by a comment, then try to rewrite the code so that a comment is unnecessary (try to make the code shorter or simpler, and/or try to express information through variable names). If that attempt fails miserably, then resort to a comment.

Thus avoid writing

(* bad *)
let dx = small_num in  (* set dx to a small number *)
let dydx = (f (x + dx) - f x) / dx  (* apply f to x + dx, subtract f x,
                                     * divide by dx and let that be dydx. *)
in
  (* Set pt to a coordinate.  *)
  pt := (x + delta, y + dydx * delta);

Notice that all of the information provided in the comment is actually easier to understand from the code itself. You should instead write

(* good *)
(* Linearly extrapolate f at x + delta (approximation used).  *)
let dx = small_num in
let dydx = (f (x + dx) - f x) / dx  (* Approx derivative. *)
in
  pt := (x + delta, y + dydx * delta)

Identifiers in A Comment

If a sentence begins with an identifier that is used in the source code, do NOT capitalize it unless the identifier is in fact capitalized in the code.

Rationale. Since OCaml is case-sensitive, capitalization makes it a different identifier and may interfere with grep'ing.

Algorithmic Code

Implementation of a complicated algorithm should always begin with a long comment containing a high-level description of the algorithm (maybe the comment needs to be broken up so that you have one comment for each step in the algorithm). If the algorithm is mathematical in nature, give a mathematical description. The comment should

provide at least one name the algorithm is known by;
name all (mathematical) variables used in the description;
describe the algorithm at a higher-level than the code; and
include a reference to more information.

The last one may be omitted if the algorithm is a famous one. The variable names used in the comment should be consistent with variable names in the actual code.

Rationale. Using consistent variable names makes it easier for the reader to make the correspondence between the comment and the code.
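
A minimal sketch of such a comment, using the Euclidean algorithm as a stand-in (it is not taken from Acumen). The comment names the algorithm, uses the same variable names a and b as the code, and omits a reference because the algorithm is a famous one.

```ocaml
(* Euclidean algorithm for the greatest common divisor.
 * Given non-negative integers a and b, repeatedly replace (a, b)
 * by (b, a mod b).  When b reaches 0, a is the gcd of the original
 * pair, because gcd (a, b) = gcd (b, a mod b).  *)
let rec gcd a b =
  if b = 0 then
    a
  else
    gcd b (a mod b)
;;
```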

Function Definitions

The remarks in this section apply to class definitions as well.

Each non-trivial function should eventually be accompanied by a comment briefly describing what it does. Do NOT try to describe every tiny detail of the function's semantics in full.

Rationale. If the reader wants to know the details, s/he ought to read the code. It's easy for a detailed description to become subtly out of date and cultivate misunderstandings that lead to bugs. If the function's interface is expected to be very stable, such as if the function is a part of some public API, then it is important to document all aspects of the function's behavior.

You do NOT have to mention the name of the function at the top, unless the comment is too long to fit in one screen.

Rationale. Normally, the reader can see the function name for him/herself.

Every parameter or return value with some sort of "surprise" should be documented. For example, if the function takes an array and an index, but the index should point to the element preceding the element to be actually used, then that fact needs to be documented. However, relentlessly purge zero-entropy descriptions of parameters and return values, such as "this is the first argument to the function", or other information that can easily be inferred from the parameter's name etc.

You don't have to do "calligraphy" on these comments. You can get away with prudent decorations. See the example at the end of this section. You don't necessarily have to use complete sentences or otherwise "nice"-sounding language, but your wording should be precise and unambiguous.

Rationale. Superfluous decorations only clutter the code and make it harder to find the "meat" of the comment.

If you want to mention the type of the return value, write a type annotation first and see if that would do.

Thus avoid writing

(* bad *)
(* function foo ******************************************************
 * Parameter Descriptions ********************************************
 * @param arg1 The first argument to the function foo.
 * @param ary An array.
 * @param i This is an index.  It has to be within bounds of ary.
 * Explanation of Return Value ***************************************
 * @return The return value is a float.  If it is true, that means
 *         we branched to the second branch in the first match state-
 *         ment, and therefore the baz metric is given off as the
 *         return value.
 *********************************************************************)
let foo arg1 ary i =
  ...

and instead write

(* good *)
(* Computes the metric baz on a given array.
 * @param arg1 See definition of type foo_flags.
 * @param i Index to start computing on. *)
let foo arg1 ary i =
  ...

Notice how irrelevant decorations are removed and there's more actual content in the comment.
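On the earlier point about return types, here is a minimal sketch of letting a type annotation do the work that a "@return" line would otherwise do. The function mean is hypothetical and not part of Acumen:

```ocaml
(* Arithmetic mean of xs, or None if xs is empty. *)
let mean (xs : float array) : float option =
  let n = Array.length xs in
  if n = 0 then None
  else Some (Array.fold_left ( +. ) 0.0 xs /. float_of_int n)
```

The annotation already says that the result is an optional float, so the comment only needs to say when None is returned.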

Commented Out Code

Don't leave commented out code in the finished source code. When you commit, all code that you commented out during the hacking process should be removed. As an exception, it is sometimes beneficial to leave commented out code; for example, it may illustrate a failed approach and help explain why the current code is written the way it is (an understanding that would be hard to come by without it). In that case, the commented out code should be accompanied by a comment explaining why the code still belongs there and why it is of interest to anybody other than the person(s) who wrote it and/or commented it out.

Rationale. Avoid code clutter. Why keep useless junk in finished parts of the product if you're "done" with it? If you need to revive old code, then you should get it from the revision control system.

Remark. "Commented out code" here means code that used to not be a part of any comment but is now "disabled" by being put inside a comment. "Code" that was never meant to be compiled, such as example code or code-like notation used in a comment, is not considered commented out code. For example, the following is OK, although it contains "code" inside a comment:

(* good *)

(* These functions need to be called before and after doing foo.
 * Always call them like:
 *    bar x size;
 *    (* do foo *)
 *    baz x size
 *)
let bar x size = ...
and baz x size = ...
;;

(* We need to db#hold (rolled_back_state (lookup (x + y))) without
 * touching the database.  In order to do this, yada yada...  *)
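For the exception mentioned earlier, where commented out code is kept on purpose, the accompanying explanation might look like the following sketch. The function sum_ints and the scenario are hypothetical:

```ocaml
(* Kept on purpose: the direct recursive version below is not tail
 * recursive and may exhaust the stack on very long lists, which is
 * why the fold is used instead.  Do not "simplify" back to it.
 *    let rec sum_ints = function
 *      | [] -> 0
 *      | x :: xs -> x + sum_ints xs
 *)
let sum_ints xs = List.fold_left ( + ) 0 xs
```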

Variables

Naming Variables

Names of variables (including functions) should be in c_style. This means that the name consists of lower-case letters and/or digits, with words separated by an underscore (_). For example,

this_is_in_c_style
thisIsNotInCStyle (it is in javaCase, which is similar to CamelCase)

Rationale. We should have a convention for the sake of consistency. c_style separates words more clearly than javaCase, and is therefore easier to read.

Global Variables

Global variables (including functions) should have fully descriptive names. Every non-function global variable definition should be accompanied by a comment saying what the variable is used for. See Function Definitions for the comments to put on a global function definition.

Local Variables

A local variable that is visible across a very large block of code should be treated (named, commented, etc) as if it's a global variable.

Rationale. A local variable with a large scope is effectively global.

Local variables with limited scope should have concise names that are descriptive enough to be easy to remember and recognize. It's best if the name is descriptive enough to be self-explanatory. It may occasionally be OK to use a cryptic name and add a comment at the declaration to explain what it holds.

Unused Variables

Unused variables should be named _ (just an underscore).

Thus avoid writing

(* bad *)
let (prop, foo1) = some_fun x

and instead write

(* good *)
let (prop, _) = some_fun x

If you're using let _ = with the purpose of imposing execution order on unit-returning functions, consider using sequencing (;) instead.

Thus avoid writing

(* bad *)
(* foo, bar, baz return unit *)
let _ = foo () in
let _ = bar () in
let _ = baz () in
 ret

and instead write

(* good *)
foo ();
bar ();
baz ();
ret

Miscellany

Stubs

Stub code is tentative code put in places that are to be implemented later. Stub code should yield a verbose error message that includes

file name and line number of the stub code (you can use assert false or camlp4 to get these);
whether or not the stub code was expected to be reachable; and
if possible, instructions on how to avoid executing the stub code, which the user (who has never read the code) can understand

before exiting the program (or whatever else it does). In particular, you should never, ever, call the exit() function directly. When feasible, arrange your code so that the stub is unreachable until you implement it.

Rationale. Silently exiting the program gives no feedback as to the cause of the error and makes the problem, if it ever arises, unnecessarily hard to investigate.
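The following is a minimal sketch of a stub that reports the three items listed above. It assumes the built-in __LOC__ value (available in the standard library since OCaml 4.02) instead of assert false or camlp4; the names Stub_reached, stub, and simulate_hybrid are hypothetical and not part of Acumen:

```ocaml
(* Hypothetical exception carrying the full stub message. *)
exception Stub_reached of string

(* Raises Stub_reached with the source location, whether the stub was
 * expected to be reachable, and advice on how to avoid reaching it. *)
let stub ~loc ~reachable ~workaround =
  let reachability =
    if reachable then "expected to be reachable"
    else "NOT expected to be reachable"
  in
  raise
    (Stub_reached
       (Printf.sprintf "%s: stub not implemented yet (%s). To avoid it: %s"
          loc reachability workaround))

(* Hypothetical function whose body is still to be written. *)
let simulate_hybrid _model =
  stub ~loc:__LOC__ ~reachable:true
    ~workaround:"use a model without discrete events"
```

If nothing catches the exception, the runtime prints the message and terminates the program with an error status, so no direct call to exit is needed.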

Principle of Grouping

Try to put functionally related (coupled) pieces of code near one another, but not if that means moving code across abstraction boundaries. Relatedness here does NOT necessarily mean similarity in names or functionality. A good test of relatedness is to ask yourself, "if I modify one of these pieces of code, is it likely that I'll be forced to modify the other one(s) too?"

For example, it may be a good idea to place a function that invokes the simulator close to a function that resets the state of the simulator to prepare for a new simulation (it may not be; it all depends on the context). However, the simulator code should be placed in a different module because that's an isolatable component.
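The simulator example can be sketched as follows. All names (Simulator, reset_simulation, run_simulation) are hypothetical and do not reflect Acumen's actual modules; the point is only the placement of the definitions:

```ocaml
(* The simulator is an isolatable component, so it gets its own module. *)
module Simulator = struct
  type state = { mutable time : float; mutable steps : int }

  let create () = { time = 0.0; steps = 0 }

  (* Advances the simulation by one step of length dt. *)
  let step st dt =
    st.time <- st.time +. dt;
    st.steps <- st.steps + 1
end

(* The two functions below are coupled: changing what run_simulation
 * expects of a fresh state likely forces a change in reset_simulation,
 * so they are kept next to each other. *)
let reset_simulation st =
  st.Simulator.time <- 0.0;
  st.Simulator.steps <- 0

let run_simulation st ~dt ~n =
  reset_simulation st;
  for _i = 1 to n do
    Simulator.step st dt
  done;
  st.Simulator.time
```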