Documentation:Styles
Regardless of the styles you will follow, please do be sure to read the general and specific suggestions regarding creating and publishing manuscripts, posters, web pages, and grant applications.
Introduction
There was a lot of redundancy across documents that were originally created for different purposes. A clean CompBiki style guide is being assembled by incorporating all these suggestions into a single document in a cohesive, consistent, and organised manner. Volunteers with an eye for detail and good writing skills were requested, but no one stepped up to the plate, so Ram, your fearless leader, decided to do it late on a Friday night (5:30 a.m. on Saturday morning, actually). A Documentation: virtual hierarchy was created, with a Documentation:Publishing page that integrates the previous manuscript and web publishing suggestions, and a Documentation:Styles page that collates the scattered style suggestions. The remaining work is to properly integrate my grant application style suggestions into these two pages.
General
Do
- Do write with excellent grammar, spelling, and punctuation. This should be the norm (not to mention that the latest research has shown that it is correlated with a longer lifespan, with a P-value of 0.49). Organisation and consistency in particular are key. Finally, clarity of what you present is the most important thing of all. "Suggestion" is quoted because it is used in the sense that the elements of a 12 step program are suggestions (smiley face for the humour impaired). It signals our collective wisdom, and if you deviate from it without good reason, it's the end of the world (insert another smiley with tongue in cheek).
- Do know the difference between "you're" and "your", "it's" and "its", and "there" and "their". See the external links section for help with this.
- Do go over this entire list of suggestions and use it as a checklist, along with Documentation:Publishing, performing the appropriate searches (and any necessary replacements) before sending documents to others (especially the PI) to peruse. Style errors waste the reader's time and keep them from getting to the core issues in the document; in general, never place a burden on the reader to fix style issues already addressed here, even if only mentally.
Do not
- Do not use contractions (e.g., "don't" for "do not") in formal writing. They are sometimes acceptable on web pages.
- Do not anthropomorphise or, more specifically, attribute actions to inanimate objects. For example, "protein's structure" should be "structure of a protein" (the former implies that the protein owns the structure, something a protein cannot do). Also, "which" is not always appropriate when referring to inanimate objects (e.g., write "a protein that is used for..." rather than "a protein which is used for...").
- Do not put apostrophes to indicate plurals; just make the word plural as you would normally: "DNAs" is good; "DNA's" is not (unless you wish to refer to the actions of DNA, in which case, see item above).
- Do not use the phrase "in order to...". "To" is adequate in almost all situations.
- Do not use the phrase "it's clear that...", especially in formal articles. If it's clear, it's clear, and there's no need to say it. It's clear that saying "it's clear that..." usually just indicates the opposite.
- Do not compare quantities using direction words such as high/low, especially throughout a manuscript. Define what better/worse means (see the Clarity section) and use those terms instead, since high/low doesn't tell the reader which result is better and which is worse.
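Several items in the lists above, together with the search suggested in the checklist item, lend themselves to a quick mechanical pass before a document goes out. Below is a minimal sketch. It assumes GNU grep, which accepts \b word boundaries in extended regular expressions. The function name check_style is hypothetical, and the patterns cover only a few of the items, so every hit still needs a human decision.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): flag, with line numbers, a few of the
# patterns discouraged in the "Do not" list. Assumes GNU grep, which accepts
# \b in extended regular expressions. Hits are candidates, not verdicts.
check_style() {
  local file="$1"
  # Phrases to avoid: "in order to" and "it's/it is clear that"
  grep -n -i -E "in order to|it'?s clear that|it is clear that" "$file" || true
  # Common contractions (fine on web pages, not in formal writing)
  grep -n -i -E "\b(don|doesn|isn|aren|can|won|shouldn)'t\b|\b(you|we|they)'re\b" "$file" || true
  # Apostrophes after acronyms, e.g. DNA's (a wrong plural or anthropomorphism)
  grep -n -E "\b[A-Z]{2,}'s\b" "$file" || true
}

# Usage: check_style manuscript.txt
```

Each grep runs separately so that the case-insensitive phrase searches do not interfere with the case-sensitive acronym search. The trailing `|| true` keeps a clean file from being reported as an error.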
Clarity
Do
- Do always concisely summarise the message of your tables and figures at the end of their captions, so that someone seeing only your tables and figures is able to understand your entire paper. In general, the caption of a figure or table should consist of: a concise title in bold explaining what is being shown; any details absolutely necessary to understand the figure/table, but no more (definitions, etc. can be left to the text); and a summary sentence containing any conclusions (usually for results). The abstract, combined with the figures, tables, and their captions, should by itself be adequate to understand the entire work being described without needing to read the rest of it.
- Do be quantitative and precise, and focus on conveying conceptual understanding, while eschewing vague words and phrases and limiting jargon to the appropriate section (methods sections are where much of the jargon belongs, especially where it is related to the conceptual activities/understanding). Check out an example of a supremely vague abstract from a real publication.
- Do create a separate description of columns instead of relying on short column headings. Column descriptions can either be at the top of an ASCII table or in a separate README file.
- Do use long descriptive filenames for tables and figures.
- Do define what better/worse (or best) means early on, and only once, when comparing quantities, and then use better/worse throughout instead of high/low or low/high (in the case of energy or RMSD, for instance, where lower is better). As with everything, there may be situations where this rule doesn't work.
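The column description, descriptive filename, and better/worse items above can be combined when writing an ASCII table. Below is a minimal sketch. The filename, column meanings, method names, and values are all made-up placeholders for illustration, not real results.

```shell
#!/usr/bin/env bash
# Sketch: write an ASCII table whose columns are described at the top of the
# file instead of in short headings. Filename, columns, and values are
# hypothetical placeholders.
export LC_ALL=C   # use "." as the decimal separator in printf

outfile="mean_rmsd_of_predicted_to_experimental_structures_per_method.txt"

{
  echo "# Column 1: name of the structure prediction method"
  echo "# Column 2: mean RMSD (angstroms) between predicted and experimental structures; lower is better"
  echo "# Column 3: number of structures evaluated"
  printf '%-10s %6.2f %4d\n' methodA 3.14159 42
  printf '%-10s %6.2f %4d\n' methodB 2.71828 42
} > "$outfile"

cat "$outfile"
```

The better/worse direction is stated once, in the column description. Readers of the file never have to guess what "high" means.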
Do not
- Do not present tables and tables of numbers. All tables should be converted into descriptive figures, with the table made available as supplementary material. It's rarely necessary to report more than two to four digits after the decimal point; use the minimum number of significant figures that you need to convey the message.
- Do not use cryptic headers such as "std-full" or "rankcor" in ASCII tables. It is better to explain the columns either at the top of the file or in a separate file.
- Do not use "x axis" when describing graphs. Just say what is plotted against what. At worst, use "horizontal axis" and "vertical axis".
- Do not have widows and orphans (i.e., single lines or section headings by themselves on a page).
- Do not use this style of writing: "N/M corresponds to A/B", where N and M are usually numbers. It takes the reader longer to decode. Try something like "It takes N to do A and M to do B"; it's almost always worth spelling it out. This works best when all the numbers are 21 or greater; if they are smaller, the current rule is to write numbers as words (see Consistency), though this case may be treated as an exception.
- Do not overuse (and avoid in general) words and phrases like "many", "sometimes", "often", "clearly", "of course", "obviously", and "and so on". Check out an example of a supremely vague abstract from a real publication.
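For the significant figures item above, the shell's printf can do the rounding, because the precision of its %g conversion counts significant digits. Below is a minimal sketch. The helper name sigfig is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: round a value to a given number of significant figures with
# printf's %g conversion, whose precision counts significant digits.
export LC_ALL=C   # use "." as the decimal separator

sigfig() {
  local digits="$1" value="$2"
  printf "%.${digits}g\n" "$value"
}

sigfig 3 3.14159     # prints 3.14
sigfig 2 0.0012345   # prints 0.0012
```

Note that %g drops trailing zeros (`sigfig 3 2.50` prints 2.5). It also switches to exponent notation for large or very small magnitudes (`sigfig 2 1234.5` prints 1.2e+03). Check the output before it goes into a table.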
Consistency
Consistency is extremely important. Some issues I've come across so far include:
- Acronyms: When you first use an acronym, write out its full name as it is commonly used, followed by the acronym; for example: "Critical Assessment of protein Structure Prediction methods (CASP)". You may choose to repeat the full name in every large section (e.g., results, methods, etc.), but it needs to be done consistently.
- Spelling: English vs. American. I use English spelling and most of you use American spelling. It doesn't matter which we use while drafting. Generally, if we submit to a British journal (including Nature and JMB), the manuscript will adhere to English spelling and will be changed accordingly; if we submit to an American journal, it will use American spelling. The final draft of a manuscript should have consistent spelling (either English or American; the typesetters will get it right).
- Consistency of tense: If something generally holds true (e.g., "Contacts are compiled from a set of known structures in the PDB"), it may be better not to use the past tense. Generally, present tense works for a methodological description (say, of an implementation) that holds for anyone who reads your paper. When you're talking about an experiment you did, past tense is better.
- Hyphenation and dashes: There is generally no need to hyphenate most words. I've realised that "well characterised" communicates as well as "well-characterised", so for consistency's sake, it's better not to hyphenate whenever possible. Please check this as you write. The LaTeX conventions of "---" for a dash (to separate clauses within a sentence), "--" for attribution (e.g., "--Ram"), and "-" for a hyphen (to join words) are to be used.
- Numbers as words or digits: The main rule is that any number less than or equal to twenty should be written as a word (since it is a single word up to that point); anything larger is written in digits. This doesn't always read well, so a slight modification to this rule applies when you're enumerating mixed quantities (e.g., "6 compounds" and "10 structures" initially, and then later "30 compounds" and "40 structures"). In that case, you may use digits throughout, provided this is done consistently AND at least one number in these enumerations is 21 or greater (e.g., one context has a "six" but another has a "30"). If you're referring to items only once (like three studies), use a word, subject to the same cutoff. There is no automated solution to this problem, and it really does depend a lot on context. No matter what, do not start a sentence with a digit. For labels or fractions, you can use digits rather than words, e.g., "18/1024" or "3.5.2021".
- Capitalisation: Generally, only the first letter of each word is capitalised for important things such as section headings or titles of publications.
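As noted above, there is no automated solution to the numbers rule, but a search can at least surface candidates for review. Below is a minimal sketch, assuming grep with extended regular expressions. The function name flag_numbers is hypothetical. It will also flag legitimate uses such as figure labels, so each hit needs a human decision.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): flag candidate violations of the numbers rule.
flag_numbers() {
  local file="$1"
  # Standalone numbers from 0 to 20 written as digits (candidates for words);
  # digits adjacent to "." or "/" are skipped, so fractions such as 18/1024
  # and dates such as 3.5.2021 are not flagged.
  grep -n -E '(^|[^0-9./])([0-9]|1[0-9]|20)([^0-9./]|$)' "$file" || true
  # A digit at the start of a line or right after a sentence-ending mark
  grep -n -E '(^|[.!?] +)[0-9]' "$file" || true
}

# Usage: flag_numbers manuscript.txt
```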
Fonts
- Perhaps the most important style suggestion regarding fonts is using logical, as opposed to physical, styles for the actual code specifying the desired formatting. The use of logical styles imparts a semantic meaning onto your text, which makes you a better writer. You end up focussing on the meaning of what you're writing instead of what it looks like. For example, in HTML, instead of using the <i> or <b> elements for specifying objects that are typically rendered with italics or bold, you can use:
- <em> for emphasis (\em in LaTeX);
- <cite> for specifying a journal name;
- <var> for specifying a variable name;
- <strong> for strong emphasis.
- The above are just some common examples. A better description of logical vs. physical styles is available elsewhere.
- Punctuation: Quotes and font modifications, such as italics, always go inside the punctuation. In other words, keep punctuation outside of formatting so that only the actual text is formatted; for example, write ''italic text''. with the full stop outside the wiki italic markup, not ''italic text.''
- Organism names: These should be italicised and in the proper form. The genus should have a capital first letter and the species should not. The genus name can be abbreviated to its first letter and a full stop; that is, Plasmodium falciparum and P. falciparum are both acceptable if used consistently. Usually the full name is used at first mention and the abbreviated name thereafter.
- Languages other than English (e.g., Latin): Words should be italicised if they are not really accepted in English. For example, "in vitro" is italicised, but "et al." is not.
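In HTML sources, the punctuation item above can be enforced with a search and replace. Below is a minimal sketch using sed -E, which both GNU and BSD sed accept. The name fix_punct is hypothetical. It only handles punctuation placed directly before a closing </em> or </i> tag.

```shell
#!/usr/bin/env bash
# Sketch: move a full stop, comma, semicolon, or colon that sits just inside
# a closing </em> or </i> tag to just outside it. Reads files given as
# arguments, or standard input, and writes the result to standard output.
fix_punct() {
  sed -E 's#([.,;:])</(em|i)>#</\2>\1#g' "$@"
}

echo 'See this <em>example.</em>' | fix_punct   # prints See this <em>example</em>.
```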
Formatting
- Headings should have only the first letter capitalised; the rest should follow regular writing style.
- Try to use headings hierarchically and logically. For example, in wiki markup, HTML, and LaTeX, "=", "<h1>", and "\section" are the top level headings (which should be the default); "==", "<h2>", and "\subsection" are the second level; "===", "<h3>", and "\subsubsection" are the third level; and so on. Bold text is good for nonhierarchical headings.
Strings
Dates
- "September 6, 2010" is the long format. When padding makes sense (such as in a fixed width font situation, be it in a manuscript or a web page), padding with a 0 results in "September 06, 2010". Unix date command (GNU date; %e would pad with a space instead), no padding:
date '+%B %-d, %Y'
with padding:
date '+%B %d, %Y'
- "Sep 6, 2010" is the medium format; it becomes "Sep 06, 2010" or "Sep x6, 2010" when padded with "0" or "x" respectively. The comma may be excluded for aesthetic or other reasons. Unix date command (GNU date), no padding:
date '+%b %-d, %Y'
with padding:
date '+%b %d, %Y'
- sep062010 (mmmddyyyy) is currently the standard convention for dates appearing in strings, such as in filenames. Default (zero) padding makes sense here.
date '+%b%d%Y' | tr '[A-Z]' '[a-z]'
- Sep062010 (Mmmddyyyy) has also been used, but avoid it at all costs. It was occasionally used for aesthetic or other reasons, for example, to capitalise in filenames such as Foo.Sep062010 instead of Foo.sep062010. This is Ram's pedantry at its worst.
date '+%b%d%Y'
- See Documentation:Publishing and the grant writing suggestions for more on how to present written work. We feel strongly that the better you write, context be damned, the better.
- For formatting publications, please be sure to use the same format used for our publication list.
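The date conventions above can be produced side by side for a single date. Below is a minimal sketch. It assumes GNU date, for -d and for the %-d flag that suppresses padding, and it assumes the C locale for English month abbreviations. The name date_styles is hypothetical. The x padding is done with sed because date has no such option.

```shell
#!/usr/bin/env bash
# Sketch: print each date convention for one date. Assumes GNU date
# (-d to parse a date, %-d for an unpadded day) and the C locale.
export LC_ALL=C

date_styles() {
  local d="$1"
  echo "long: $(date -d "$d" '+%B %-d, %Y')"
  echo "long padded: $(date -d "$d" '+%B %d, %Y')"
  echo "medium: $(date -d "$d" '+%b %-d, %Y')"
  # x padding: replace the padding 0 of the day with x
  echo "medium x padded: $(date -d "$d" '+%b %d, %Y' | sed 's/ 0/ x/')"
  # String/filename convention (mmmddyyyy), lowercased
  echo "filename: $(date -d "$d" '+%b%d%Y' | tr '[:upper:]' '[:lower:]')"
}

date_styles 2010-09-06
```

For 2010-09-06, this prints September 6, 2010; September 06, 2010; Sep 6, 2010; Sep x6, 2010; and sep062010.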
CANDO specific style and terminology use
Over the decades, we have adopted terminology (and style, note the fonts) to refer to CANDO and its components as follows:
- The CANDO platform consists of pipelines that generate a compound-proteome interaction matrix and indication-specific protocols that rank compounds for particular diseases/indications given particular proteomes/interactomes/heterogeneous data. These are the main phrases and should be used extensively.
- We also define a module as the software equivalent of a protocol; e.g., the canpredict and canbenchmark modules in cando.py refer to the prediction and benchmarking protocols within a given pipeline within the CANDO platform.
- The generic words "component" and "algorithm" are used only rarely as substitutes for protocol and/or module, for the reasons given in the next item.
- The words "method" and "approach" are used only rarely, to spiff up the writing or for rhetorical purposes in place of the terms above. These more generic words help avoid repeating a word within a sentence and overusing words like platform, pipeline, and protocol.
- Typically, if bolding is used for the terms above, it is used only for the first usage, not every instance. However, fixed width font (e.g., for cando.py) is used for every instance throughout.
- Also, when describing CANDO, focus on the conceptual meaning (i.e., the protocol) rather than the technical module throughout a manuscript/grant application/poster/presentation/etc. For instance, prefer "the benchmarking module in the Vina pipeline" over "canbenchmark"; that is, avoid unnecessary jargon. However, in the methods sections, you can name the module proper to connect the protocol to its module name, but this should typically be done only once.
External links
Avoid external links (URLs outside of the wiki hierarchy) in wiki articles. If you must use them, try to relegate them to a separate section, such as this one. These are pages/sites with some good style ideas, but not all of their ideas are good (some don't practise what they preach), so take the best and leave the rest.