User:LeMoyne

Purpose
As it is, where it is: created as a stub root for the development, editing, proofing and review of two sets of pages related to Counting, or Document Statistics, in Writer.

The Document Foundation was created by dedicated users of OpenOffice, at least partly to improve the product and to make the development process more responsive to users. Several different sets of freelance writers require character counts throughout their business cycle: from estimation, through measurement of progress, to billing for work done. Authors in academia, government and the legal profession have different sets of rules as to what gets counted, and the counts are crucial to the final result: submission for publication in a journal or the public record. Users of LibreOffice don't want to be forced into using a different document application just to produce a count, either in general (to get a non-whitespace count) or in particular (to produce a specific count). User postings on OOo QA and on list servers often refer to the weakness of the counting in Writer as a bar to the adoption of OpenOffice.

Translators of Chinese, Japanese and other languages likely desire far more or different counting information than a European author will want to see. Including useful support for internationalization in the count will increase the complexity. Looking at the history of development in the OOo QA Bugzilla, one can see evidence of regression due to 'fix/break' cycles in tweaking the current counter. The complexity involved in a more capable counting mechanism precludes the success of any piecemeal approach to development. There is a need for clarity about basic assumptions, such as the answer to "What is a word?", in order to determine whether the counter needs to count using different definitions of "word".

Counting Documentation
LibreOffice developers are very lucky in that they work directly for and with their most committed users.

LibreOffice Dev cannot effectively proceed towards making useful counts without guidance from Doc. I suggest that:
 * If Doc can produce Doc/Help that describes the way Counting is supposed to work,
 * then Dev will have more success in producing what the users actually want and need.


 * Dev can create unit tests first/simultaneously (to prevent regression)
 * Dev can avoid 'requirement guessing', 'feature creep' and that nagging feeling one is on the wrong track
 * A clear goal allows an atmosphere of freedom and surety for the implementation decisions which are the creative work of Dev

How, what and where does Writer need to count?

 * How to count?
 * What is a word?
 * Does LibO need character classes beyond just white/text, like punctuation?
 * Separate from internationalization: Is there a need for user entered special countme sets?
 * Does LibO need counting to recognize words and numbers separately? Are other similar distinctions needed?
 * Other questions like: Is there a need to count bullets? outline numbering?
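
To make the character-class question concrete, here is a minimal Python sketch that sorts characters into coarse classes using Unicode general categories. The class names (white, punctuation, digit, text) and the functions classify and class_counts are hypothetical illustrations, not anything that exists in LibO; the real set of classes depends on the answers Doc gives to the questions above.

```python
import unicodedata
from collections import Counter


def classify(ch: str) -> str:
    """Map one character to a coarse, hypothetical counting class."""
    if ch.isspace():
        return "white"
    category = unicodedata.category(ch)  # e.g. "Lu", "Po", "Nd"
    if category.startswith("P"):
        return "punctuation"
    if category.startswith("N"):
        return "digit"
    return "text"


def class_counts(text: str) -> Counter:
    """Count how many characters of each class occur in text."""
    return Counter(classify(ch) for ch in text)


counts = class_counts("Hello, world! 42")
print(dict(counts))
```

With this sketch, "Hello, world! 42" yields 10 text, 2 punctuation, 2 white and 2 digit characters. Because the classes come from Unicode categories rather than a hard-coded ASCII list, a fullwidth comma (U+FF0C) also lands in punctuation, which hints at how much of the internationalization question is tied to the choice of classes.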


 * What to count?
 * What types of text need on/off controls on the count? (hidden, redline, indented quote, etc.)
 * What types of paragraphs need on/off controls on the count? (header/footer, note, hidden, redline, etc.)
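
One way to picture the on/off controls is as a set of included text kinds that the counter consults before counting anything. The sketch below is an assumption for discussion only: Segment, CountOptions, the kind names and count_words are all hypothetical, and splitting on whitespace stands in for whatever definition of "word" is finally agreed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    kind: str  # hypothetical kinds: "body", "footnote", "header", "hidden", "redline"
    text: str


@dataclass(frozen=True)
class CountOptions:
    # Kinds of text that are switched "on" for the count.
    include: frozenset = frozenset({"body"})


def count_words(segments, options: CountOptions) -> int:
    """Count whitespace-separated words in the segments that are switched on."""
    return sum(
        len(seg.text.split())
        for seg in segments
        if seg.kind in options.include
    )


segments = [
    Segment("body", "The quick brown fox"),
    Segment("footnote", "See page 3"),
    Segment("hidden", "draft note"),
]
body_only = count_words(segments, CountOptions())
with_footnotes = count_words(
    segments, CountOptions(include=frozenset({"body", "footnote"}))
)
```

Here body_only is 4 and with_footnotes is 7. The point of the sketch is that each control Doc asks for becomes one more kind in the set, so the list of controls needs to be known before the data model is fixed.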


 * Where to count?
 * Is a page range control sufficient to create counts excluding front/end matter? (title, ToC, glossary, index, refs, footnotes)

What data and controls are necessary to make counting useful to the broadest possible spectrum of users?

 * What other office applications are considered the standard? What comes closest to making everyone happy... and why?
 * user classes: Individual, Self-employed, Contractor, Corporate, Government
 * use cases: Translation, Journalism, Legal, Medical


 * What are the crucial, 'must-have' functions and what features are non-essential?
 * For each user class:
 * To cover all classes:


 * Answers to other questions Dev may never think to ask:
 * Do users need a similar/corollary function for Calc? Impress?  (best to work all coherently to avoid bloat/mis-duplication)
 * Should Counting relate to Find/Replace?

Essentially all of the questions in the list have basic or profound effects on the design of the counting software.
A few examples:
 * 1) If just a non-white character count is enough, then Dev is nearly done; if not, then Dev is nearly done with a band-aid, yet back before square one in Counting development.
 * 2) LibO will require more work towards internationalization of character sets if LibO will count occurrences of a built-in punctuation character set. It seems some internationalization is already required for the counting of white space. But to go from built-in white to built-in punctuation to user-defined sets may require more complexity in the counting mechanism and its internationalization.
 * 3) LibO may need to use regular expressions to maintain speed if it needs to recognize numbers as distinct from words or if it gets complex with controls/options.
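
As an illustration of example 3, the following Python sketch uses a single regular expression to tell numbers from words in one scan. The pattern is a deliberate simplification and an assumption of this sketch: it treats digit groups joined by "." or "," as one number and letter runs joined by an apostrophe or hyphen as one word. The names TOKEN and count_tokens are hypothetical.

```python
import re

# One alternation, tried left to right at each position:
#   number: digit groups optionally joined by "." or ","   (e.g. 1,250.50)
#   word:   letter runs optionally joined by "'" or "-"    (e.g. don't)
TOKEN = re.compile(
    r"(?P<number>\d+(?:[.,]\d+)*)"
    r"|(?P<word>[^\W\d_]+(?:['-][^\W\d_]+)*)"
)


def count_tokens(text: str) -> dict:
    """Return how many numbers and how many words the pattern finds."""
    totals = {"number": 0, "word": 0}
    for match in TOKEN.finditer(text):
        totals[match.lastgroup] += 1  # name of the alternative that matched
    return totals


totals = count_tokens("Invoice 42: 3 widgets at 1,250.50 each, don't delay")
print(totals)
```

For the sample sentence this finds 3 numbers (42, 3 and 1,250.50) and 6 words. Mixed tokens such as "abc123" are split into a word and a number by this pattern, which is exactly the kind of case the documentation would need to rule on.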

Counting Development

To organize Counting development in a distributed, weakly managed team, the Counting Development page is offered as a hub to collect all the various bits and threads of internal information that go into producing a Count.

A draft list of some of the subsections:

context:

 * Basic info on what the counter is now, what that implies about what has to change, and what it has to become to work within LibO and produce timely, efficient counts.
 * What Count/Counting is through the document lifecycle: Create (init), Load/Save (read = write = what data?), Edit (background update) and export to different formats (save other data?)
 * Complete list of places to visit/modify in giving the counts through the UI

design:

 * Draft and final feature lists
 * Design sketch of where to apply the various possible controls (Doc, TxtNode, actual scan/parse of mText).
 * The current approach copies the string three (3) times to count white vs. non-white characters and words. Reduce that to zero?!? while doing much more...
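
To show that white, non-white and word counts can come out of one scan with no intermediate copies of the text, here is a minimal Python sketch. It is not a description of the existing LibO code; count_single_pass is a hypothetical name, and a "word" is assumed here to be any maximal run of non-white characters.

```python
def count_single_pass(text: str):
    """Return (white, non_white, words) from one left-to-right scan."""
    white = non_white = words = 0
    in_word = False  # True while the scan is inside a run of non-white chars
    for ch in text:
        if ch.isspace():
            white += 1
            in_word = False
        else:
            non_white += 1
            if not in_word:
                # First non-white character after white space starts a word.
                words += 1
                in_word = True
    return white, non_white, words


result = count_single_pass("two  words here\n")
print(result)
```

For the sample string the result is (4, 12, 3). Additional classes or on/off controls would add branches inside the same loop rather than further passes over the text.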

questions:

 * Creation of new class(es) with TDF copyright - is this possible? encouraged or discouraged?

QA:

 * Make sure different views show same result (Word Count Dialog, Doc Stats, ?!?Other Stats).
 * Unit tests needed. Cppu ref to example.
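
As a rough picture of the two QA points above, the sketch below uses Python's unittest module to pin down known counts and to check that two views report the same numbers. Everything in it is a stand-in for discussion: count_words, word_count_dialog and document_statistics are hypothetical functions, not LibO code, and the real tests would be written against the actual counter.

```python
import unittest


def count_words(text: str) -> int:
    """Hypothetical single source of truth for the word count."""
    return len(text.split())


def word_count_dialog(text: str) -> dict:
    """Stand-in for the Word Count Dialog view."""
    return {"words": count_words(text), "chars": len(text)}


def document_statistics(text: str) -> dict:
    """Stand-in for the Doc Stats view."""
    return {"words": count_words(text), "chars": len(text)}


class CountingRegressionTest(unittest.TestCase):
    SAMPLES = ["", "one", "two  words", "tabs\tand\nnewlines count as breaks"]

    def test_known_counts(self):
        # Fixed expectations guard against 'fix/break' regressions.
        self.assertEqual(count_words(""), 0)
        self.assertEqual(count_words("two  words"), 2)
        self.assertEqual(count_words("tabs\tand\nnewlines count as breaks"), 6)

    def test_views_agree(self):
        # Every view must report the same result for the same text.
        for sample in self.SAMPLES:
            with self.subTest(sample=sample):
                self.assertEqual(
                    word_count_dialog(sample), document_statistics(sample)
                )
```

Saved as a module, the tests can be run with python -m unittest. Having both views call one shared counting function is the simplest way to make the second test pass by construction.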