

                        W O R D C O U N T


A Text-Analysis Tool for WordStar, WordStar 2000, and ASCII files

                    Program by Richard Zuris
      Program design by Richard Zuris and Robert J. Sawyer
                Documentation by Robert J. Sawyer

                          Version 3.03
                          25 June 1992


=================================================================





TECHNICAL SUPPORT AND LICENSE


The WORDCOUNT program file, WC.EXE, is copyright 1990-1992 by 
Richard Zuris.  All rights reserved.

This documentation file is copyright 1990 and 1991 by Robert J. 
Sawyer.  All rights reserved.

There is no charge for use of WORDCOUNT, but you must not sell it 
or bundle it with other products.

WORDCOUNT technical support is available through Section 15 
(Third Party/Addons) of The WordStar Forum on the CompuServe 
Information Service.  

                     Richard Zuris 76702,520
                   Robert J. Sawyer 76702,747


 THIS PRODUCT IS PROVIDED "AS IS" AND WITHOUT WARRANTIES OF ANY 
 --------------------------------------------------------------
 KIND, EITHER EXPRESS OR IMPLIED, INCLUDING ANY IMPLIED WARRANTIES 
 -----------------------------------------------------------------
 OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.  THE 
 ------------------------------------------------------------
 AUTHORS ARE NOT LIABLE FOR ANY INJURY, LOSS, DAMAGE, OR EXPENSE, 
 ----------------------------------------------------------------
 WHETHER DIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL.
 ------------------------------------------------------



PROGRAM REQUIREMENTS

WORDCOUNT runs only under the MS-DOS or PC-DOS operating systems.  
It is small enough that most users should be able to run it 
successfully from WordStar's or WordStar 2000's DOS gateways.

WORDCOUNT works with document files from any current version of 
WordStar (tested through 7.0 Revision B) and WordStar 2000 
(tested through 3.5), and with pure ASCII files, such as those 
produced by WordStar's non-document mode or by WordStar 2000's 
unformatted format.  





OVERVIEW


In WordStar Release 5.0 and above, the ^K? command gives a count 
of the number of words in a file.  In WordStar 2000, ^O= performs 
a similar function.  Although these are useful features, they 
have several shortcomings:

* There's no easy way to record the word count.

* There's no easy way to perform a word count on a series of 
  files.

* They provide only a count of actual grammatical words, not 
  typesetter's words (defined below), which is usually what 
  writers need to know.

* They total either all the words in the entire file, or just 
  those in a single marked block, but in WordStar there's no way 
  to mark sections to be ignored in the word-count tally.

* WordStar's byte count can be substantially higher than the 
  actual number of printing characters.  For instance, one might 
  reasonably think of a tab as a single character (ASCII ^I) or 
  as five characters (the number of spaces a default WordStar tab 
  takes up on screen), but ^K? counts it as 13 characters, 
  because that's the number of bytes in the symmetrical sequence 
  tab code.  WordStar 2000 doesn't provide a byte count at all.

WORDCOUNT addresses all of these difficulties.  

If you activate WC from the DOS command line without any 
parameters, you'll see its help screen and copyright notice:


WORDCOUNT v3.03 for WordStar 2000, WordStar, and ASCII files
Copyright 1990-1992 by Richard Zuris.  All rights reserved.

  Enter "WC" and one or more filenames (may include path and wildcards).
  Files may contain .WC ON/OFF/2 (WordStar or ASCII dot command or WS2000
  [COMMENT]) to enable/disable/double the count for selected text.  Options:

    /T=   Number of characters in a typesetter's word (default: 6.0).
    /C=   What characters (ASCII 128-255) to consider part of a word.
            M:  Multilingual (Code Page 850)
            R:  Hewlett-Packard Roman-8 character set
            A:  All extended characters (ASCII 128-255)
            Default:  only accented letters from the IBM ECS (CP437)
    /I=   Number of spaces to count for each tab (def. 5 for ASCII/WS2000).
    /S    Stagger output (aids WordStar's block math).
    /W    Do not include whitespace in character count.
    /E    Count each press of the Enter key as one "space".
    /N    Count footnotes, endnotes, and annotations.
    /G    Show grand totals if more than one file counted.
    /H    Count headers and footers.
    /U    Count lines with software underscore as two lines each.
    /D    Recursively check subdirectories for matching filenames.





RUNNING WORDCOUNT

To run WORDCOUNT, enter WC on the command line followed by one or 
more valid DOS file names with or without wildcards.  You may 
optionally include drive and subdirectory specifications.  If you 
append the /D switch, discussed below, WORDCOUNT will check files 
in the specified subdirectory and in any subdirectories beneath 
it.  

For instance:

WC *.*                  check all files in the current
                        directory

WC DIARY.DOC            check the file DIARY.DOC in the
                        current directory

WC DIARY.DOC TRIP.DOC   check the files DIARY.DOC and TRIP.DOC
                        in the current directory

WC *.DOC                check all files with the extension .DOC
                        in the current directory

WC ABC??                check all files in the current directory 
                        with names three to five characters long, 
                        the first three of which are ABC

WC C:\ARTICLES\*.*      check all files in the \ARTICLES
                        subdirectory on drive C:

WC C:\ARTICLES\*.* /D   check all files in the \ARTICLES 
                        subdirectory and in any
                        subdirectories beneath \ARTICLES

WC C:\DIARY.DOC /D      find and check DIARY.DOC, no matter
                        what subdirectory it is stored in on
                        drive C:  (This could take a while. 
                        WORDCOUNT does not stop searching
                        after the first occurrence of the
                        filename.  You can interrupt its
                        searching by pressing <Esc>.)

If a file or path specified doesn't exist, WORDCOUNT will give 
you an error message.
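
The file matching described above can be approximated in a few 
lines of Python.  This is only a sketch, not WORDCOUNT's own 
code: the function name find_matches is hypothetical, and 
Python's fnmatch rules differ from DOS wildcard rules in some 
cases (for example, DOS lets a trailing ? match nothing and lets 
*.* match names without an extension, while fnmatch does 
neither).

```python
import fnmatch
import os


def find_matches(start_dir, pattern, recurse=False):
    """Return paths under start_dir whose file names match pattern.

    recurse=True mimics the documented effect of /D: subdirectories
    beneath start_dir are searched too, and the search does not stop
    at the first match.
    """
    matches = []
    for folder, subdirs, names in os.walk(start_dir):
        subdirs.sort()  # make the walk order predictable
        for name in sorted(names):
            # DOS file names are not case-sensitive, so compare in upper case.
            if fnmatch.fnmatchcase(name.upper(), pattern.upper()):
                matches.append(os.path.join(folder, name))
        if not recurse:
            break  # without /D, only the starting directory is checked
    return matches
```

Calling find_matches with recurse=True corresponds roughly to 
adding /D to the WC command line.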


WORDCOUNT will respond with a screen display like this for each 
file checked (the line numbers are just for reference in this 
manual):
        
                                                                   
1|    WORDCOUNT v3.03 for WordStar 2000, WordStar, and ASCII files
2|    Copyright 1990-1992 by Richard Zuris.  All rights reserved.

3|    WordCount for C:\SF\NOVELS\GOLDEN
4|    Sun Apr 20 10:10:10 1991

5|      Printable characters and spaces:    362,110
6|      Printable lines:                      7,343
7|      Grammatical words:                   60,187
8|      Typesetter's words (chars/6.0):      60,351

Lines 1 and 2 of the display identify the program and the version 
number.  If you contact us for technical support, please tell us 
the version number.  

Line 3 identifies the file being analyzed.  The full 
drive\subdirectory designation will always appear here even if 
you only typed the filename on the command line.

Line 4 indicates the time and date on which you ran WORDCOUNT.  
(This is not the file date/time stamp from the DOS directory; 
rather, it's so you can keep records of what the status of a 
particular document was at a particular moment.)

Line 5, "Printable characters and spaces," indicates how many 
letters, numbers, punctuation marks, real spaces (those you 
entered into the file by hitting the space bar), and horizontal 
printer spaces produced by tabs there are in the file.  

Normally, carriage returns, line feeds, and print controls (such 
as WordStar's ^B for boldface) are not counted in this tally 
(although you can tell WORDCOUNT to count each hard carriage 
return as a space if you use the /E switch, described below).  A 
backspace character (^H) will cause the next character to be 
ignored, so that an accented character produced by overprinting a 
letter and a punctuation mark will be counted as one character.
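
For a plain ASCII file, the counting rules described above can 
be sketched in Python as follows.  This is an illustration under 
stated assumptions, not WORDCOUNT's actual code: the function 
and parameter names are hypothetical, WordStar document codes 
are not handled, and the way /E and /W interact is a guess (the 
sketch treats them independently).

```python
def count_printable(text, tab_width=5, count_enter=False, skip_whitespace=False):
    """Count printable characters and spaces in plain ASCII text.

    tab_width       -- spaces counted per tab (the documented /I= default is 5)
    count_enter     -- count each carriage return as one space (like /E)
    skip_whitespace -- leave spaces and tabs out of the count (like /W)
    """
    total = 0
    skip_next = False
    for ch in text:
        if skip_next:
            skip_next = False  # character overprinted after a backspace
            continue
        if ch == "\b":  # ^H: the next character is ignored
            skip_next = True
        elif ch == "\t":
            total += 0 if skip_whitespace else tab_width
        elif ch == " ":
            total += 0 if skip_whitespace else 1
        elif ch == "\r":  # DOS lines end in CR LF; only the CR is considered
            total += 1 if count_enter else 0
        elif ch < " " or ch == "\x7f":
            pass  # line feeds and other control codes are not counted
        else:
            total += 1
    return total


print(count_printable("e\b'"))  # 1: the apostrophe overprints the e
```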

Line 6, "Printable lines," is the number of text lines that will 
be printed, not including top and bottom margins, header or 
footer lines, soft lines (such as those blank ones produced by 
using WordStar's .LS command), or suppressed lines (those kept 
from printing by the .SB command, new in WordStar 6.0D).  

This piece of data is more important than it might appear at 
first glance.  Some publications use the method of word counting 
known as "printer's rule."  This really measures the amount of 
space a document will take up, not the discrete number of words.  
"Printer's rule" recognizes that even a one-word paragraph of 
dialogue takes up the same amount of space on a typeset page as a 
full line of text.  If you use WordStar's standard 6.5" line and 
a ten-pitch fixed-width font, you will average ten 
six-character-long typesetter's words per line.  That means 
multiplying the "Printable lines" tally by 10 will give you a 
word count according to the "printer's rule."
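
As a worked example, here is that multiplication in Python, 
using the "Printable lines" figure from the sample display 
above.  The function name is hypothetical, and the default of 
ten words per line is simply the average quoted in the 
preceding paragraph.

```python
def printers_rule_words(printable_lines, words_per_line=10):
    """Estimate a word count by printer's rule: lines times words per line."""
    return printable_lines * words_per_line


# "Printable lines" figure from the sample display above.
print(printers_rule_words(7343))  # 73430
```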

Line 7, "Grammatical words," provides a tally of the number of 
actual English words in your document.  WORDCOUNT tries to 
determine if a group of characters containing a hyphen is a 
single word, such as "bi-coastal," or two or more words joined 
together in an adjectival phrase, such as "word-processing 
software."  To do this, WORDCOUNT looks at the characters 
preceding the hyphen.  If a word separator (a space or 
punctuation such as a quotation mark) is found within four 
characters of the hyphen, then WORDCOUNT assumes that the hyphen 
has been used within a single word to distinguish a prefix.  This 
allows words with prefixes up to three letters long, such as 
"pre-Columbian," to be counted as a single word.  Since 
WORDCOUNT allows numerals within words, it also means that phone 
numbers such as 555-1212 will be counted as one word.  
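
The hyphen rule can be sketched in Python like this.  It is a 
rough illustration, not WORDCOUNT's actual code: the names are 
hypothetical, only unaccented letters and numerals are treated 
as word characters, apostrophes are kept inside words (an 
assumption), and each hyphen is judged only by the run of 
characters between it and the previous hyphen or separator.

```python
import re

# A token is a run of word characters, optionally joined by single hyphens.
TOKEN = re.compile(r"[A-Za-z0-9][A-Za-z0-9']*(?:-[A-Za-z0-9][A-Za-z0-9']*)*")
MAX_PREFIX = 3  # separator within four characters = prefix of up to three


def count_grammatical_words(text):
    total = 0
    for match in TOKEN.finditer(text):
        parts = match.group().split("-")
        total += 1
        # A hyphen preceded by more than MAX_PREFIX characters is
        # treated as joining two separate words.
        for before in parts[:-1]:
            if len(before) > MAX_PREFIX:
                total += 1
    return total


print(count_grammatical_words("bi-coastal"))                # 1
print(count_grammatical_words("word-processing software"))  # 3
print(count_grammatical_words("555-1212"))                  # 1
```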

Line 8, "Typesetter's words," tells you how many groups of five 
letters plus one space could be printed in the area taken up by 
the text in your document.  Typesetter's words, not grammatical 
words, are the standard used for word counting by most 
publications.

As an example of the difference between typesetter's and 
grammatical words, consider the following sentence:

            Archaeopteryx flourished in the Jurassic.

It's five grammatical words but seven typesetter's words.  By 
default, WORDCOUNT considers a typesetter's word to be 6.0 
characters.  You can use the optional /T= switch, described 
below, to specify a different length. 
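
The arithmetic for the example sentence can be checked with a 
short Python sketch.  The function name is hypothetical, and 
because this manual doesn't say how WORDCOUNT rounds the result, 
the sketch returns the unrounded quotient; 6.83 is in line with 
the seven typesetter's words quoted above.

```python
def typesetters_words(char_count, chars_per_word=6.0):
    """Return the unrounded number of typesetter's words."""
    return char_count / chars_per_word


sentence = "Archaeopteryx flourished in the Jurassic."
print(len(sentence))                               # 41
print(round(typesetters_words(len(sentence)), 2))  # 6.83
```

Passing a different chars_per_word value corresponds to using 
the /T= switch.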

If you specify the optional /G switch, described below, a grand 
total display for all the files checked will also be provided.




PAGING THROUGH WORDCOUNT'S ANALYSIS

If you use WORDCOUNT to analyze several files at once, the 
WORDCOUNT display will scroll off your screen.  You can pipe the 
WORDCOUNT analyses through the MS-DOS or PC-DOS MORE.COM program 
so that your computer will pause after each screen of 
information (displaying "-- More --" at the bottom of your 
screen).  Pressing any key will move you to the next screen.  The 
command below will check all files in the current directory, 
pausing after each full screen of information:

                          WC *.* | MORE





SAVING WORDCOUNT'S ANALYSIS


WORDCOUNT's analysis can be sent to a file instead of the screen 
using standard DOS redirection.  The command:

                       WC FILENAME > TOTAL

will save WORDCOUNT's analysis of the document FILENAME to the 
file TOTAL (if TOTAL already exists, the old version will be 
overwritten; if it doesn't already exist, TOTAL will be created).  
You can include a drive\subdirectory designation for either or 
both of FILENAME and TOTAL, if you like.

As mentioned above, you can specify multiple files on the WC 
command line.  Or, if you prefer, you can run WC individually for 
each file you wish to check, then append all the analyses to a 
single file (note that you use >> instead of > to append output):

                        WC FILE1 >> TOTAL
                        WC FILE2 >> TOTAL
                        WC FILE3 >> TOTAL

These commands can be placed in a batch file. You could write 
such a batch file with WordStar's non-document mode or with 
WordStar 2000 using the NOFORMAT.FRM format, or you could have 
ProFinder, the DOS shell included with WordStar 5.0 and above, 
make the batch file for you.  

To do that, call up ProFinder in the subdirectory containing the 
files you want to analyze.  In the ProFinder file list, cursor to 
the files you wish to check, and highlight each one by issuing 
^K.  Next, issue <Alt-F> to activate the FILES menu and select W 
to "Write Filenames."  ProFinder will present this dialog box:

       ------------- Write tagged filenames --------------
       | Write filenames to:                             |
       | Prefix:                                         |
       | Suffix:                                         |
       ---------------------------------------------------

To create a batch file called COUNT.BAT, fill in the prompts as 
follows:

       ------------- Write tagged filenames --------------
       | Write filenames to: COUNT.BAT                   |
       | Prefix: WC                                      |
       | Suffix: /S >> TOTAL                             |
       ---------------------------------------------------

(/S is an optional WORDCOUNT command-line switch, discussed 
below; other WORDCOUNT command-line switches can be placed on the 
"Suffix" line, too.)

ProFinder will then create the file COUNT.BAT for you.
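
If you don't have ProFinder, the same kind of batch file can be 
generated with a few lines of Python.  This sketch assumes that 
each line consists of the prefix, the file name, and the suffix 
separated by single spaces; the exact spacing ProFinder uses 
isn't documented here, and the function name is hypothetical.

```python
def batch_lines(filenames, prefix="WC", suffix="/S >> TOTAL"):
    """Build one batch-file line per file name: prefix, name, suffix."""
    return [f"{prefix} {name} {suffix}" for name in filenames]


for line in batch_lines(["FILE1", "FILE2", "FILE3"]):
    print(line)
# WC FILE1 /S >> TOTAL
# WC FILE2 /S >> TOTAL
# WC FILE3 /S >> TOTAL
```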





COMMAND-LINE SWITCHES


WORDCOUNT recognizes several optional command-line switches.  
These can be typed before or after the name of the file to be 
analyzed, but must appear before any DOS redirection command 
(such as > or >>) or pipes (such as | MORE).  You can use as many 
switches as you like, but they must be separated by slashes (/) 
and, optionally, by spaces.  Switches that take numerical 
arguments only accept numbers less than 100.  
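
To make these rules concrete, here is a rough Python sketch of 
how such a command line might be taken apart.  It is not 
WORDCOUNT's own parser: the names are hypothetical, and the 
sketch assumes that switch letters are not case-sensitive and 
that anything left over after the switches are removed is a 
file name.

```python
import re

# A switch is a slash, one letter, and an optional "=value".
SWITCH = re.compile(r"/([A-Za-z])(?:=([^\s/]*))?")
NUMERIC = ("T", "I")  # switches documented as taking numbers


def parse_command_line(args):
    """Split a WC argument string into (file names, switches)."""
    switches = {}
    for letter, value in SWITCH.findall(args):
        switches[letter.upper()] = value if value else True
    for key in NUMERIC:
        if key in switches:
            if switches[key] is True:
                raise ValueError(f"/{key}= needs a number")
            number = float(switches[key])  # ValueError if not a number
            if number >= 100:
                raise ValueError(f"/{key}= must be less than 100")
            switches[key] = number
    # Whatever is left once the switches are removed is a file name.
    files = SWITCH.sub(" ", args).split()
    return files, switches


print(parse_command_line("DIARY.DOC /T=5.5/G /S"))
# (['DIARY.DOC'], {'T': 5.5, 'G': True, 'S': True})
```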


/T=      This switch sets the number of characters (letters plus 
         spaces) in each typesetter's word.  The default, if the 
         /T= switch is not used, is 6.0 characters.  You may 
         specify a single decimal place, if you like.  

/C=      WORDCOUNT always considers all unaccented letters and 
         numerals as part of a word.  However, the /C= switch 
         lets you specify which additional characters are 
         considered part of a word.  By default, when the /C= 
         switch is not used, WORDCOUNT also includes accented 
         characters from Code Page 437, the North American IBM 
         extended character set (ASCII 128 to 165). 

         If you use /C=M, WORDCOUNT will assume that you are 
         using Code Page 850 (which is fully supported by 
         WordStar 6.0 and above), commonly used in Europe, and it 
         will count all accented letters from this Code Page as 
         part of words.  

         If you use /C=R, WORDCOUNT will assume that you're using 
         the Hewlett-Packard Roman-8 character set (popular among 
         laser printer users).

         If you use /C=A, WORDCOUNT will assume that all extended 
         characters (ASCII 128 to 255) are parts of words.

/I=      This switch sets the number of printable characters 
         counted for each ASCII or WordStar 2000 tab (09 hex or 
         ^I) or symmetrical sequence tab (entered in WordStar 
         when you hit the tab key).  For WordStar 2000 and ASCII 
         files, the default is 5, but using the /W switch, below, 
         will set this to zero.  WordStar document tabs have 
         coded within them a count of the number of onscreen 
         spaces that they traverse, and unless you use the /I 
         switch, all WordStar tabs will be calculated based on 
         their actual widths.  For WordStar 2000 and ASCII files, 
         you must choose one value for the number of spaces to 
         count for all tabs, even if some are different widths.  
         If you use the /I switch on a WordStar file, all tabs 
         will be calculated at the single value you specify, 
         regardless of their actual widths.  

/E       Normally, WORDCOUNT ignores hard carriage returns, but 
         this switch tells it to count one space for each hard 
         carriage return encountered in the text.  Regardless, 
         carriage returns at the ends of dot-command lines are 
         always ignored, as are the soft carriage returns at the 
         ends of word-wrapped lines.  

/W       This switch tells WORDCOUNT not to include whitespace 
         (spaces created by hitting the space bar or tabs) in its 
         analysis.  Note that because this switch causes 
         WORDCOUNT to ignore hard spaces, it will result in an 
         inaccurate count for typesetter's words.  

/N       This switch tells WORDCOUNT to include the text in 
         footnotes, endnotes, and annotations in its analyses.  
         By default, such text is ignored.  

/H       This switch tells WORDCOUNT to include the text in 
         headers and footers in its analyses.  By default, such 
         text is ignored.  If the /H switch is used, headers and 
         footers are counted for each page on which they appear.  
         A two-word header on each page of a 50-page document 
         will add 100 grammatical words to WORDCOUNT's total.  

/U       This switch tells WORDCOUNT to count each line with a 
         software underscore (^PS in WordStar) as two lines.
         
/D       This switch tells WORDCOUNT to also check subdirectories 
         beneath the subdirectory you specify.  For example, 

                          WC \DOC\MYFILE.DOC /D

         would check MYFILE.DOC whether it was located in the 
         \DOC subdirectory, or in a subdirectory below \DOC.

/G       This switch tells WORDCOUNT to produce a grand-total 
         display for all files checked:

              Totals:                                      
                                             
                Printable characters and spaces:     33,336
                Printable lines:                      1,182
                Grammatical words:                    5,589
                Typesetter's words (chars/6.0):       5,556


/S       This switch staggers WORDCOUNT's output so that if the 
         analyses for multiple documents are redirected to a 
         single file, you will be able to use column block math 
         to total any one particular category.  If you use 
         WordStar, turn on column blocking (^KN), mark the column 
         of figures for which you want a cumulative total, and 
         issue ^KM.  To copy the displayed total into the 
         underlying file, issue <Esc>=.  (For automatic tallying, 
         instead use the /G switch, described above.)

         If you use WordStar 2000, turn on column blocking (^BV), 
         mark the column of figures for which you want a 
         cumulative total, and issue ^BA.  

         (Note that WORDCOUNT's output pads blank lines with 
         spaces so that WordStar's default vertical cursor 
         movement routine won't reset the cursor to column 1 if 
         you pass over an apparently blank line.)
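
If you would rather total a category outside WordStar, a short 
Python sketch along these lines could add up one category from 
a saved analysis file.  It assumes the labelled-line layout 
shown in the sample displays in this manual and has not been 
checked against the staggered /S layout; the function name is 
hypothetical.  Note that if the file also contains a /G 
"Totals" display, those figures would be added in as well.

```python
import re


def total_for(report_text, label):
    """Add up every figure that follows label in saved WC output."""
    pattern = re.compile(
        r"^\s*" + re.escape(label) + r".*?:\s*([\d,]+)\s*$", re.MULTILINE
    )
    figures = pattern.findall(report_text)
    return sum(int(figure.replace(",", "")) for figure in figures)
```

For example, after WC *.DOC > TOTAL, reading the TOTAL file into 
a string and calling total_for(text, "Grammatical words") would 
return the combined grammatical word count.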






SKIPPING OR DOUBLE-COUNTING TEXT


There may be parts of a document you don't want included in 
WORDCOUNT's analysis.  For instance, a magazine journalist is 
interested only in the tally for the actual text of his or her 
article, not the copyright notice at the top of the article or 
the list of contacts at the bottom.  Or, there may be a portion
of text that you wish to count twice, such as an address that 
will also print on an envelope in a separate file.

WORDCOUNT checks documents for a special dot command:

                             .WC OFF

When this is encountered, WORDCOUNT's analysis will be turned 
off.  As with all WordStar dot commands, you can type the text 
in any combination of upper and lower case characters, and the 
space between the command (.WC) and the argument (OFF) is 
optional; the number 0 is an acceptable synonym for "OFF."

To turn WORDCOUNT's analysis back on (which is the default 
condition), use:

                             .WC ON

The number 1 is an acceptable synonym for "ON."

WordStar will show a question mark in the flag column opposite 
the .WC command; this will not affect your printout or editing 
functions in any way.

To double the counts for a section of text, use:

                              .WC 2

This will cause WORDCOUNT to double the word, line, and character 
counts for the following portion of text.  Use .WC ON to return 
to normal counting.

For WordStar 2000, the WC ON, OFF, and 2 commands may be entered
as non-printing comments with ^OU.  The same syntax described 
above applies, but the leading period is optional.
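
The effect of the dot-command form can be sketched in Python as 
follows.  This is a simplified illustration, not WORDCOUNT's 
actual code: it counts whitespace-separated words only, it does 
not handle other dot commands or WordStar 2000 comments, and 
the names are hypothetical.

```python
import re

# .WC followed by ON, OFF, 0, 1, or 2; case and the space are optional.
DIRECTIVE = re.compile(r"^\.WC\s*(ON|OFF|[012])\s*$", re.IGNORECASE)
WEIGHTS = {"OFF": 0, "0": 0, "ON": 1, "1": 1, "2": 2}


def weighted_word_count(lines):
    """Count whitespace-separated words, honouring .WC ON/OFF/2 lines."""
    weight = 1  # counting is on by default
    total = 0
    for line in lines:
        found = DIRECTIVE.match(line.rstrip())
        if found:
            weight = WEIGHTS[found.group(1).upper()]
        else:
            total += weight * len(line.split())
    return total


sample = [
    ".wc off",
    "Copyright notice",
    ".WC ON",
    "The article itself",
    ".WC2",
    "An address",
]
print(weighted_word_count(sample))  # 7
```

In the sample, the two-word notice is skipped, the three-word 
line counts once, and the two-word address counts twice.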





USING WORDCOUNT FROM WITHIN WORDSTAR


You may wish to use WORDCOUNT while editing with WordStar or 
WordStar 2000.  Since WORDCOUNT is a separate program, you may 
run it from WordStar's ^KF or WordStar 2000's ^OG DOS gateways.  
However, to get an analysis of the file you're currently editing, 
you must first save the file with ^KS in WordStar or ^QC with 
WordStar 2000 (otherwise WORDCOUNT's tallies will reflect the 
version on disk, not the one in memory) and then retype the 
filename.  

Retyping the filename is irritating.  A better solution might be 
to have a macro that simply writes the entire contents of the 
current file out to a standard filename and then runs WORDCOUNT 
against that filename.  For WordStar users (versions 4.0 to 6.0), 
this Shorthand macro will do just that:

^K0^F^Kb^Kk^Qr^Kb^Qc^Kk^KwC:\COUNT.ME^My^Kh^KfWC C:\COUNT.ME^M

For WordStar 7.0 and above, use:

Sub Main
    Key("^K0^K<^Qr^Kb^Qc^Kk^KwC:\COUNT.ME^K")
    IfException
        ACK: Key("y")
    End IfException
    Key("^KfWC C:\COUNT.ME^K")
End Sub

After WORDCOUNT's analysis is displayed, press any key to 
continue editing and then, if you like, ^Q0 to jump back to the 
place in the text at which you had invoked the macro.  

(The "y" in the Shorthand macro is only needed at help level 2 
or above; it tells WordStar to overwrite the previous 
C:\COUNT.ME file.  If you use such a help level, this macro 
won't work properly the very first time you run it, because 
C:\COUNT.ME won't exist yet.  The IfException in the WS 7
macro handles either case properly.)

Unfortunately, there is no WordStar 2000 equivalent for this 
macro, because you cannot pass a command line to its DOS shell.





ABORTING WORDCOUNT

If you wish to abort WORDCOUNT before it has finished processing, 
press <Esc>.  WORDCOUNT will exit to DOS with the message 
"Aborted."  If WORDCOUNT is searching for a file, it won't abort 
until it begins to read the file, so it may be a short time after 
you press <Esc> before the program stops.



                             --END--
