Recursive Wordcount

Yes, I am a Linux user and I really appreciate the freedom I get by using either an awesome desktop environment or the command line - or both!

The tool I’m going to present gives you a good overview of how many lines, words and bytes the files in a given folder contain. I’m talking about wc.
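
As a quick illustration (the sample text and the variable name sample_counts are made up for this example), this is roughly what wc reports for two short lines read from standard input. By default the three columns are lines, words and bytes, in that order:

```shell
# Feed two short lines to wc and keep its output in a variable.
# Without options wc prints three counts: lines, words, bytes.
sample_counts=$(printf 'hello world\nfoo\n' | wc)

# Word splitting on the unquoted variable drops the column padding.
set -- $sample_counts
echo "lines=$1 words=$2 bytes=$3"
```

Here the text has 2 lines, 3 words and 16 bytes (the two newline characters count as bytes as well).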

The bash script I wrote applies wc to every web file in a folder and all of its subfolders. Web files are files ending in .php, .html, .htm, .tpl, .css, .sql or .js, plus any .htaccess file.

Furthermore, you can exclude folders with the -e parameter. I needed that feature to exclude scripts which were not written by me (see below). But because of that I had to commit a sin: using eval.

Recursive wc

    #!/bin/bash
    # usage
    # -s = search path
    # -e = excluded paths
    # examples:
    # current folder:    ./
    # other folder:      ./ -s ../foobar/
    # exclude folder:    ./ -e "*/foobar/*"
    # exclude folder:    ./ -e "*/folder1/*" -e "*/folder2/*"

    # default params: search the current folder, exclude nothing
    SEARCH_PATH="."
    EXCLUDE=""

    # read command line params
    while getopts "s:e:" PARAM
    do
        case "${PARAM}" in
            s)      SEARCH_PATH="$OPTARG";;
            # every -e adds one more '-not -path "PATTERN"' test for find
            e)      EXCLUDE=$EXCLUDE" -not -path \"$OPTARG\"";;
        esac
    done

    if [ "$EXCLUDE" = "" ]
    then
        # the parentheses make -type f apply to .htaccess as well
        find "$SEARCH_PATH" \
            -regextype posix-egrep \
            -type f \
            \( -regex ".*(\.(php|html?|tpl|css|sql|js))$" \
            -or -name ".htaccess" \) \
            | xargs wc
    else
        # evil eval: needed so the quotes stored in $EXCLUDE are parsed again
        eval 'find "$SEARCH_PATH" \
                -regextype posix-egrep \
                -type f \
                \( -regex ".*(\.(php|html?|tpl|css|sql|js))$" \
                -or -name ".htaccess" \) \
                '"$EXCLUDE"' \
                | xargs wc'
    fi
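
The eval is only there because the exclusions are glued together into one string whose embedded quotes have to be parsed a second time. A minimal sketch of an eval-free variant, assuming bash and GNU find as in the script above, collects the exclusions in an array instead. The function name count_web_files and the -print0 / -0 / -r additions are my assumptions for this sketch, not part of the original script:

```shell
#!/bin/bash
# Sketch only: count_web_files is a hypothetical name, not the original script.
count_web_files() {
    local search_path="."
    local -a exclude=()
    local OPTIND=1 param

    while getopts "s:e:" param; do
        case "$param" in
            s) search_path="$OPTARG" ;;
            # Each pattern stays one array element, so no quoting tricks are needed.
            e) exclude+=( -not -path "$OPTARG" ) ;;
            *) return 1 ;;
        esac
    done

    # -print0 with xargs -0 keeps file names containing spaces intact;
    # -r (GNU xargs) skips running wc when no file matches.
    find "$search_path" \
        -regextype posix-egrep \
        -type f \
        \( -regex ".*\.(php|html?|tpl|css|sql|js)$" -or -name ".htaccess" \) \
        "${exclude[@]}" \
        -print0 | xargs -0 -r wc
}
```

It would be called with the same parameters as the script, for example count_web_files -s ../foobar/ -e "*/folder1/*" -e "*/folder2/*". Because the array expands to separate, already-quoted arguments, the same find call works with or without exclusions.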

How I use this script

I use it in the following way to get an idea of just how much code I wrote for 3co:

    ~/projects/3co$ ./
      -e "*/tinymce/*" \
      -e "*/classes/mail/*" \
      -e "*/classes/markdown*" \
      -e "*/classes/smartypants*" \
      -e "*/classes/agent*" \
      -e "*/3co/compress*"
        17789  56608 630733 total

Whoa - 17789 lines and 56608 words, with a total of 630733 bytes.


An additional feature of wc is that it displays a warning if it encounters encoding problems. I really like that, because it lets me spot possible encoding bugs quite easily.



Comment by Michele Antolini (not verified) (2011-09-06 02:06:00)

Under OSX:

    wc -l `find * -name "*.*"`

or

    wc -l `find * -name "*.java" -or -name "*.html"`

i.e. just put double quotes ( " ) around the filter expression after -name

Comment by Gastón Fournier (not verified) (2009-03-25 12:40:00)

Leaving aside the fact that this line doesn’t encounter encoding problems, it did the job of counting the lines of code:

    wc -l `find * -name *.*`

You can filter the files changing the find command, for example:

    wc -l `find * -name *.java -or -name *.html`

Comment by Milian Wolff (2009-03-25 22:38:00)

Pretty much the same I do. My script just has some additional features (esp. excluding stuff).

Published on March 25, 2009.