Recursive Wordcount

Sat, 07/22/2006 - 03:39

Yes, I am a Linux user and I really appreciate the freedom I get by using either an awesome desktop environment or the command line - or both!

The tool I'm going to present gives you a good overview of how many lines, words and bytes the files in a given folder contain. I'm talking about wc.

The bash script I wrote applies wc to every web file in a folder and all of its subfolders. Web files are:

  • *.php
  • *.html / *.htm
  • *.tpl
  • *.sql
  • *.js
  • *.css
  • .htaccess
  • files without an extension (e.g. README)
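
Note that the find expression in the script below matches the listed extensions and .htaccess, but contains no test for extensionless files such as README. A minimal sketch of how that last bullet could be expressed, assuming GNU find (needed for -regextype); the function name list_web_files is hypothetical and not part of the original script:

```shell
# Hypothetical helper: list every "web file" below a directory
# (defaults to the current directory). Assumes GNU find.
list_web_files() {
  find "${1:-.}" \
    -regextype posix-egrep \
    -type f \
    \( -regex ".*\.(php|html?|tpl|sql|js|css)$" \
       -or -name ".htaccess" \
       -or -not -name "*.*" \)
}
```

The `-not -name "*.*"` test keeps only names without any dot, so README is included while other dot files stay out unless they are named explicitly, as .htaccess is.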

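Furthermore, you can exclude folders with the -e parameter. I needed that feature to exclude scripts that were not written by me (see below). But because of that I had to commit a sin: using eval. The exclude patterns are collected in a string with embedded quotes, and only eval makes the shell parse them as separate arguments again.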

Recursive wc
    #!/bin/bash

    # usage
    # -s = search path
    # -e = excluded paths (may be given several times)

    # examples:
    # current folder: ./wc.sh
    # other folder: ./wc.sh -s ../foobar/
    # exclude folder: ./wc.sh -e "*/foobar/*"
    # exclude folders: ./wc.sh -e "*/folder1/*" -e "*/folder2/*"

    # default params
    SEARCH_PATH="./"
    EXCLUDE=""

    # read command line params
    while getopts "s:e:" PARAM
    do
      case "${PARAM}" in
        s) SEARCH_PATH="$OPTARG";;
        e) EXCLUDE=$EXCLUDE" -not -path \"$OPTARG\"";;
      esac
    done

    if [ "$EXCLUDE" = "" ]
    then
      # the parentheses make -type f apply to both name tests
      find "$SEARCH_PATH" \
        -regextype posix-egrep \
        -type f \
        \( -regex ".*(\.(php|html?|tpl|css|sql|js))$" \
        -or -name ".htaccess" \) \
      | xargs wc
    else
      # evil eval: re-parses the quoted patterns stored in $EXCLUDE
      eval 'find "$SEARCH_PATH" \
        -regextype posix-egrep \
        -type f \
        \( -regex ".*(\.(php|html?|tpl|css|sql|js))$" \
        -or -name ".htaccess" \) \
        '"$EXCLUDE"' \
      | xargs wc'
    fi

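
One way to avoid the eval, sketched here as an alternative and not as the original script: collect the exclude tests in a bash array instead of a string, since an array keeps every pattern as its own word. The sketch assumes bash, GNU find and GNU xargs (for -r); the function name count_web_files is hypothetical, and the -print0 / -0 pair is an addition so that file names containing spaces survive the pipe.

```shell
#!/bin/bash
# Sketch of an eval-free variant; count_web_files is a hypothetical name.
count_web_files() {
  local search_path="./"
  local -a exclude=()
  local OPTIND=1 OPTARG param

  while getopts "s:e:" param; do
    case "$param" in
      s) search_path="$OPTARG" ;;
      # each -e adds three separate words to the array
      e) exclude+=( -not -path "$OPTARG" ) ;;
    esac
  done

  # "${exclude[@]}" expands to nothing when no -e was given,
  # so a single find call covers both branches of the original script.
  find "$search_path" \
    -regextype posix-egrep \
    -type f \
    \( -regex ".*(\.(php|html?|tpl|css|sql|js))$" -or -name ".htaccess" \) \
    "${exclude[@]}" \
    -print0 \
  | xargs -0 -r wc
}
```

It would be called with the same options as the script, for example `count_web_files -s ../foobar/ -e "*/folder1/*" -e "*/folder2/*"`.
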
How I use this script

I use it in the following way to get an idea of just how much code I wrote for 3co:

    ~/projects/3co$ ./wc.sh \
        -e "*/tinymce/*" \
        -e "*/classes/mail/*" \
        -e "*/classes/markdown*" \
        -e "*/classes/smartypants*" \
        -e "*/classes/agent*" \
        -e "*/3co/compress*"
    17789 56608 630733 total

Whoa - 17789 lines and 56608 words, with a total of 630733 bytes.

Note

An additional feature of wc is that it displays a warning if it encounters encoding problems. I really like that, because that way I can spot possible encoding bugs quite easily.
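
For a targeted check, a converter can be used instead of relying on warnings. A minimal sketch, assuming the files are supposed to be UTF-8 and that iconv is installed; this is a swapped-in technique and not something wc does, and the function name find_invalid_utf8 is hypothetical:

```shell
# Hypothetical helper: print the names of the given files that are
# not valid UTF-8. iconv exits non-zero when it hits an invalid sequence.
find_invalid_utf8() {
  local file
  for file in "$@"; do
    if ! iconv -f UTF-8 -t UTF-8 "$file" > /dev/null 2>&1; then
      printf '%s\n' "$file"
    fi
  done
}
```

A call such as `find_invalid_utf8 *.php` would then list only the files that need a closer look.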

Comments

Tue, 09/06/2011 - 02:06 — Michele Antolini (not verified)

Under OSX:

    wc -l `find * -name "*.*"`

or

    wc -l `find * -name "*.java" -or -name "*.html"`

i.e. just put double quotes ( " ) around the filter expression after -name

Wed, 03/25/2009 - 12:40 — Gastón Fournier (not verified)

Leaving aside the fact that this line doesn’t encounter encoding problems, it did the job of counting the lines of code:

    wc -l `find * -name *.*`

You can filter the files by changing the find command, for example:

    wc -l `find * -name *.java -or -name *.html`

Wed, 03/25/2009 - 22:38 — Milian Wolff

Pretty much the same as what I do. My script just has some additional features (esp. excluding stuff).
