Recursive Wordcount
Sat, 07/22/2006 - 03:39
Yes, I am a Linux user and I really appreciate the freedom I get by using either an awesome desktop environment or the command line - or both!
The tool I’m going to present gives you a good overview of how many lines, words and bytes a set of files contains. I’m speaking about wc.
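In case wc is new to you: called without options, it prints the number of lines, words and bytes, in that order, followed by the file name when it reads from a file. Here is a tiny sketch with a made-up input string:

```shell
#!/bin/bash
# Two lines and three words: "hello world\n" is 12 bytes, "foo\n" is 4 bytes.
counts=$(printf 'hello world\nfoo\n' | wc)

# wc without options prints lines, words and bytes, in that order.
# Column widths differ between GNU and BSD wc, so split on whitespace.
set -- $counts
lines=$1 words=$2 bytes=$3

echo "lines=$lines words=$words bytes=$bytes"   # prints: lines=2 words=3 bytes=16
```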
The bash script I wrote applies wc to every web file in a folder and in all of its subfolders. Web files are:
- *.php
- *.html / *.htm
- *.tpl
- *.sql
- *.js
- *.css
- .htaccess
- files without an extension (e.g. README)
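The list boils down to a rule on the file name. As a sketch, the same rule can be written as a case statement. The function name is_web_file is hypothetical and not part of the script. Note that the find expression in the script itself only matches the listed extensions and .htaccess, so extensionless files such as README are not picked up by it as written.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit status 0) if a file name belongs to one
# of the "web file" categories in the list, fails otherwise.
is_web_file() {
    local name="${1##*/}"   # strip any leading directories
    case "$name" in
        *.php|*.html|*.htm|*.tpl|*.sql|*.js|*.css) return 0 ;;
        .htaccess) return 0 ;;
        *.*) return 1 ;;    # any other extension (or other dotfile)
        *) return 0 ;;      # no extension at all, e.g. README
    esac
}

for f in index.php style.css .htaccess README logo.png; do
    if is_web_file "$f"; then
        echo "$f: counted"
    else
        echo "$f: skipped"
    fi
done
```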
Furthermore, you can exclude folders with the -e parameter, which may be given several times. I needed that feature to exclude scripts I did not write myself (see below). But because of it I had to commit a sin: using eval…
Recursive wc
#!/bin/bash
# Recursive wc: apply wc to all web files in a folder and its subfolders.
# Needs GNU find (-regextype is a GNU extension).
#
# usage
#   -s = search path
#   -e = excluded paths (can be given more than once)
#
# examples
#   current folder:  ./wc.sh
#   other folder:    ./wc.sh -s ../foobar/
#   exclude folder:  ./wc.sh -e "*/foobar/*"
#   exclude folders: ./wc.sh -e "*/folder1/*" -e "*/folder2/*"

# default params
SEARCH_PATH="./"
EXCLUDE=""

# read command line params
while getopts "s:e:" PARAM
do
    case "${PARAM}" in
        s) SEARCH_PATH="$OPTARG";;
        e) EXCLUDE="$EXCLUDE -not -path \"$OPTARG\"";;
    esac
done

# the parentheses make -type f apply to both alternatives;
# -print0 / xargs -0 keep file names with spaces intact
if [ "$EXCLUDE" = "" ]
then
    find "$SEARCH_PATH" \
        -regextype posix-egrep \
        -type f \
        \( -regex ".*(\.(php|html?|tpl|css|sql|js))$" \
        -or -name ".htaccess" \) \
        -print0 | xargs -0 wc
else
    # evil eval: re-parses the quoted -path patterns collected in $EXCLUDE
    eval 'find "$SEARCH_PATH" \
        -regextype posix-egrep \
        -type f \
        \( -regex ".*(\.(php|html?|tpl|css|sql|js))$" \
        -or -name ".htaccess" \) \
        '"$EXCLUDE"' \
        -print0 | xargs -0 wc'
fi
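The eval is only needed because the exclusions are collected in a single string, which then has to be re-parsed to get the quoting right. Bash arrays avoid that: each pattern is stored as its own element and expanded with "${exclude[@]}" into separate, correctly quoted arguments. Here is a sketch of that approach, wrapped in a hypothetical function recursive_wc and assuming bash 3.1 or later and GNU find:

```shell
#!/bin/bash
# Sketch: same search as wc.sh, but the exclusions are collected in a bash
# array instead of a string, so no eval is needed.
recursive_wc() {
    local search_path="./"
    local -a exclude=()
    local param OPTIND=1    # local OPTIND lets the function be called repeatedly

    while getopts "s:e:" param; do
        case "$param" in
            s) search_path="$OPTARG" ;;
            e) exclude+=(-not -path "$OPTARG") ;;   # each pattern stays one argument
        esac
    done

    find "$search_path" \
        -regextype posix-egrep \
        -type f \
        \( -regex '.*\.(php|html?|tpl|css|sql|js)$' -or -name '.htaccess' \) \
        "${exclude[@]}" \
        -print0 | xargs -0 wc
}

# usage: recursive_wc -s ../foobar/ -e "*/folder1/*" -e "*/folder2/*"
```

Because each -e pattern is appended as its own array element, quotes and spaces inside a pattern reach find unchanged, which is exactly what the eval in the original works around.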
How I use this script
I use it in the following way to get an idea of just how much code I wrote for 3co:
~/projects/3co$ ./wc.sh -e "*/tinymce/*" \
    -e "*/classes/mail/*" \
    -e "*/classes/markdown*" \
    -e "*/classes/smartypants*" \
    -e "*/classes/agent*" \
    -e "*/3co/compress*"
17789 56608 630733 total
Whoa! 17789 lines and 56608 words, with a total of 630733 bytes.
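One thing to keep in mind when piping into xargs wc: if the file list is too long for a single command line, xargs runs wc several times, and each of those runs prints its own total, so the last line of the output is then only the total of the last batch. If only the grand total matters, one possible workaround is to concatenate all matching files with cat and count the combined stream with a single wc. This is a sketch; the function name web_total is hypothetical and GNU find is assumed.

```shell
#!/bin/bash
# Hypothetical helper: print one grand total "lines words bytes" for all web
# files below a folder, however many batches xargs splits the file list into.
web_total() {
    find "${1:-./}" \
        -regextype posix-egrep \
        -type f \
        \( -regex '.*\.(php|html?|tpl|css|sql|js)$' -or -name '.htaccess' \) \
        -print0 | xargs -0 cat | wc
}

# usage: web_total ~/projects/3co
```

Line and byte totals equal the sum of the per-file counts. The word total can come out slightly lower, because a file that does not end in whitespace has its last word glued to the first word of the next file.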
Note
An additional feature of wc is that it displays a warning if it encounters encoding problems. I really like that, because it makes spotting possible encoding bugs quite simple.
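If you would rather check encodings explicitly than rely on wc's warnings, a different tool can do it: iconv. Converting a file from UTF-8 to UTF-8 changes nothing for valid input, but iconv stops with a non-zero exit status at the first invalid byte sequence. Here is a sketch, assuming the files are supposed to be UTF-8 (check_utf8 is a hypothetical name):

```shell
#!/bin/bash
# Hypothetical helper: report the files (given as arguments) that are not
# valid UTF-8. Returns 1 if at least one such file was found.
check_utf8() {
    local f status=0
    for f in "$@"; do
        # UTF-8 -> UTF-8 is a no-op for valid input; iconv fails on bad bytes
        if ! iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
            echo "not UTF-8: $f"
            status=1
        fi
    done
    return "$status"
}
```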
Comments
Tue, 09/06/2011 - 02:06 by Michele Antolini (not verified)
Under OSX:
or
wc -l `find * -name "*.java" -or -name "*.html"`
i.e. just put double quotes (") around the filter expression after -name.
Wed, 03/25/2009 - 12:40 by Gastón Fournier (not verified)
Leaving aside the fact that this line doesn’t encounter encoding problems, it did the job of counting the lines of code:
You can filter the files by changing the find command, for example:
Wed, 03/25/2009 - 22:38 by Milian Wolff
Pretty much what I do. My script just has some additional features (especially excluding stuff).