Recursive Wordcount
Yes, I am a Linux user and I really appreciate the freedom I get by using either an awesome desktop environment or the command line - or both!
The command I’m going to present gives you a good overview of how many lines, words and bytes the files in a given folder contain. I’m talking about wc.
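A quick reminder of the output format: wc prints the line, word and byte counts in that order, followed by the file name (or nothing when it reads from standard input). A tiny sample, where the text and variable names are just for illustration:

```shell
#!/bin/bash
# Count lines, words and bytes of a small sample text.
# wc prints them in this order: lines, words, bytes.
sample=$'first line\nsecond\n'
counts=$(printf '%s' "$sample" | wc)

# split the padded output into its three columns
read -r lines words bytes <<< "$counts"
echo "lines=$lines words=$words bytes=$bytes"
```

Here "first line\n" is 11 bytes and "second\n" is 7, so wc reports 2 lines, 3 words and 18 bytes.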
The bash script I wrote applies wc to every web file in a folder and all of its subfolders. Web files are:
- *.php
- *.html / *.htm
- *.tpl
- *.sql
- *.js
- *.css
- .htaccess
- files without an extension (e.g. README)
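One way to see which names the script’s find test actually matches is to run it on a throwaway directory. As written, the regex only covers the listed extensions plus .htaccess, so an extensionless README is not picked up. The second expression below adds a hypothetical `/[^./]+` alternative (a file name without any dot) to cover that case; note that it would also match other dot-less files such as a Makefile or the internals of a .git directory. GNU find is assumed for -regextype.

```shell
#!/bin/bash
# Sketch: check which files the find expression matches in a
# throwaway directory (GNU find assumed for -regextype).
tmp=$(mktemp -d)
mkdir -p "$tmp/sub"
touch "$tmp/index.htm" "$tmp/sub/app.js" "$tmp/.htaccess" \
      "$tmp/README" "$tmp/logo.png"

# the script's test: listed extensions plus .htaccess
matched=$(cd "$tmp" && find . -regextype posix-egrep -type f \
    \( -regex ".*(\.(php|html?|tpl|css|sql|js))$" -or -name ".htaccess" \) \
    | LC_ALL=C sort)

# hypothetical extra alternative "/[^./]+" for names without any dot
with_plain=$(cd "$tmp" && find . -regextype posix-egrep -type f \
    \( -regex ".*(\.(php|html?|tpl|css|sql|js)|/[^./]+)$" -or -name ".htaccess" \) \
    | LC_ALL=C sort)

printf '%s\n' "$matched" "---" "$with_plain"
rm -r "$tmp"
```

The first list contains .htaccess, index.htm and sub/app.js; only the second one also contains README. logo.png is matched by neither.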
Furthermore, you can exclude folders with the -e parameter. I needed that feature to exclude scripts I didn’t write myself (see below). But because of that I had to commit a sin: using eval…
Recursive wc
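#!/bin/bash
# usage
#   -s = search path (default: current folder)
#   -e = excluded path pattern (may be given several times)
# requires GNU find (-regextype)
# examples:
#   current folder:  ./wc.sh
#   other folder:    ./wc.sh -s ../foobar/
#   exclude folder:  ./wc.sh -e "*/foobar/*"
#   exclude folders: ./wc.sh -e "*/folder1/*" -e "*/folder2/*"

# default params
SEARCH_PATH="./"
EXCLUDE=""

# read command line params
while getopts "s:e:" PARAM
do
    case "${PARAM}" in
        s) SEARCH_PATH="$OPTARG";;
        # each -e adds another "-not -path" test; the quotes are stored
        # literally so that eval can re-parse them below
        e) EXCLUDE="$EXCLUDE -not -path \"$OPTARG\"";;
    esac
done

if [ "$EXCLUDE" = "" ]
then
    # the parentheses make -type f apply to both alternatives;
    # -print0 / xargs -0 keep file names containing spaces intact
    find "$SEARCH_PATH" \
        -regextype posix-egrep \
        -type f \
        \( -regex ".*(\.(php|html?|tpl|css|sql|js))$" \
        -or -name ".htaccess" \) \
        -print0 | xargs -0 wc
else
    # evil eval: needed so that the quotes stored in $EXCLUDE are
    # treated as quotes instead of being passed to find literally
    eval 'find "$SEARCH_PATH" \
        -regextype posix-egrep \
        -type f \
        \( -regex ".*(\.(php|html?|tpl|css|sql|js))$" \
        -or -name ".htaccess" \) \
        '"$EXCLUDE"' \
        -print0 | xargs -0 wc'
fi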
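The eval in the script above is only there because the exclusion tests are collected in a plain string. In bash, collecting them in an array avoids that: each -e pattern stays exactly one argument to find, so no re-parsing is needed. The following is a sketch of such a variant, not the script I actually use; the function name recursive_wc is made up, and GNU find is assumed.

```shell
#!/bin/bash
# Sketch of an eval-free variant: the -e patterns are collected in a
# bash array, so each one stays a single argument for find.
recursive_wc() {
    local search_path="./" param OPTIND=1
    local -a exclude_args=()

    while getopts "s:e:" param
    do
        case "${param}" in
            s) search_path="$OPTARG";;
            e) exclude_args+=(-not -path "$OPTARG");;
        esac
    done

    # parentheses make -type f and the exclusions apply to both
    # alternatives; -print0 / xargs -0 keep names with spaces intact
    find "$search_path" \
        -regextype posix-egrep \
        -type f \
        \( -regex ".*(\.(php|html?|tpl|css|sql|js))$" -or -name ".htaccess" \) \
        "${exclude_args[@]}" \
        -print0 | xargs -0 wc
}

# run only when executed as a script, not when sourced
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    recursive_wc "$@"
fi
```

It takes the same parameters as wc.sh, e.g. `recursive_wc -s ../foobar/ -e "*/folder1/*" -e "*/folder2/*"`. The array approach is what makes the quoting problem disappear: the shell never has to re-read the patterns as code.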
How I use this script
I use it in the following way to get an idea of just how much code I wrote for 3co:
~/projects/3co$ ./wc.sh \
    -e "*/tinymce/*" \
    -e "*/classes/mail/*" \
    -e "*/classes/markdown*" \
    -e "*/classes/smartypants*" \
    -e "*/classes/agent*" \
    -e "*/3co/compress*"
17789 56608 630733 total
Whoa: 17789 lines, 56608 words and a total of 630733 bytes.
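When xargs has to split a very long file list, it starts wc several times, and each run prints its own total line. A small sketch that adds those lines up; the helper name sum_totals is made up for this example, and a file literally named "total" would be counted as well.

```shell
#!/bin/bash
# Sketch: sum all "total" lines in wc output (xargs may have run wc
# more than once). Columns are lines, words, bytes.
sum_totals() {
    awk '$NF == "total" { l += $1; w += $2; c += $3 }
         END { printf "%d %d %d total\n", l, w, c }'
}
```

Usage: `./wc.sh -e "*/tinymce/*" | sum_totals`.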
Note
An additional feature of wc is that it displays a warning if it encounters encoding problems. I really like that, because it lets me spot possible encoding bugs quite easily.
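Whether wc prints such warnings may depend on its version and on the locale. A more explicit check is to let iconv convert each file from UTF-8 to UTF-8, which fails on invalid byte sequences. A sketch, assuming the files are supposed to be UTF-8 and iconv is installed; find_bad_utf8 is a made-up helper name.

```shell
#!/bin/bash
# Sketch: print every given file whose content is not valid UTF-8.
find_bad_utf8() {
    local f
    for f in "$@"; do
        # iconv exits with a non-zero status on an invalid byte sequence
        if ! iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done
}
```

Usage, e.g.: `find_bad_utf8 *.php *.tpl`.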
Comments
Want to comment? Send me an email!
Comment by Michele Antolini (2011-09-06 02:06:00)
Under OSX:
or wc -l
find * -name "*.java" -or -name "*.html"
i.e. just put double quotes (") around the filter expression after -name
Comment by Gastón Fournier (2009-03-25 12:40:00)
Leaving aside the fact that this line doesn’t report encoding problems, it did the job of counting the lines of code:
You can filter the files by changing the find command, for example:
Comment by Milian Wolff (2009-03-25 22:38:00)
Pretty much the same as what I do. My script just has some additional features (especially excluding stuff).