Download script for springerlink.com Ebooks
After a long period of silence I present you with the following Bash script for downloading books from http://springerlink.com. It is not a way to circumvent their login mechanisms; you still need proper rights to download the books. But many students in Germany get free access to these ebooks via their universities. I, for example, study at the FU Berlin, so I put the script in my Zedat home folder, start the download process via SSH from home, and afterwards copy the tarball to my home system.
Read on for the script.
Download the script (attached below), push it to your Zedat account, make it executable and run it. You’ll have to give it a link to a book detail page, like this one for example. Also take a look at the example call at the top of the script.
Requires bash, wget, iconv, egrep.
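For reference, a typical session looks roughly like this; the host name and file names are placeholders from my own setup, not something the script requires:

# on the Zedat host (or wherever you copied the script), make it executable and run it
chmod +x springer_download.sh
./springer_download.sh "http://springerlink.com/content/.../?p=..."

# afterwards, copy the resulting tarball to your home machine, e.g. via scp
scp user@zedat-host:BookTitle.tar.bz2 .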
Note: Take a look at the comments: Faro has come up with an updated Bash script which properly handles ebooks that span multiple pages on SpringerLink and merges the PDF files with pdftk. Thanks Faro!
Note: For those who’d prefer a Python version over the Bash version, take a look at my second attempt at a download script. The Bash version is abandoned. Long live the Python version!
#!/bin/bash
# Example call:
#   ./springer_download.sh "http://springerlink.com/content/.../?p=..."
if [[ "$1" == "" ]]; then
    echo "Usage: $0 \"http://springerlink.com/content/.../?p=...\""
    exit 1
fi

target=$1

# download the whole page
echo -n "Please wait, link source is being downloaded..."
page=$(wget -q -O - "$target")
echo "ok - done"

echo -n "Validating link source..."
# find the line number of the title heading
title_line=$(echo "$page" | grep -n -m 1 '<h2 class="MPReader_Profiles_SpringerLink_Content_PrimitiveHeadingControlName">' | egrep -o "^[[:digit:]]+")
if [[ "$title_line" == "" ]]; then
    echo "invalid URL"
    exit 1
fi

# the title text sits on the line following the <h2> tag
# (grep -n is 1-based, l starts at 0, so l == title_line is one line further down)
l=0
title=""
while read line; do
    if [[ "$l" == "$title_line" ]]; then
        title=$(echo "$line" | egrep -o "[[:alnum:]].+[[:alnum:]]" | iconv -f "UTF-8" -t "ASCII//TRANSLIT")
        break
    fi
    l=$((l + 1))
done < <(echo "$page")
if [[ "$title" == "" ]]; then
    echo "invalid URL"
    exit 1
fi
echo "ok - done"

# check the link type (single book chapter vs. whole book)
type=$(echo "$page" | grep -o '<span id="ctl00_PageHeadingLabel".*</span>' | grep -o '>.*<' | egrep -o '[^<>]+')

if [[ "$type" == "Book Chapter" ]]; then
    echo "will download book chapter '$title'"
    echo
    wget -O "$title.pdf" "$(dirname "$target")/fulltext.pdf"
elif [[ "$type" == "Book" ]]; then
    echo "will download book '$title'"
    echo
    mkdir "$title" 2>/dev/null
    cd "$title" || exit 1

    # collect the fulltext.pdf links of all chapters
    declare -a links
    key=0
    while read link; do
        links[$key]=$link
        key=$((key + 1))
    done < <(echo "$page" | grep '/fulltext.pdf"><img' | egrep -o 'href="[^"]+' | cut -c 7-)

    # get front + back matter
    wget -O "0-front-matter.pdf" "$(dirname "$target")/front-matter.pdf"
    wget -O "$((${#links[@]} + 1))-back-matter.pdf" "$(dirname "$target")/back-matter.pdf"

    # get the chapters, prefixing each file name with its number so that
    # chapters sharing the same title don't overwrite each other
    key=0
    while read chapter; do
        echo "$((key + 1)) - $chapter :: ${links[$key]}"
        chapter=$(echo "$chapter" | iconv -f "UTF-8" -t "ASCII//TRANSLIT")
        wget -O "$((key + 1))-$chapter.pdf" "http://springerlink.com/${links[$key]}"
        key=$((key + 1))
    done < <(echo "$page" | egrep -o '^[[:blank:]]*<a href="/content/[^>]+&pi=[[:digit:]]+">[^>]+</a>' | \
        egrep -o '>[^<]+' | cut -c 2-)

    cd ..
    # pack everything into a tarball and clean up
    tar -cvjf "$title.tar.bz2" "$title"
    rm "$title"/*.pdf
    rmdir "$title"
else
    echo "unknown link type '$type'"
fi
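Faro’s updated version from the comments additionally merges the downloaded PDFs into a single file with pdftk. A minimal sketch of that step, run inside the "$title" directory before the tarball is created; it assumes pdftk is installed, and relies on the numeric prefixes the script puts into the file names so that sort -n produces the right order:

# merge front matter, chapters and back matter into one PDF, ordered by their numeric prefix
# (note: file names containing spaces would need extra quoting)
pdftk $(ls *.pdf | sort -n) cat output ../"$title".pdf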
Update 01/09/09:
- The script now includes chapter numbers in the file names
- The script can now handle links to single book chapters
- Minor other cleanup

Update 02/20/09:
- Fixed the link type detection

Update 02/24/09:
- Rewrote the script in Python
Attachment | Size |
---|---|
springer_download.sh | 2.33 KB |
Comments
Want to comment? Send me an email!
Comment by Gabriel (not verified) (2012-12-13 16:16:00)
I tried the latest version from GitHub, but it merges the chapters in the wrong order! I used the following commands:
./springer_download.py -c 978-3-642-23253-4
and
./springer_download.py -c 978-3-642-02507-5
I am using Debian Testing (Wheezy) with the latest updates (pdftk, not stapler). What is the mistake?
Comment by matze (not verified) (2012-04-07 20:08:00)
Could you adapt this script to work with http://www.oldenbourg-link.com/? An example is here: http://www.oldenbourg-link.com/isbn/9783486582451
I think both pages are similar, so this shouldn’t be a lot of work, right?
Please help!
Comment by Milian Wolff (2012-04-08 18:04:00)
Sorry but that’s not going to work. Since I have no use for that page, why should I spend time on that?
Comment by macdet (not verified) (2011-12-08 18:52:00)
Good stuff - but I prefer the python method!
thx4all
Comment by Faro (not verified) (2009-02-25 00:47:00)
Hi Milian,
I fixed the script in Bash, for those who like the simplicity of Bash. For myself, I love to see your Python approach and will continue using that one. Thanks, Faro
Comment by Milian Wolff (2009-02-25 02:37:00)
Great, thanks Faro!
I’ve added a note to the main article and took the liberty of enabling syntax highlighting for your code.
Comment by Faro (not verified) (2009-02-20 17:42:00)
Just a remark: you could search for the counted chapter list and proceed through the script in a loop for each page, following the
<a href="/content/t64382/?sortorder=asc&p_o=10">Next</a>
link until the
<span class="paginationDisabled">Next</span>
“Disabled” tag appears… hope this helps
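A minimal sketch of that pagination loop, based on the markup Faro quotes above (the p_o offset parameter and the paginationDisabled span are taken from his comment; the variable names and link extraction are illustrative, not part of the attached script):

# sketch only: loop over the table-of-contents pages by following the "Next" link
page_url="$target"
while true; do
    page=$(wget -q -O - "$page_url")
    # ...extract and download the chapter links from $page as in the script above...
    # stop once the "Next" link is rendered as a disabled span
    if echo "$page" | grep -q '<span class="paginationDisabled">Next</span>'; then
        break
    fi
    # otherwise pick up the href of the "Next" anchor and continue with that page
    next=$(echo "$page" | egrep -o 'href="/content/[^"]*p_o=[[:digit:]]+[^"]*">Next' | head -n 1 | cut -d'"' -f2)
    [[ "$next" == "" ]] && break
    page_url="http://springerlink.com$next"
done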
Comment by Milian Wolff (2009-02-23 16:08:00)
Yes, I know that and I will fix it one day. But maybe I’ll rewrite it in another language first, let’s see!
Comment by Milian Wolff (2009-02-24 23:00:00)
The rewrite has started and is usable IMO; take a look at http://milianw.de/code-snippets/take-2-download-script-for-springerlinkc…
Comment by Faro (not verified) (2009-02-20 17:10:00)
Thank you for your fix. It now works… but I’ve discovered some problems with books spread over more than one page, like this one:
http://springerlink.com/content/t64382/?p=a10f1da5c8604081a487cfce67924074
Do you have any idea how to solve this?
By the way, I included
pdftk $(echo $(ls | sort -n)) cat output ../"$title".pdf
to merge the PDF files into one file. The ordering is done by sort -n.
Comment by Milian Wolff (2009-02-23 16:07:00)
Great idea, I’ll add that since that will make “Go to page” work once again.
Comment by Faro (not verified) (2009-02-19 23:03:00)
Hi…
Did Springer change something, or is my script doing something wrong?
I always get:
Please wait, link source is being downloaded…ok - done
Validating link source…ok - done
unknown link type ''
Thank you for your help
Comment by Milian Wolff (2009-02-20 01:43:00)
Thanks for the hint, I fixed it. You can find an updated version above.
Comment by Andreas (not verified) (2008-12-15 18:20:00)
Hi, thank you very much for your helpful script! However, I did find a small bug: when a book contains several chapters with the same name, only the first chapter is downloaded and the others are omitted. This is an example: http://springerlink.com/content/lp46u2/?p=4315112f571546c79595e6d1dd7552…
I tried to add numbers to the “chapter” line, but that only gave me the numbers in the filenames; the script insisted on downloading only the first chapter.
Cheers (and thanks again), Andreas
Comment by Milian Wolff (2009-01-09 16:53:00)
OK, the script has been updated and should now handle chapter numbers correctly. It also handles downloading single book chapters.
Comment by Andreas (not verified) (2009-02-27 17:34:00)
Thanks! With the new version I’ll be able to get the rest of the interesting books on math&physics.
Cheers, Andreas
Comment by Milian Wolff (2008-12-20 14:49:00)
I’ll look into it and update the script. Thanks for the report!