Download script for springerlink.com Ebooks
Sat, 11/08/2008 - 17:28
After a long period of silence, I present the following bash script for downloading books from http://springerlink.com. It does not circumvent their login mechanisms: you need proper access rights to download books. Many students in Germany, however, get free access to those ebooks via their universities. I, for example, study at the FU Berlin, so I put the script in my Zedat home folder and start the download via SSH from home. Afterwards I download the tarball to my home system.
Read on for the script.
Download the script (attached below), push it to your Zedat account, make it executable and run it. You have to pass it a link to a book-detail page as its only argument; see the example call at the top of the script.
Requires bash, wget, iconv, egrep.
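A quick way to check these requirements before running the script is sketched below. The helper name `missing_commands` is made up for this example and is not part of the attached script; `tar` is added to the list because the script calls it when packing a book.

```shell
# Print the name of every given command that cannot be found on PATH.
# (Hypothetical helper, not part of springer_download.sh.)
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Example:
# missing=$(missing_commands bash wget iconv egrep tar)
# [ -z "$missing" ] || { echo "missing tools: $missing" >&2; exit 1; }
```

An empty result means everything was found, so the check can sit at the very top of the script.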
Note: Take a look at the comments: Faro has come up with an updated Bash script that properly handles ebooks spanning multiple pages on SpringerLink and merges the PDF files with pdftk. Thanks Faro!
Note: For those who’d prefer a Python version over the Bash version, take a look at my second attempt at a download script. The Bash version is abandoned. Long live the Python version!
    #!/bin/bash

    if [[ "$1" == "" ]]; then
      echo "Usage: $0 \"http://springerlink.com/content/.../?p=...\""
      exit 1
    fi

    target=$1

    # get whole page
    echo -n "Please wait, link source is being downloaded..."
    page=$(wget -q -O - "$target")
    echo "ok - done"

    echo -n "Validating link source..."
    # get title of page
    title_line=$(echo "$page" 2>/dev/null | grep -n -m 1 '<h2 class="MPReader_Profiles_SpringerLink_Content_PrimitiveHeadingControlName">' | egrep -o "^[[:digit:]]+")

    if [[ "$title_line" == "" ]]; then
      echo "invalid URL"
      exit 1
    fi

    l=0
    title=""
    while read line; do
      if [[ "$l" == "$title_line" ]]; then
        title=$(echo "$line" | egrep -o "[[:alnum:]].+[[:alnum:]]" | iconv -f "UTF-8" -t "ASCII//TRANSLIT")
        break
      fi;
      l=$(expr $l + 1)
    done < <(echo "$page")

    if [[ "$title" == "" ]]; then
      echo "invalid URL"
      exit 1
    fi
    echo "ok - done"

    # check type
    type=$(echo "$page" | grep -o '<span id="ctl00_PageHeadingLabel".*</span>' | grep -o '>.*<' | egrep -o '[^<>]+')

    if [[ "$type" == "Book Chapter" ]]; then
      echo "will download book chapter '$title'"
      echo
      wget -O "$title.pdf" "$(dirname $target)/fulltext.pdf"
    elif [[ $type == "Book" ]]; then
      echo "will download book '$title'"
      echo
      mkdir "$title" 2>/dev/null
      cd "$title" || exit 1

      # get links
      declare -a links;
      key=0
      while read link; do
        links[${key}]=$link
        key=$(expr $key + 1)
      done < <(echo "$page" | grep '/fulltext.pdf"><img' | egrep -o 'href="[^"]+' | cut -c 7-)

      # get front + back matter
      wget -O "0-front-matter.pdf" "$(dirname $target)/front-matter.pdf"
      wget -O "$((${#links[@]}+1))-back-matter.pdf" "$(dirname $target)/back-matter.pdf"

      # get chapters
      key=0
      while read chapter; do
        echo "$(($key+1)) - $chapter :: ${links[${key}]}"
        chapter=$(echo $chapter | iconv -f "UTF-8" -t "ASCII//TRANSLIT")
        wget -O "$(($key+1))-$chapter"".pdf" "http://springerlink.com/${links[${key}]}"
        key=$(expr $key + 1)
      done < <(echo "$page" | egrep -o '^[[:blank:]]*<a href="/content/[^>]+&pi=[[:digit:]]+">[^>]+</a>' | \
        egrep -o '>[^<]+' | cut -c 2-)

      cd ..
      tar -cvjf "$title.tar.bz2" "$title"
      rm "$title"/*.pdf
      rmdir "$title"
    else
      echo "unknown link type '$type'"
    fi
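The counting loop in the script picks the line that follows the matching `<h2>` heading, because `l` starts at 0 while `grep -n` counts lines from 1. As a sketch only, the same lookup can be done with `sed`. The function name `extract_title` is hypothetical, and the `iconv` transliteration step is left out to keep the example independent of the locale.

```shell
# Print the trimmed text of the line that follows the title heading.
# Reads the page HTML from stdin. Assumes, like the script, that the
# title sits on the line directly after the <h2> tag.
extract_title() {
  sed -n '/<h2 class="MPReader_Profiles_SpringerLink_Content_PrimitiveHeadingControlName">/{n;p;q;}' |
    grep -E -o '[[:alnum:]].+[[:alnum:]]'
}

# Example, using the $page variable of the script:
# title=$(printf '%s\n' "$page" | extract_title)
```

If the heading is missing, the function prints nothing, so the script's existing check for an empty title would still report an invalid URL.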
Update 01/09/09:
- The script now includes chapter numbers in the file names
- The script can now handle links to single book chapters
- minor other cleanup

Update 02/20/09:
- fixed types

Update 02/24/09:
- rewrite script in Python
| Attachment | Size |
|---|---|
| springer_download.sh | 2.33 KB |
Comments
I tried the latest version Thu, 12/13/2012 - 16:16 — Gabriel (not verified)
I tried the latest version from Github, but it merges the chapters in the wrong order! I have used the following commands:
./springer_download.py -c 978-3-642-23253-4
and
./springer_download.py -c 978-3-642-02507-5
I am using Debian Testing (Wheezy) with the latest updates (pdftk, not stapler). What is the mistake?
Can you rebuild this script Sat, 04/07/2012 - 20:08 — matze (not verified)
Could this script be rebuilt to work with http://www.oldenbourg-link.com/ as well? An example is here: http://www.oldenbourg-link.com/isbn/9783486582451
I think both pages are similar, so this shouldn’t be a lot of work, right?
Please help!
Sorry but that’s not going to Sun, 04/08/2012 - 18:04 — Milian Wolff
Sorry but that’s not going to work. Since I have no use for that page, why should I spend time on that?
Good stuff - but I prefer the Thu, 12/08/2011 - 18:52 — macdet (not verified)
Good stuff - but I prefer the python method!
thx4all
Hi Milian, i fixed the Wed, 02/25/2009 - 00:47 — Faro (not verified)
Hi Milian,
I fixed the script in bash, for those who like the simplicity of bash. As for myself, I love to see your Python approach and will continue using that one. Thanks, Faro
Great, thanks Faro! I’ve Wed, 02/25/2009 - 02:37 — Milian Wolff
Great, thanks Faro!
I’ve added a note to the main article and took the liberty to enable syntax-highlighting for your code.
just a remark. You could Fri, 02/20/2009 - 17:42 — Faro (not verified)
just a remark. You could search for the counted chapter list: e.g.
and run the script in a loop for each page, following the link

    <a href="/content/t64382/?sortorder=asc&p_o=10">Next</a>

until the

    <span class="paginationDisabled">Next</span>

“Disabled” tag appears… hope this helps
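The pagination idea from this comment could be sketched roughly as follows. It assumes the markup looks exactly like the two snippets quoted in the comment and that relative links resolve against http://springerlink.com, as in the script. The function name `next_page_path` is hypothetical, and whether the raw HTML escapes `&` as `&amp;` was not checked.

```shell
# Print the href of the first active "Next" link found in the HTML on stdin.
# Prints nothing on the last page, where "Next" is only a disabled <span>.
next_page_path() {
  grep -E -o '<a href="[^"]+">Next</a>' | head -n 1 |
    sed -e 's/^<a href="//' -e 's/">Next<\/a>$//'
}

# Example loop (commented out, needs network access):
# url=$target
# while [ -n "$url" ]; do
#   page=$(wget -q -O - "$url")
#   # ... collect the chapter links of this page here ...
#   path=$(printf '%s\n' "$page" | next_page_path)
#   url=${path:+http://springerlink.com$path}
# done
```

The loop ends on its own once no active "Next" link is found, because `url` then becomes empty.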
Yes, I know that and I will Mon, 02/23/2009 - 16:08 — Milian Wolff
Yes, I know that and I will fix it one day. But maybe I’ll rewrite it in another language first, let’s see!
rewrite has started and is Tue, 02/24/2009 - 23:00 — Milian Wolff
rewrite has started and is usable imo, take a look at http://milianw.de/code-snippets/take-2-download-script-for-springerlinkc…
Thank you for your fix. It Fri, 02/20/2009 - 17:10 — Faro (not verified)
Thank you for your fix. It now works… but I’ve discovered some problems with books spread over more than one page, like this one:
http://springerlink.com/content/t64382/?p=a10f1da5c8604081a487cfce67924074
Do you have any idea how to solve this?
By the way, I included

    pdftk `echo $( ls |sort -n)` cat output ../"$title".pdf

to merge the PDF files into one file. The ordering is done by sort -n.
Great idea, I’ll add that Mon, 02/23/2009 - 16:07 — Milian Wolff
Great idea, I’ll add that since that will make “Go to page” work once again.
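The one-line pdftk call quoted above splits its file list on every space, so chapter titles containing spaces would be passed to pdftk in pieces. A minimal sketch that splits on newlines only is shown below. The function names `pdfs_in_order` and `merge_pdfs` are made up for this example, and file names containing newlines are still not handled.

```shell
# List the PDFs of the current directory, one per line, ordered by their
# leading chapter number (0-..., 1-..., 2-..., 10-...).
pdfs_in_order() {
  ls | grep -E '\.pdf$' | sort -n
}

# Merge them into the file named by $1. The body runs in a subshell, so the
# IFS and globbing changes do not leak into the caller.
merge_pdfs() (
  out=$1
  IFS='
'
  set -f                     # no globbing while the list is split
  set -- $(pdfs_in_order)    # split on newlines only
  [ "$#" -gt 0 ] || exit 1   # nothing to merge
  pdftk "$@" cat output "$out"
)

# Example, run from inside the chapter directory:
# merge_pdfs "../$title.pdf"
```

As in the original one-liner, `sort -n` does the ordering, so `10-...` sorts after `2-...` rather than before it.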
Hi… did springer change Thu, 02/19/2009 - 23:03 — Faro (not verified)
Hi…
did springer change something or does my script do something wrong?
i always get:
    Please wait, link source is being downloaded...ok - done
    Validating link source...ok - done
    unknown link type ''
Thank you for your help
Thanks for the hint, I fixed Fri, 02/20/2009 - 01:43 — Milian Wolff
Thanks for the hint, I fixed it. You can find an updated version above.
Hi, thank you very much for Mon, 12/15/2008 - 18:20 — Andreas (not verified)
Hi, thank you very much for your helpful script! However, I found a small bug: when a book contains several chapters with the same name, only the first of them is downloaded and the others are omitted. This is an example: http://springerlink.com/content/lp46u2/?p=4315112f571546c79595e6d1dd7552…
I tried to add numbers into the “chapter”-line, but this only gave me the numbers in the filenames. The script insisted on downloading only the first chapter.
Cheers (and thanks again), Andreas
Ok, the Script was updated Fri, 01/09/2009 - 16:53 — Milian Wolff
OK, the script was updated and should now handle chapter numbers correctly. It also handles downloading single book chapters.
Thanks! With the new version Fri, 02/27/2009 - 17:34 — Andreas (not verified)
Thanks! With the new version I’ll be able to get the rest of the interesting books on math&physics.
Cheers, Andreas
I’ll look into it and update Sat, 12/20/2008 - 14:49 — Milian Wolff
I’ll look into it and update the script. Thanks for the report!