So I've found another cool website where users can read books online. As in my previous article, I still want to download the books to my reader, so the script was improved a bit.

This time the task was more challenging. The website uses a custom form with CSRF tokens to render pages via AJAX, stores data in cookies, and does other cool stuff.

Getting a cookie value is simple: curl saves the cookies to a jar file, and the value can be read from there:

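    # Print the value of cookie $2 from the Netscape-format cookie jar $1
    # (tab-separated; field 6 is the cookie name, field 7 its value)
    get_cookie() {
        awk -F '\t' -v name="$2" '$6 == name { print $7 }' "$1"
    }

    curl -o "$PAGE_HTML" --silent --no-verbose --cookie-jar "$COOKIES_FILE" "$URL"
    CSRF_COOKIE=$(get_cookie "$COOKIES_FILE" "_csrf")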
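curl writes its cookie jar in the Netscape format: one cookie per line, seven tab-separated fields (domain, include-subdomains flag, path, secure flag, expiry, name, value), with `#HttpOnly_` prepended to the domain of HttpOnly cookies. The sketch below builds a small jar with made-up cookies and reads `_csrf` from it by comparing the name field exactly, so a cookie such as the hypothetical `_csrf_extra` is not picked up by accident.

```shell
#!/usr/bin/env bash
# Print the value of cookie $2 from Netscape-format cookie jar $1.
# Columns: domain, subdomains flag, path, secure, expiry, name, value.
get_cookie() {
    awk -F '\t' -v name="$2" '$6 == name { print $7 }' "$1"
}

# Build a small sample jar with hypothetical cookies to show the format.
jar=$(mktemp)
{
    printf '# Netscape HTTP Cookie File\n'
    printf '#HttpOnly_example.com\tFALSE\t/\tFALSE\t0\t_csrf\tabc123\n'
    printf 'example.com\tFALSE\t/\tFALSE\t0\t_csrf_extra\tzzz\n'
} > "$jar"

CSRF_COOKIE=$(get_cookie "$jar" "_csrf")
echo "$CSRF_COOKIE"   # prints abc123
rm -f "$jar"
```

A plain `grep "_csrf"` on the same file would match both cookie lines, which is why the exact comparison on the name column is the safer lookup.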

This time I've added human-readable chapter names to the FB2 file. The chapters are listed in a select box. hxselect extracts that element, sed converts each <option> to an <a> (and value= to href=), and hxpipe turns the result into its line-oriented format, where text lines start with a dash; awk then keeps those lines and strips the dash.
To get an array variable, I use the readarray builtin.

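    # Print one chapter name per line from the chapter <select> in page $1.
    # hxpipe prints text nodes as lines starting with "-"; awk keeps only
    # those lines and drops the dash, so dashes inside titles survive.
    get_chapters_names() {
        hxnormalize -x -e -s "$1" |
        hxselect -i select.js-chapter-change |
        sed -e 's/<select[^>]*>//g' \
            -e 's/<option/<a/g' \
            -e 's/option>/a>/g' \
            -e 's/value=/href=/g' |
        tr -d '\n' | hxpipe |
        awk '/^-/ { print substr($0, 2) }' | grep "\S"
    }

    readarray -t CHAPTERS_NAMES < <(get_chapters_names "$PAGE_HTML")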
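To see what the awk step works on, here is a sketch that feeds it a hand-written sample of hxpipe output instead of a real page. The sample assumes hxpipe's usual line prefixes (`A` for attributes, `(` and `)` for start and end tags, `-` for text); the chapter titles and href values are made up.

```shell
#!/usr/bin/env bash
# Hypothetical hxpipe output for two <a> elements built from <option>s.
hxpipe_output='Ahref CDATA chapter-1
(a
-Chapter 1 - The Beginning
)a
Ahref CDATA chapter-2
(a
-Chapter 2
)a'

# Keep only text lines (leading "-"), strip that dash, drop blank ones,
# and load the result into an array, one element per line.
readarray -t CHAPTERS_NAMES < <(
    printf '%s\n' "$hxpipe_output" |
    awk '/^-/ { print substr($0, 2) }' |
    grep '[^[:space:]]'
)

printf '%s\n' "${CHAPTERS_NAMES[@]}"
```

With `awk -F "-" '{print $2}'` instead, the attribute line `Ahref CDATA chapter-1` would yield a stray `1`, and the first title would be cut to `Chapter 1 `, so selecting lines by their leading dash is the safer choice whenever titles or hrefs may contain dashes.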

One thing I've noticed is that the server blocks the script if it makes requests too fast. So I've added a token refresh after a 2-minute sleep. On the other hand, no token refresh is needed if the script pauses for 2 seconds between page requests.
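The pacing described above can be sketched as a small retry wrapper. This is a sketch under assumptions, not the original script: `fetch_page` and `refresh_token` are hypothetical hooks (for example, `fetch_page` might call curl with `-w '%{http_code}'` and fail on anything but 200, and `refresh_token` might re-request the page to get a fresh `_csrf` cookie), and the retry limit of 3 is arbitrary. Only the 2-second and 2-minute pauses come from the article.

```shell
#!/usr/bin/env bash
PAUSE_SECONDS=2           # pause between successful page requests
BLOCKED_PAUSE_SECONDS=120 # wait after the server blocks the script
MAX_ATTEMPTS=3            # hypothetical retry limit

# Fetch one page; when blocked, sleep, refresh the token and try again.
# fetch_page and refresh_token are hypothetical hooks defined elsewhere;
# fetch_page must return non-zero when the server rejects the request.
fetch_with_backoff() {
    local attempt
    for ((attempt = 1; attempt <= MAX_ATTEMPTS; attempt++)); do
        if fetch_page "$@"; then
            sleep "$PAUSE_SECONDS"
            return 0
        fi
        echo "Blocked (attempt $attempt), sleeping ${BLOCKED_PAUSE_SECONDS}s" >&2
        sleep "$BLOCKED_PAUSE_SECONDS"
        refresh_token
    done
    return 1
}
```

Keeping the pauses in variables makes them easy to tune if the server's limits change, or to shorten while debugging.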

One interesting thing I've tried to implement is getting data from JSON. I've tried a lot of pure-bash approaches, but they only worked with small key-value JSON data. So the easiest and smallest solution I've found is to use PHP to get a value by key from JSON:

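    # $1 - JSON file, $2 - top-level key; prints an empty string if the key is missing.
    # The key is passed as an argument, so names with dashes or quotes work too.
    php -r 'echo json_decode(file_get_contents("php://stdin"))->{$argv[1]} ?? "";' -- "$2" < "$1"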

Everything else is a rewritten copy of the script from my previous article.

Source code:
