Using curl to Automate Multipage Logins

Watch this week's video on YouTube

I like solving the daily New York Times crossword on paper. However, logging in to download the PDF every day and printing it is a pain.

In this post, I will share how I automated the whole process with curl and some bash scripting.

Downloading a pdf file with curl

curl is a command line tool for making HTTP requests (and many other data transfer protocols).

Using it to download a file like the New York Times daily crossword puzzle is as easy as:

curl "https://www.nytimes.com/svc/crosswords/v2/puzzle/print/19803.pdf" -o crossword.pdf

This works great for unauthenticated websites but poses a problem here: the New York Times crossword is a paid subscription. For the above URL to work, the HTTP request needs to be part of a session that has first been authenticated by the New York Times server.

Three HTTP requests to login

The New York Times Crossword login process looks like this:

NYTimes Crossword login page

There are a total of three HTTP requests that need to happen:

  1. Loading the initial login page (left screenshot above)
  2. Clicking "Continue" after typing in your email address
  3. Pressing "Log In" after typing in your password

It's important to be aware that there are three requests because each request requires additional data to be sent along with it beyond the expected email address/password. Using your browser's developer tools is an easy way to identify these separate requests.

Multipage logins with curl

The first request

The first request (that loads the login page) is important because it contains two pieces of data we will need to submit with subsequent requests: 1. Some cookies that need to be carried through all login requests 2. A Cross Site Request Forgery (CSRF) token

Saving and passing along the cookies for each request is easy: the -c and -b arguments in curl to save and pass cookies to/from a local text file:

curl -c cookies.txt -b cookies.txt "https://myaccount.nytimes.com/auth/enter-email

The CSRF token is a little more work. Once the above page downloads the HTML code, we can parse the CSRF token into a variable with our bash script:

# Parse out the CSRF auth token
AUTH_TOKEN=$(curl -c cookies.txt -b cookies.txt "https://myaccount.nytimes.com/auth/enter-email?response_type=cookie&client_id=lgcl&redirect_uri=https%3A%2F%2Fwww.nytimes.com" 2>&1 | grep -oP '(?<=authToken&quot;:&quot;).*?(?=&quot;)')

# Replace HTML encoded entities
AUTH_TOKEN=${AUTH_TOKEN//&#x3D;/=}

The second request

There are two more requests: the request that sends the email, then the request that sends the email and password together. These appear to be on the same web page but looking at the network traffic shows they are two separate requests.

Like before, we persist and pass the cookies for each request with the -c and -b arguments. We also pass some parameters in a JSON object after the -d flag. Finally, to mimic the browser/webpage making the request, we pass long required headers with the -H arguments:

# First page that asks for an email address
curl -c cookies.txt -b cookies.txt -X POST -d '{"email":"'$USERNAME'","auth_token":"'$AUTH_TOKEN'","form_view":"enterEmail"}' "https://myaccount.nytimes.com/svc/lire_ui/authorize-email" -H "Content-Type: application/json"
# Second page that asks for a password
curl -c cookies.txt -b cookies.txt -X POST -d '{"username":"'$USERNAME'","auth_token":"'$AUTH_TOKEN'","form_view":"login","password":"'$PASSWORD'","remember_me":"Y"}' "https://myaccount.nytimes.com/svc/lire_ui/login" -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:93.0) Gecko/20100101 Firefox/93.0' -H 'Accept: application/json' -H 'Accept-Language: en-US,en;q=0.5' --compressed -H 'Referer: https://myaccount.nytimes.com/auth/enter-email?response_type=cookie&client_id=lgcl&redirect_uri=https%3A%2F%2Fwww.nytimes.com' -H 'Content-Type: application/json' -H 'Req-Details: [[it:lui]]' -H 'Origin: https://myaccount.nytimes.com' -H 'DNT: 1' -H 'Connection: keep-alive' -H 'Sec-Fetch-Dest: empty' -H 'Sec-Fetch-Mode: cors' -H 'Sec-Fetch-Site: same-origin' -H 'TE: trailers'

Downloading the crossword puzzle

Once we finish those three requests, we should have a cookie saved to our cookies.txt file that indicates we are authorized and logged in. If all that went well, we can now run our first curl request again and the PDF puzzle download should work:

Once I have the cookie that shows I'm authenticated, I download the pdf:

# Download the print edition of the crossword
curl -b cookies.txt -s "https://www.nytimes.com/svc/crosswords/v2/puzzle/print/19803.pdf" -o crossword.pdf

There are a few more basic requests involved to variabalize the puzzle date (19803 above). If interested, you can find these additional steps in my NYTimes Crossword Download and Print script on GitHub.

Daily scheduling and printing automation.

With the PDF crossword puzzle downloaded, all I need to do is have the file automatically sent to my printer every morning.

I'm running this script on a Raspberry Pi server running Linux, so all I need to do is issue an lp command to send the file to my printer:

lp -n $NUMBER_OF_COPIES -o fit-to-page -d BrotherHL2170W crossword.pdf

That's it! I've scheduled the script with cron and now every morning at 7 am, I have two copies of that day's crossword puzzle sitting in my printer, ready to be filled with no manual intervention required.

If you want to do something similar, the full script is available on my GitHub New York Times Crossword Daily Download and Print repository.

Joker's Wild

This past weekend I had a blast presenting Joker's Wild with Erin Stellato (blog|twitter), Andy Mallon (blog|twitter), and Drew Furgiuele (blog|twitter).

Watch this week's video on YouTube

Table of contents:

  • What is Joker's Wild? Watch this to witness Andy's amazing PowerPoint animation skills (0:00)
  • Bert demos SQL injection (2:25)
  • Erin recollects desserts (9:55)
  • Andy shares an automation tip (18:55)
  • Andy explains an ANSI standard (23:10)
  • Drew describes containers (27:02)

While a video doesn't quite give you the same experience as being in the room with dozens of other data professionals laughing and shouting along, hopefully it gives you an idea.

Here's a behind-the-scenes peek at how it all came together.

A Different Kind Of Presentation

I've wanted to do a "fun" SQL Server presentation for a while; something that would be lighthearted while still delivering (some) educational value.

I ran some ideas past Erin after SQL Saturday Cleveland earlier this year. We came up with several concepts ideas we could incorporate into the presentation (thanks to Paul Popovich and Luis Gonzalez for also helping us generate a lot of these ideas) and at that point I think Erin came up with the name "Joker's Wild."

Blind Commitment

Fast forward a few months: occasionally I'd talk about the presentation idea with people but still wasn't any closer to actually making it real.

Then a few days before the SQL Saturday Columbus submission deadline, Erin reached out to ask if we were going to submit. We recruited Andy and Drew to help present and submitted an abstract:

Come one, come all to the greatest (and only) SQL Server variety show at SQL Saturday Columbus.

This session features a smattering of lightning talks covering a range of DBA- and developer-focused SQL Server topics, interspersed with interactive games to keep the speakers and audience on their toes.

Plan for plenty of sarcasm, laughs, and eye rolls in this thoughtfully structured yet highly improvised session.

We can't guarantee what you'll learn, but we do promise a great time!

*Slot machine will not generate real money for "winners"

Structure

If that abstract reads a little vague, it's because at that point we didn't know exactly what we wanted to do yet. Once our session was selected though it was time to come up with a concrete plan (big thank you to David Maxwell and Peter Shore for giving us the opportunity to try something like this).

After some discussion, Erin, Andy, Drew, and I came up with the following structure:

  1. The audience will choose the lightning talk topic
  2. We will spin the "Wheel of Misfortune" to determine the presentation style, including:
    • Slides I didn't write
    • Random slide timing
    • Who has the clicker?
  3. We will play some SQL Server themed Jeopardy and Pictionary with the audience

After our first meeting Andy created the world's most versatile PowerPoint presentation that would run the show. Seriously, if you haven't watched the video above yet, go watch it - that introduction is all PowerPoint goodness created by him.

The Session and Final Thoughts

I'm incredibly happy with how it all went. The session was planned but a lot of it was still left up to a highly improvised performance. I had a lot of fun preparing and presenting, and I think the session was well received by the audience. Jeopardy and Pictionary were a lot of fun too, even though I ran out of video recording space so I couldn't include them in the video.

I hope we have another opportunity to present this session again in the future.

Thank you again David and Peter for letting us do this session as part of SQL Saturday Columbus.

Thank you to our audience for taking a risk on attending a session you didn't know much about. Also for your great participation.

And thank you Erin, Andy, and Drew for helping do something fun and different.

T-SQL Tuesday #104 Roundup

MJ-t-sql-TuesdayThis month's T-SQL Tuesday topic asked "What code would you hate to live without?" Turns out you like using script and code to automate boring, repetitive, and error-prone tasks.

Thank you to everyone who participated; I was nervous that July holidays and summer vacations would stunt turnout, however we wound up with 42 posts!

Watch tsqltuesday.com for next month's topic and consider signing up to host.

Watch this week's video on YouTube

Without further ado, here are this month's entries sorted in random order:

  • Stuart Moore shares the history behind needing to automate restore testing and writing the SqlAutoRestores PowerShell module to help.  Nowadays his commands are found in dbatools.  Great example of how a project can evolve through the community.
  • Arthur Daniels shares his script to identify the key and included columns of indexes in a given table.
  • Glenn Berry shares his DMV Diagnostic Queries and the story behind how he started developing them back in 2006.
  • Jason Brimhall links to multiple scripts he's shared in the past as well as a new script for remotely auditing server access to catch infilitraters red-handed.
  • Doug Purnell talks about how he uses database snapshots and shares some code for how he manages them.
  • Jay Robinson shares two C# extensions (shout to my fellow devs!): one to check an enum for a value and a second to cleanly handle the lengthy DBNull.Value syntax.
  • Drew Furgiuele shares how he scripts out his indexes to re-apply after snapshot replication.  He then writes very similar functionality using PowerShell in only 6 lines!
  • Tim Weigel shares which community scripts he uses regularly, as well as sharing his own scripts around replication, stored procedure execution information, and file manipulation.
  • Hugo Kornelis submitted two posts.  The first post shares sp_metasearch which helps with performing impact analysis and the second post follows up with an enhancement he's made to Ola Hallengren's database maintenance scripts to ignore backup BizTalk databases.
  • Andy Mallon shares his comprehensive script for checking database, file, data, log, etc... sizes.  Great explanations of his reasoning for writing the queries the way he did.
  • Dan Clemens shares his database search script with a switch that includes searching across agent jobs.
  • Jess Pomfret wrote a script that shows compression stats for database objects.  Wanting to run it against a whole instance (or across mulitple servers), she wrote a dbatools command to automate the process.
  • Kenneth Fisher shows us how he organizes his toolbox using an SSMS solution.
  • Rob Farley shares code he's written to demonstrate the pain of using NOLOCK.
  • Steve Jones shares a procedure from Microsoft that he uses for transferring logins and passwords between instances.
  • Kevin Hill shares two scripts he uses for finding low-hanging index optimization fruit: one that finds queries performing heap or clustered index scans, and another that returns the top 5 missing indexes per database.
  • Michael Villegas learned that Azure SQL doesn't allow you to graphically show user roles and permissions, so he wrote a script to query those details (works for on-premise SQL Server as well).
  • Nate Johnson shares scripts that identify if tables are being replicated, whether SSRS subscriptions executed, and how much space certain objects and files are consuming.
  • William Andrus shares how he uses his search script to find similarly named fields or all instances of a piece of text within a database.
  • Bert Wagner (me!) I share my template for generating dynamic table-driven code, making queries more adaptable to future changes.
  • Rudy Rodarte shows us a script he uses for iterating over a date range to use for executing date based queries.
  • Brent Ozar admits he can't live without sp_Blitz, but this month he shares a script for checking how much plan cache history exists on a server.
  • Jeff Mlakar offers a solution for taking all databases on an instance offline (and then back online) again.
  • Erik Darling offers a solution for constructing dynamic SQL so that his MAX variables don't get truncated.  He also links to a script for printing long strings in SSMS.
  • Chrissy LeMaire takes the hard work out of instance to instance migrations by sharing her single-line dbatools command that will do it all for you.  She also shares how dbachecks automates manual checklist work.
  • Glenda Gable mentions two procedures, one that is a high performance cursor rewrite and one  that is a robust log shipping solution.
  • Aaron Bertrand shows us how he discovers undocumented SQL Server features by comparing new builds to the previous versions.
  • Ryan Desmond writes about his post-install confirguration process and shares code he runs to customize Ola Hallengren's maintenance scripts for his environments.
  • Josh Simar shares his database file size code that is optimized for "very large databases" that span multiple files and filegroups.
  • Sander Stad discusses the importance of sharing code and offers a few dbatools commands that he's contributed to or authored around backup testing, log shipping, and SQL Server Agent manipulation.
  • Andy Levy wrote an SSMS snippet to generate a cursor.  Before you chew him out though, he has some really good uses cases for needing to use them.
  • Andy Yun reveals what's in his T-SQL toolbox and explains his organization strategies for 10+ years of scripts he's collected.
  • Eduardo Pivaral shares a script he uses to output query results into an HTML table, making it easy to copy into an email.
  • Raul Gonzalez shows us a versatile script for searching database tables and returning information on attributes such as column name, size, key definitions, and more.
  • Matthew McGiffen wanted to find the most expensive queries on an instance using Query Store instead of the traditional DMVs, so he wrote a script to do just that.
  • Daniel Hutmacher shares his beefed up version of sp_help.  Includes ASCII art dependency graphs and database search.
  • Christian Gräfe provides a function he wrote for padding the left-side of a value with zeros.
  • Adrian Buckman  shares his SQLUndercover Inspector HTML reporting tool, as well as scripts for helping to alter AG groups, checking for running jobs, and auditing failed logins.
  • Louis Davidson shares his technique for using relative positioning in date tables to make querying custom periods (eg. your company's fiscal month) easier.
  • Lance England shares a PowerShell script to automate generating upsert merge statements for his ETLs.

"How do I contribute to dbatools?" with Drew Furgiuele

Watch this week's video on YouTube

This weekend I caught up with Drew Furgiuele at SQL Saturday Cleveland and learned how to get involved with the open-source dbatools PowerShell module.

If you don't use dbatools yet, what are you waiting for?  It's an amazing community project that will help you automate your SQL Server work with over 350 ready-to-use commands.

After you start using dbatools, you might think of new commands you want to add or other features you want to improve - and the best part is that you can!

Even if writing PowerShell commands isn't one of your strengths, there are many ways you can contribute to this excellent community project as Drew mentions in the video above.