
Talkie - a simple, private, responsive interface for LLMs

Watch this video on YouTube

I like using ChatGPT. But I don't like the $20/month price tag for using OpenAI's app, especially given the API costs fractions of a cent.

So what did I do? I created my own ChatGPT-like app, of course!

Talkie, a ChatGPT Clone

talkie.dev app icon

Plenty of free apps exist for interfacing with OpenAI's model APIs, but after trying a dozen of them, I couldn't find one that met my needs. Specifically, I wanted something that would:

  • Generate text and images.
  • Have light and dark modes that respect my device settings.
  • Be simple to use, with no clutter from features I don't care about.
  • Work the same across all my devices.
  • Require no logins.
  • Keep data in my browser only. The only data transmitted should go straight to the OpenAI APIs. I don't need a middleman logging all my chats for who knows what purpose.

It took me a few weeks to brush off the rust on my vanilla JavaScript skills and get things coded, but I was able to create exactly what I wanted:

Screenshot of the Talkie app generating text and images

Developing Talkie

If you don't care about how I developed the app and just want to use it, go straight to the Using Talkie section.

Building the app was straightforward. My goals were to add the features I wanted, keep the user interface simple so my family members and friends would enjoy using it, and keep the code easy to understand so I could easily update it in the future.

The main app is under 500 lines of uncompressed vanilla JavaScript code. You can check it out on GitHub. I tried to use ChatGPT to write some of this code, but I quickly discovered that GPT-4 (the model I used at the time of development) was never trained on any OpenAI API documentation or examples. It makes sense in hindsight, but I found that ironically funny at the time.

I ended up using Pico CSS to help with the app's design. I hate writing CSS, so Pico was a nice way to build a minimalist, responsive website with little effort. The library is lightweight and doesn't include every feature under the sun (as opposed to something like Bootstrap), which was nice because it forced me to simplify the design. Pico also respects a device's light/dark mode out of the box, which is a nice bonus.

I also used Showdown, a Markdown-to-HTML converter. OpenAI's API responses return Markdown-formatted text that I needed to convert to HTML. At first, I thought I could get away with doing my own minimal custom Markdown formatting, but I quickly realized there are a lot of edge cases that a full library handles better.

That's about it. The app probably has bugs and isn't perfect, but it meets my needs, and I hope others will find it useful too. Talkie's source code is MIT-licensed and available on GitHub.

Using Talkie

Talkie is available as a progressive web app, meaning it will work on your computer, phone, and everything in between. It's available at:

https://talkie.dev

To start using it, you will need an OpenAI API key. Register for an OpenAI account or sign in to your existing account, then generate a new secret key from the API key page. Paste that key into Talkie, and you'll be good to go (remember: Talkie only saves your API key to your browser's local storage - it never leaves your device!).

That's it. Type in some prompts, generate images and have fun with the same simple, privacy-focused experience on all your devices.

Crosstream - Efficient Cross-Server Joins on Slow Networks in Python

Earlier this year I presented at PyCon 2023 in Salt Lake City, UT on the topic of "Efficient Cross-Server Data Joins on Slow Networks with Python". You can watch the presentation on YouTube:

Watch this presentation on PyCon's YouTube channel.

Slides from the session can be found at https://github.com/bertwagner/Presentations/tree/master/crosstream.

The crosstream Python package mentioned can be found at: https://github.com/bertwagner/crosstream.

jurn - A Command Line Tool for Keeping Track of Your Work

Watch this week's video on YouTube

How do you keep track of your daily work accomplishments?

If you are like me, you wait until the end of the week and then dread having to think all the way back to Monday to remember what you did. By that point, all I usually remember is what I ate for breakfast that morning and the immediate problem I was working on before starting my weekly writeup.

I wanted to get in the habit of documenting my work accomplishments better, so I built jurn, a command line tool that helps you tag and log work as you do it. It has made tracking my work easier: progress is simple to share in stand-up meetings and invaluable for end-of-year performance evaluations.

In this post I will show you its most important features and how I use it.

Install

jurn is a command line tool I wrote in Python. To install, you just need to run pip install jurn. You can also modify and build from the source code hosted on GitHub.

jurn stores its data in a local sqlite database, meaning no one else ever sees your data and you can query the data yourself if you have other needs for it that jurn doesn't currently support.
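
If you ever want to poke at that raw data directly, the sqlite3 command line shell is an easy way to do it. Here's a hypothetical sketch (the database path below is a placeholder; check jurn's documentation or source code for where the file actually lives):

# Inspect jurn's data with the sqlite3 CLI (the database path is a placeholder)
sqlite3 /path/to/jurn.db ".schema"

# List the tables jurn created, then query whichever ones you need
sqlite3 /path/to/jurn.db "SELECT name FROM sqlite_master WHERE type='table';"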

Logging

Whenever I want to log an accomplishment throughout the day, I simply use the jurn log command to save it:

jurn log -m 'Wrangled cats via lasso.'

If I want to add some organizational structure to my notes, I can also include tags:

jurn log -m 'Wrangled cats via lasso.' -t 'Physical Fitness'

jurn also comes with built-in tab completion for tags, so if you type in:

jurn log -m 'Wrangled cats via lasso.' -t 'Physical <TAB>

jurn will get a distinct list of your previously used tags from the database, helping you pick the right one to autocomplete with:

Physical Fitness
Physical Security

The tagging system also allows for subcategories denoted by the # sign, allowing for even more organization:

jurn log -m 'Wrangled cats via lasso.' -t 'Physical Fitness#Cardio'

NOTE: To enable autocompletion, you must add this line to your ~/.bashrc: eval "$(_JURN_COMPLETE=bash_source jurn)". If you use a different shell, reference the Click documentation for the specific line you need to add.
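
For reference, Click 8's documented pattern for a couple of other shells looks like the lines below (a sketch only - verify against the Click version jurn is built with):

# zsh: add to ~/.zshrc
eval "$(_JURN_COMPLETE=zsh_source jurn)"

# fish: add to ~/.config/fish/completions/jurn.fish
_JURN_COMPLETE=fish_source jurn | source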

Viewing Entries

Once it's time to reflect on what you did for a particular day (or week, month, year, etc.), you use the jurn print command to get a pretty-printed version of your log entries:

jurn print -d week
2022-11-01 to 2022-11-07
   Physical Fitness
     Cardio
       Wrangled cats via lasso.
     Strength
       New PR benching 2 mules.
   Physical Security
     Installed new locks on the doors.
     Added storage to camera recording devices.

Now you can easily share what you worked on with yourself, your boss, or your team. Come performance evaluation season, you'll also have a nice reminder of what you did all year to justify your raise or promotion.

Automation

While running the jurn command whenever I finish a noteworthy task is quick and easy, I don't trust myself to always remember to do it.

Instead, I add a line like:

jurn --early-end 60 log -t

to my ~/.bashrc in order to have jurn remind me to log any accomplishments every time I open a new terminal tab/window (which I do all day long).

To prevent it from being too annoying, I pass in the --early-end option so it knows not to prompt me for updates if I have already written one in the past 60 minutes.
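
Putting both pieces together, the relevant excerpt of my ~/.bashrc looks something like this:

# Tab completion for jurn tags
eval "$(_JURN_COMPLETE=bash_source jurn)"

# Prompt me to log my work in each new terminal, skipping the prompt if I logged something in the past 60 minutes
jurn --early-end 60 log -t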

...And More

I built jurn to fulfill the needs I had with keeping track of my work accomplishments. All of the tool's various options can be found by running jurn --help or viewing jurn's documentation.

If you've read this far, I assume you value keeping track of your daily work progress and I hope you can use jurn to make that process a little bit easier.

Using curl to Automate Multipage Logins

Watch this week's video on YouTube

I like solving the daily New York Times crossword on paper. However, logging in to download the PDF every day and printing it is a pain.

In this post, I will share how I automated the whole process with curl and some bash scripting.

Downloading a PDF file with curl

curl is a command line tool for making HTTP requests (and many other data transfer protocols).

Using it to download a file like the New York Times daily crossword puzzle is as easy as:

curl "https://www.nytimes.com/svc/crosswords/v2/puzzle/print/19803.pdf" -o crossword.pdf

This works great for unauthenticated websites but poses a problem here: the New York Times crossword requires a paid subscription. For the above URL to work, the HTTP request needs to be part of a session that has first been authenticated by the New York Times server.

Three HTTP requests to login

The New York Times Crossword login process looks like this:

NYTimes Crossword login page

There are a total of three HTTP requests that need to happen:

  1. Loading the initial login page (left screenshot above)
  2. Clicking "Continue" after typing in your email address
  3. Pressing "Log In" after typing in your password

It's important to be aware that there are three requests because each request requires additional data to be sent along with it beyond the expected email address/password. Using your browser's developer tools is an easy way to identify these separate requests.

Multipage logins with curl

The first request

The first request (that loads the login page) is important because it contains two pieces of data we will need to submit with subsequent requests:

  1. Some cookies that need to be carried through all login requests
  2. A Cross-Site Request Forgery (CSRF) token

Saving and passing along the cookies for each request is easy: use curl's -c and -b arguments to write and read cookies to/from a local text file:

curl -c cookies.txt -b cookies.txt "https://myaccount.nytimes.com/auth/enter-email"

The CSRF token is a little more work. Once we download the login page's HTML, we can parse the CSRF token into a variable in our bash script:

# Parse out the CSRF auth token
AUTH_TOKEN=$(curl -c cookies.txt -b cookies.txt "https://myaccount.nytimes.com/auth/enter-email?response_type=cookie&client_id=lgcl&redirect_uri=https%3A%2F%2Fwww.nytimes.com" 2>&1 | grep -oP '(?<=authToken&quot;:&quot;).*?(?=&quot;)')

# Replace HTML encoded entities
AUTH_TOKEN=${AUTH_TOKEN//&#x3D;/=}

The second and third requests

There are two more requests: one that sends the email address, then one that sends the email address and password together. These appear to be on the same web page, but looking at the network traffic shows they are two separate requests.

Like before, we persist and pass the cookies for each request with the -c and -b arguments. We also pass some parameters in a JSON object after the -d flag. Finally, to mimic the browser/webpage making the request, we pass along the required headers with -H arguments:

# First page that asks for an email address
curl -c cookies.txt -b cookies.txt -X POST -d '{"email":"'$USERNAME'","auth_token":"'$AUTH_TOKEN'","form_view":"enterEmail"}' "https://myaccount.nytimes.com/svc/lire_ui/authorize-email" -H "Content-Type: application/json"
# Second page that asks for a password
curl -c cookies.txt -b cookies.txt -X POST -d '{"username":"'$USERNAME'","auth_token":"'$AUTH_TOKEN'","form_view":"login","password":"'$PASSWORD'","remember_me":"Y"}' "https://myaccount.nytimes.com/svc/lire_ui/login" -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:93.0) Gecko/20100101 Firefox/93.0' -H 'Accept: application/json' -H 'Accept-Language: en-US,en;q=0.5' --compressed -H 'Referer: https://myaccount.nytimes.com/auth/enter-email?response_type=cookie&client_id=lgcl&redirect_uri=https%3A%2F%2Fwww.nytimes.com' -H 'Content-Type: application/json' -H 'Req-Details: [[it:lui]]' -H 'Origin: https://myaccount.nytimes.com' -H 'DNT: 1' -H 'Connection: keep-alive' -H 'Sec-Fetch-Dest: empty' -H 'Sec-Fetch-Mode: cors' -H 'Sec-Fetch-Site: same-origin' -H 'TE: trailers'

Downloading the crossword puzzle

Once we finish those three requests, we should have a cookie saved to our cookies.txt file that indicates we are authorized and logged in. With that authenticated cookie in place, we can run our first curl request again and the PDF puzzle download will work:

# Download the print edition of the crossword
curl -b cookies.txt -s "https://www.nytimes.com/svc/crosswords/v2/puzzle/print/19803.pdf" -o crossword.pdf

There are a few more basic requests involved to parameterize the puzzle date (19803 above). If interested, you can find these additional steps in my NYTimes Crossword Download and Print script on GitHub.

Daily scheduling and printing automation

With the PDF crossword puzzle downloaded, all I need to do is have the file automatically sent to my printer every morning.

I'm running this script on a Raspberry Pi server running Linux, so all I need to do is issue an lp command to send the file to my printer:

lp -n $NUMBER_OF_COPIES -o fit-to-page -d BrotherHL2170W crossword.pdf

That's it! I've scheduled the script with cron and now every morning at 7 am, I have two copies of that day's crossword puzzle sitting in my printer, ready to be filled with no manual intervention required.
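
For example, the crontab entry for a daily 7 am run looks something like this (the script path is a placeholder for wherever you saved the download-and-print script):

# Download and print the day's crossword every morning at 7:00 am (script path is a placeholder)
0 7 * * * /home/pi/nyt-crossword.sh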

If you want to do something similar, the full script is available on my GitHub New York Times Crossword Daily Download and Print repository.

Recursively Querying Row Groups

Watch this week's video on YouTube

Recursive queries are fun to plan and write. They can also be frustrating, depending on the complexity of the problem you are trying to solve.

This post shows one solution for finding all records that are related, either directly or via intermediate records, using recursive queries in SQL Server.

The Data

Here's the data we'll be working with:

DROP TABLE  IF EXISTS #relationships;

CREATE TABLE #relationships (
    Id int,
    FK1 varchar(10),
    FK2 varchar(10)
);

INSERT INTO #relationships
VALUES 
    /* This group of records are all related, directly or through intermediate nodes */
    (1,'A','B'),
    (2,'A','E'),
    (3,'A','G'),
    (4,'A','F'),
    (5,'B','F'),
    (6,'B','E'),
    (7,'E','F'),
    (8,'G','H'),

    (9,'B','A'),  /* This is the same relationship as Id=1, just flipped */
    (10,'B','F'), /* This is a straight up duplicate record to Id=5*/


    /* These records are related simply where FK2 of one row is FK1 of the subsequent row */
    (11,'I','J'),
    (12,'J','K'),
    (13,'K','L'),

    /* Some more related records */
    (14,'M','N'),
    (15,'O','M'),
    (16,'P','O'),
    (17,'M','N'), /* Duplicate of Id=14 */
    (18,'M','O'), /* Flipped duplicate of 15 */
    (19,'M','P'),

    /* These records are interesting because the FK2 values never appear in the FK1 column */
    (20,'Q','R'),
    (21,'S','R'),
    (22,'T','R');

Each row has values for the Id, FK1, and FK2 columns. Id is a unique primary key that I included to make referencing individual rows easier. FK1 and FK2 are foreign keys that reference full records in another table. The point is that this table shows the relationships between records.

I used blank lines to visually separate the groups of related records in the data above. The end result I want is for each group of related records to share the same group_id. I also added comments to point out some interesting scenarios in the data.

I programmed some of the edge cases I knew I would encounter into the test data; a sort of poor man's test-driven development. Including these known edge cases helps me test my query so I know my final solution will handle them correctly. If you are applying this solution to your own data, make sure to add your own test cases to the data.

Query Transformations for Sorted Hashes

The first recursive query I wrote to solve this problem was ugly. I was creating sorted hashes to ensure that rows with swapped FK values were deduplicated (e.g. I would want to dedupe Id=1 and Id=9). This solution involved joining on CASE expressions like this all throughout the set of queries: CASE WHEN FK1<=FK2 THEN '|'+FK1+','+FK2+'|' ELSE '|'+FK2+','+FK1+'|' END AS key_hash.

Yuck. The solution worked, but I know my future self would not want to maintain that type of code.

Rather than have the final query do all of the work, I decided to clean up the data first instead.

Initial Clean Up

I decided to transform the data to eliminate all duplicates and make sure my keys were always sorted so that FK1 <= FK2. This allows for a simpler final query:

--Remove duplicates where keys are switched
DROP TABLE  IF EXISTS #deduped;
CREATE TABLE #deduped (
Id int IDENTITY(1,1),
FK1 varchar(10),
FK2 varchar(10),
key_hash varchar(100)
);
INSERT INTO #deduped (FK1,FK2,key_hash)
SELECT
    dupes.FK1,
    dupes.FK2,
    dupes.key_hash
FROM
    (
    SELECT
        hashes.*,
        ROW_NUMBER() OVER (PARTITION BY key_hash ORDER BY FK1) AS RN
    FROM
        (
        /* make sure FK1 is always smaller than FK2.  This eliminates a lot of more complicated logic later on */
        SELECT
            CASE WHEN FK1 <= FK2 THEN FK1 ELSE FK2 END AS FK1,
            CASE WHEN FK1 <= FK2 THEN FK2 ELSE FK1 END AS FK2,
            CASE WHEN FK1 <= FK2 THEN '|'+FK1 +','+FK2+'|' ELSE '|'+FK2+','+FK1+'|' END AS key_hash

        FROM
            #relationships
        ) hashes
    ) dupes
WHERE
    dupes.RN = 1;

This still uses the key_hash CASE expression I mentioned previously, but it is now computed only once up front to dedupe the entries regardless of their key order.

Grouping Related Records

With a deduped dataset, the recursive query for finding groups of related records is (relatively) straightforward (and a good refresher if you need to remember how recursive CTEs work in SQL Server):

WITH c AS (
    /* The initial query */
    SELECT
        1 as level,
        DENSE_RANK() OVER(ORDER BY FK1) group_id,
        key_hash,
        FK1
        ,FK2
    FROM
        #deduped
    UNION ALL
    /* The recursive query that runs once for each of the above query's output and then once on every row from each subsequent result */
    SELECT 
        c.level+1,
        c.group_id,
        t.key_hash,
        t.FK1,
        t.FK2
    FROM 
        c
        INNER JOIN #deduped t
            ON c.FK2 = t.FK1
    WHERE
        /* don't include combinations already matched */
        c.key_hash not like '%'+t.key_hash+'%'
), 
/* regular CTE  */
output_with_dupes as (
SELECT
    level,
    key_hash,
    group_id,
    FK1,
    FK2,
    ROW_NUMBER() OVER (PARTITION BY key_hash ORDER BY group_id) AS  RN
FROM
    c
)

-- deduped output
SELECT
    group_id,
    FK1,
    FK2
FROM
    output_with_dupes
WHERE RN = 1
ORDER BY
    group_id,
    FK1
OPTION(MAXRECURSION 0)

Some records are duplicated after the recursive join, so a final ROW_NUMBER expression dedupes the output. The result is a table of rows where related records share the same group_id.

Couldn't you have done this with hierarchyid?

Probably.

But I wanted a solution flexible enough to reuse in other relational databases that don't support hierarchyid.

More than just recursive queries

If you've never written a recursive SQL query before, hopefully this post gives you an idea of how to do it.

The more important takeaway is that sometimes it's easier to solve a data problem than a query problem. Like I said previously, my initial recursive query was hideous and unmaintainable because of the duplicates I had in my data. Rather than living with that query, it made sense to clean up the data first and then write the simpler query on the clean data.