Data with Bert logo

Crosstream - Efficient Cross-Server Joins on Slow Networks in Python

Earlier this year I presented at PyCon 2023 in Salt Lake City, UT on the topic of "Efficient Cross-Server Data Joins on Slow Networks with Python". You can watch the presentation on YouTube:

Watch this presentation on PyCon's YouTube channel.

Slides from the session can be found https://github.com/bertwagner/Presentations/tree/master/crosstream.

The crosstream Python package mentioned can be found at: https://github.com/bertwagner/crosstream.

jurn - A Command Line Tool for Keeping Track of Your Work

Watch this week's video on YouTube

How do you keep track of your daily work accomplishments?

If you are like me, you wait until the end of the week and then dread having to think back all the way to Monday to try and remember what you did that week. By that point all I usually remember is what I ate for breakfast that morning and the immediate problem I was working on before doing my weekly writeup.

I wanted to get in the habit of better documenting my work accomplishments, so I built jurn, a command line tool to help tag and log work as you do it. It has made tracking my work easier, making it simple to share progress in stand-up meetings and invaluable for end of the year performance evaluations.

In this post I will show you its most important features and how I use it.

Install

jurn is a command line tool I wrote in Python. To install, you just need to run pip install jurn. You can also modify and build from the source code hosted on GitHub.

jurn stores its data in a local sqlite database, meaning no one else ever sees your data and you can query the data yourself if you have other needs for it that jurn doesn't currently support.

Logging

Whenever I want to log an accomplishment throughout the day, I simply use the jurn log command to save it:

jurn log -m 'Wrangled cats via lasso.'

If I want to add some organizational structure to my notes, I can also include tags:

jurn log -m 'Wrangled cats via lasso.' -t 'Physical Fitness'

jurn also comes built in with tab complete functionality for tags, so if you type in:

jurn log -m 'Wrangled cats via lasso.' -t 'Physical <TAB>

jurn will get a distinct list of your previously used tags from the database, helping you pick the right one to autocomplete with:

Physical Fitness
Physical Security

The tagging system also allows for subcategories denoted by the # sign, allowing for even more organization:

jurn log -m 'Wrangled cats via lasso.' -t 'Physical Fitness#Cardio'

NOTE: To enable autocompletion, you must add this to your ~/.bashrc. eval "$(_JURN_COMPLETE=bash_source jurn)" If using other shells, please reference the Click documentation for the specific line you need to add.

Viewing Entries

Once it's time to reflect on what you did for a particular day (or week, month, year, etc..), you use the jurn print command to get a pretty printed version of your log entries:

jurn print -d week
2022-11-01 to 2022-11-07
   Physical Fitness
     Cardio
       Wrangled cats via lasso.
     Strength
       New PR benching 2 mules.
   Physical Security
     Installed new locks on the doors.
     Added storage to camera recording devices.

Now you can easily share with your self/boss/team what you worked on. Come performance evaluation season, you'll also have a nice reminder of what you did all year that justifies your raise or promotion.

Automation

While running the jurn command whenever I finish a note-worthy task is quick and easy, I don't trust myself to always remember to do it.

Instead, I add a line like:

jurn --early-end 60 log -t

to my ~/.bashrc in order to have jurn remind me to log any accomplishments every time I open a new terminal tab/window (which I do all day long).

To prevent it from being too annoying, I pass in the ---early-end option so it knows not to prompt me for my updates if I have already written one in the past 60 minutes.

...And More

I built jurn to fulfill the needs I had with keeping track of my work accomplishments. All of the tool's various options can be found by running jurn --help or viewing jurn's documentation.

If you've read this far, I assume you value keeping track of your daily work progress and I hope you can use jurn to make that process a little bit easier.

Using Python And NetworkX To Build A Twitter Follower Recommendation Engine

Watch this week's video on YouTube

This week, I want to share my process for analyzing Twitter. Specifically, I want to find who all of my friends follow on Twitter that I don't currently follow. Essentially, I want to build a Twitter follower recommendation engine.

Let's start with the theory behind the problem I'm trying to solve. On Twitter, I follow a bunch of people. Around 450 at the time of filming this video.

a

These are people I've either met in person and want to keep in touch with or am interested in following - or both!

Now I don't attend many conferences and events, so finding more people to follow that way is slow going; I probably only meet a handful of new people every year.

However, the second category of people, the interesting ones out there on the internet, is almost limitless. The problem is finding them in a virtual sea of bots, marketing-only accounts, and complainers.

Twitter does have a "Who to Follow" page, and while I'm sure it has some great suggestions, I don't necessarily trust all of Twitter's recommendations. It's kind of like trusting a site like Yelp for restaurant reviews when the local McDonalds rates better than your favorite burger place. I'm sure that particular McDonalds is beautiful by fast food standards, but I'm looking for more personalized recommendations.

So that leads me to this project: I decided to find better personalized recommendations on Twitter by looking at who my friends are following that I am not.

Getting the Data

To start, I needed a list of the people I follow on Twitter, and then a list of who they follow. The proper way to do this is to use the official Twitter API, so I wrote some Python code to do exactly that. Twitter calls the people you follow "Friends", which is funny since I usually don't consider one-way relationships friendships.

Screenshots of code are lame - go to GitHub to see the actual code.

I won't go through the code in detail (you can download it from GitHub if you'd like to do that) but essentially it uses the friends/list endpoint to download all of my Friends. I then went through each of those Friends and found all of their Friends.

In total, this ended up being around 125,000 people.

As a quick side note, the Twitter API is terrible. Using it is fine, but the rate limiting is horrendous. I was basically capped at downloading 12,000 people per hour, which meant it took about a day to download all of the data I needed.

Graph Analysis with NetworkX

Once I had the data downloaded, it was time to find relationships between my friends and the people they follow. For this, I decided to use an open source Python library called NetworkX. NetworkX helps perform complex network analysis, which is perfect for what I was trying to do.

NetworkX uses a graph structure to help with its analysis. A graph is made up of of nodes and edges. In our case, the Twitter users are our nodes, and our edges are the relationships. The first thing I did was load all of the people I follow and created a directional edge (aka. an arrow) to indicate the one-way relationship from me to them. Next, I looped over all of those people's friends, adding additional nodes and edges.

extended-network

At this point, my graph had 125,000 nodes which was way too many to draw quickly on my computer, and way too many to model with Lego minifigs. I figured not ALL of this data would be useful and I needed a way to filter it.

I started by filtering out any of my friends' friends who happen to already be my friends - this wouldn't be useful since I already follow these people.

Next, I decided to keep only the top 50 most followed people in that second tier of Twitter users. This would limit the processing needed while still giving me a list of the most followed Twitter users that my friends follow that I don't currently follow. Once again, all of this code is available on my GitHub.

At this point, I was able to use NetworkX's drawing capabilities to beautifully render the network of users and relationships. Sure, I could have just listed out the top users that I don't follow, but there is something special about being able to create a visual network of those relationships.

)

So that's it. It was fun using a graph library to find new people to follow on Twitter using the people I currently follow as a proxy for finding relevant users instead of Twitter's black box algorithms.