Methods for Finding Related Reddit Subreddits with Simple Set Theory (2024)

I recently wrote a post on how to visualize network graphs of Reddit subreddits.

One of the reasons I’ve been researching the topic is to find a good way to facilitate discovery of lesser-known subreddits, as Reddit is doing a terrible job at it (although they have been trying a few new experiments very recently). As it turns out, invoking graph theory is overkill. Even fancy machine learning approaches like collaborative filtering, while powerful, may not be required to help Redditors discover new things.

Let’s say we have two sets: Set A, the set of active users in a given subreddit, and Set B, the set of active users in another subreddit. The intersection of Sets A and B (A ∩ B) represents users who are active in both subreddits.

Using BigQuery, I can get the comment data from ALL public Reddit subreddits; this technique would not work well with any smaller subset. The network graph edgelist, obtained as described in my previous post, conveniently gives the size of (A ∩ B): the number of active users shared by each pair of subreddits (defining “active users” as users who have made a comment in at least 5 unique threads in a given subreddit within the past 6 months).
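
To make the definitions concrete, here is a minimal Python sketch of the same idea on in-memory data. It is not the BigQuery query from the post: the function names, the (author, subreddit, thread_id) record layout, and the toy comments are all made up for illustration, and the comments are assumed to be already restricted to the 6-month window.

```python
from collections import defaultdict
from itertools import combinations


def active_users_by_subreddit(comments, min_threads=5):
    """Map each subreddit to the set of users who commented in at least
    `min_threads` distinct threads there (hypothetical helper)."""
    threads = defaultdict(set)  # (subreddit, author) -> distinct thread ids
    for author, subreddit, thread_id in comments:
        threads[(subreddit, author)].add(thread_id)

    active = defaultdict(set)
    for (subreddit, author), thread_ids in threads.items():
        if len(thread_ids) >= min_threads:
            active[subreddit].add(author)
    return dict(active)


def overlap_edgelist(active):
    """Return {(sub_a, sub_b): shared active users} for every pair that
    shares at least one active user (hypothetical helper)."""
    edges = {}
    for a, b in combinations(sorted(active), 2):
        shared = len(active[a] & active[b])  # size of the set intersection
        if shared > 0:
            edges[(a, b)] = shared
    return edges


# Toy comment records, invented for illustration: (author, subreddit, thread_id)
comments = [
    ("alice", "aww", "t1"), ("alice", "aww", "t2"),
    ("alice", "cats", "t3"), ("alice", "cats", "t4"),
    ("bob", "aww", "t1"), ("bob", "aww", "t5"),
    ("bob", "cats", "t3"),
    ("carol", "cats", "t3"), ("carol", "cats", "t6"),
]

# The threshold is lowered to 2 threads only so the toy data has active users.
active = active_users_by_subreddit(comments, min_threads=2)
edges = overlap_edgelist(active)
print(edges)  # {('aww', 'cats'): 1}
```

Comparing every pair of subreddits like this is only practical for small inputs; at the scale of all public comments, the pairwise counting is better left to the database.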

In this case, we can filter the edgelist to only allow intersections where there are at least 10 active users; this prevents including dead and personal subreddits.

We can run another similar query to get the number of active users for each subreddit.

After that, for a given subreddit A, find:

(A ∩ B) / (B)

for all subreddits B where (A ∩ B) > 0 (i.e. only neighbors of A). This computation takes less than a second. Additionally, the output is always a percentage between 0% and 100%. For the visualizations, we plot the Top 15 subreddits with the highest overlap with the specified subreddit A (and color the bars with a nice viridis palette to provide another easy way to perceive the relative magnitude of relatedness).
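
The calculation can be sketched in a few lines of Python. This is an illustration rather than the code behind the charts: it assumes the edgelist is a dict mapping subreddit pairs to shared active-user counts and that `sizes` maps each subreddit to its active-user count. The function name, both data structures, and all numbers are hypothetical.

```python
def related_subreddits(target, edges, sizes, min_shared=10, top_n=15):
    """Rank the neighbors B of `target` by (A ∩ B) / (B).

    edges: {(sub_a, sub_b): shared active users}   (assumed layout)
    sizes: {subreddit: number of active users}     (assumed layout)
    """
    scores = {}
    for (a, b), shared in edges.items():
        # Apply the minimum-overlap filter and keep only pairs involving target.
        if shared < min_shared or target not in (a, b):
            continue
        other = b if a == target else a
        scores[other] = shared / sizes[other]
    # Highest overlap first, truncated to the top N.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Toy numbers, invented for illustration only.
sizes = {"aww": 1000, "cats": 100, "funny": 2000, "eyebleach": 20, "tinysub": 5}
edges = {
    ("aww", "cats"): 40,
    ("aww", "funny"): 300,
    ("aww", "eyebleach"): 12,
    ("aww", "tinysub"): 3,   # dropped: fewer than 10 shared active users
    ("cats", "funny"): 50,   # ignored: does not involve the target
}

related = related_subreddits("aww", edges, sizes)
for name, score in related:
    print(f"{name}: {score:.0%}")
```

Because the denominator is the size of B alone, a small subreddit whose users are mostly also active in A scores highly, which is why this measure tends to surface lesser-known communities.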

The methodology may sound arbitrary, but the results are very interesting. Here’s a chart of the top related subreddits for /r/aww, one of the most popular places on the internet for cat pictures.

I have honestly never heard of any of these subreddits before. Yet by analyzing public user activity alone, I found a few new places to get more cute pics.

This methodology is excellent for finding subreddit-specific subsubreddits which may not be documented. The related subreddits for /r/buildapc offer more places to get PC building advice.

Related subreddits for sport-specific subreddits, like /r/cfb (college football), include the corresponding teams.

The related subreddits for /r/food include a surprising number of subreddits dedicated to specific foods.

There is a surprising amount of depth to the /r/me_irl network.

The chart for /r/programming can tell you which subreddits exist for specific programming languages and technologies.

The methodology can also reveal a lack of related subreddits, by the large contrast between subreddits with high relatedness and low relatedness. For example, while /r/cfb may have large numbers of obviously-related subreddits as a sports subreddit, /r/golf has only 2.

You can view Related Subreddit charts for the Top 200 Subreddits in this GitHub repository.

Finding Similar Subreddits

Another method for finding related subreddits would be to find subreddits with similar communities. An academic approach to finding similarity between sets is the Jaccard Index. Using the same set A and set B definitions above, the formula now becomes:

(A ∩ B) / [(A) + (B) - (A ∩ B)]

which outputs the Jaccard Index, between 0 and 1. This formula only requires a few tweaks to the original code. The results from this computation tell a different story.
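
As a rough illustration of how small the tweak is, here is a Python sketch of the Jaccard calculation. The function names, the dict layouts (`edges` mapping subreddit pairs to shared active-user counts, `sizes` mapping subreddits to active-user counts), and the toy numbers are assumptions for this example and are not taken from the original code.

```python
def jaccard_index(shared, size_a, size_b):
    """(A ∩ B) / [(A) + (B) - (A ∩ B)], computed from counts alone."""
    union = size_a + size_b - shared
    return shared / union if union else 0.0


def similar_subreddits(target, edges, sizes, top_n=15):
    """Rank the neighbors of `target` by Jaccard Index (hypothetical helper)."""
    scores = {}
    for (a, b), shared in edges.items():
        if target not in (a, b):
            continue
        other = b if a == target else a
        scores[other] = jaccard_index(shared, sizes[target], sizes[other])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Toy numbers, invented for illustration only.
sizes = {"aww": 1000, "cats": 100, "funny": 2000, "eyebleach": 20}
edges = {
    ("aww", "cats"): 40,
    ("aww", "funny"): 300,
    ("aww", "eyebleach"): 12,
    ("cats", "funny"): 50,
}

similar = similar_subreddits("aww", edges, sizes)
for name, score in similar:
    print(f"{name}: {score:.3f}")
```

With these made-up counts the large subreddit ranks first, because the denominator now includes the size of A as well as B, so a tiny subreddit can no longer score highly just by being mostly contained in A.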

Here are the most-similar subreddits to /r/aww:

In this implementation, the default Reddit subreddits must be removed from the results, as the communities of default subreddits are largely similar to most others by design. Even former defaults like /r/adviceanimals and /r/technology still have large numbers of holdout users, which skew the results. As /r/aww is a mass-appeal subreddit, it makes sense that its community is similar to those of other mass-appeal subreddits.

The magnitude of the Jaccard Index measures the strength of the similarity. Most subreddit relationships have a low Jaccard Index, but the relative magnitude among a subreddit's neighbors still illustrates which subreddits are potentially related (this is also the reason why the x-axis is not fixed across plots). The subreddit relationship with the highest absolute similarity is /r/arrow and /r/flashtv at 0.345, which makes sense given the massive overlap between the two CW television shows.

The Jaccard Index is more useful for finding similar subreddits to niche subreddits. Let’s try a few of the subreddits mentioned previously and see how the results changed.

/r/buildapc is a niche subreddit, and the output identifies well-established subreddits, unlike with the previous related-subreddit methodology.

The subreddit most similar to /r/cfb (college football) is /r/collegebasketball!

The subreddit most similar to /r/food is /r/cooking!

The subreddit most similar to /r/programming is /r/linux! (of course)

You can view the Similar Subreddit charts for the Top 200 Subreddits in this GitHub repository.

Again, Reddit has significantly better internal data for identifying user activity between subreddits, such as voting patterns and clickthrough tracking. But the results shown using these two set methodologies are pretty good for using public data. In fact, these two set approaches can theoretically work with any set of categorized, settable data, which may give me a few ideas for new blog posts in the future.

And there are still the fancy machine learning approaches to try.

As always, the full code used to process the comment data and generate the visualizations is available in this Jupyter notebook, open-sourced on GitHub.

If you do find any other interesting trends in the related/similar charts of other subreddits and write about it, it would be greatly appreciated if proper attribution is given back to this post and/or myself. Thanks!
