This comment originated here: https://hexbear.net/comment/6055450. As always, I don't intend to drive conversation to that thread. If you wish to continue this conversation, feel free to leave a comment here.
The inputs that decide what lives and dies are the same. A thread that lives under the current math and same inputs would simply live longer under the old math and same inputs. That’s what I’m trying to say. Here, I even graphed it out so we can see what I’m talking about. The same post under both algorithms: a fixed vote score of 1000 and one new comment every four hours for a week.
This is already bad enough, but look at what happens when both posts are gaining votes. This simulation simply adds a random number between 0 and 5 to the post score every hour.
The old algorithm pushes the score even higher. It makes the thread creep up and up. Sure, it decays pretty fast between comments, but people will be returning to the thread from their inbox as they reply to people replying to them. This keeps the thread pushed up, inviting more people to leave top-level comments.
Just to illustrate your point, here are two Hexbear threads, one that receives no comments, and one that gets a comment every 4 hours, both with 1000 upvotes.
A thread with comments every 4 hours will have a sub-250 rank value after about 16 hours. A thread with a similar score and no comments for the duration will have a sub-250 rank value after just 4 hours. So, that’s a long ass time.
The disparity between active and “inactive” threads is even worse in the default sort. A thread with a high enough score could maintain a top-level position in the rankings for two whole days, while one that gets no comments drops off in the first 5 hours.
This is, to my understanding, the most recent version of the “Hot Rank” that all other ranks are based on in Lemmy.
CREATE OR REPLACE FUNCTION hot_rank (score numeric, published timestamp without time zone)
    RETURNS integer
    AS $$
DECLARE
    hours_diff numeric := EXTRACT(EPOCH FROM (timezone('utc', now()) - published)) / 3600;
BEGIN
    IF (hours_diff > 0) THEN
        RETURN floor(10000 * log(greatest (1, score + 3)) / power((hours_diff + 2), 1.8))::integer;
    ELSE
        RETURN 0;
    END IF;
END;
$$
LANGUAGE plpgsql
IMMUTABLE PARALLEL SAFE;
The thing that makes a Hot Rank a Hot Active Rank is whether you are sending the Published Date to the Hot Rank function or the most recent comment timestamp (clamped at 48 hours; after that, it defaults to the Published Date).
diesel::update(post::table.find(post_id))
    .set((
        // Normal Hot Rank, uses Published date.
        post::hot_rank.eq(hot_rank(post::score, post::published)),
        // Active Hot Rank, uses newest_comment_time_necro date.
        post::hot_rank_active.eq(hot_rank(post::score, post::newest_comment_time_necro)),
        post::scaled_rank.eq(scaled_rank(
            post::score,
            post::published,
            interactions_month,
        )),
    ))
    .get_result::<Self>(conn)
    .await
So, for two days, threads on a Lemmy instance other than Hexbear can have their timestamp refreshed, as if they were just posted, with their current score, so long as someone is leaving a comment.
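To make the refresh effect concrete, here is a rough Rust sketch that ports the hot_rank formula above and feeds it either the post age or the time since the last comment. Passing elapsed hours as a plain number, the `active_age_hours` helper, and its reading of the 48-hour clamp are assumptions made for illustration; this is not Lemmy's actual code.

```rust
/// Port of the SQL `hot_rank` above. It takes elapsed hours directly
/// instead of a timestamp (an assumption, to keep the sketch standalone).
fn hot_rank(score: f64, hours_diff: f64) -> i64 {
    if hours_diff > 0.0 {
        // PostgreSQL's single-argument log() is base 10.
        let numerator = 10000.0 * (score + 3.0).max(1.0).log10();
        (numerator / (hours_diff + 2.0).powf(1.8)).floor() as i64
    } else {
        0
    }
}

/// Hypothetical helper: the "age" that the active sort would see.
/// Assumes, as described above, that a comment refreshes the timestamp
/// only while the post is at most 48 hours old; otherwise the post age is used.
fn active_age_hours(post_age: f64, hours_since_last_comment: Option<f64>) -> f64 {
    match hours_since_last_comment {
        Some(h) if post_age <= 48.0 => h,
        _ => post_age,
    }
}
```

For a day-old post with 1000 votes and a comment one hour ago, `hot_rank(1000.0, active_age_hours(24.0, Some(1.0)))` comes out far higher than `hot_rank(1000.0, 24.0)`, which is the "as if it were just posted" effect.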
So, this leads me to a couple of points.
- Yes, Hexbear threads with comments have a higher rank than threads with no comments, and decay way slower.
- The Hexbear algo depresses the impact of a new comment over time, meaning the thread decays faster than on normal Lemmy.
- The normal Lemmy active sort results in front pages that can feel stagnant for two days at a time. This suppresses new threads, unless those threads get a lot of vote traction and sustained conversation.
- Every few hours, new threads are climbing to the top of the Hexbear front page.
- In both systems, “important news story posts with no comments” die faster because they have no comments. The default Lemmy algo may actually be better about this because the first comment isn’t subject to a time penalty like on Hexbear.
- All our “bump” bots are pretty useless because they comment nearly instantly after someone summons them, not really bumping the thread at all. Plus, even if the comment were delayed, it would have a lesser impact because of the decay. Those bots were way more effective under the old algo.
You would need to come up with new or different inputs, or a whole new method of ranking that doesn’t leave threads with no comments in the dust. Comment count could be considered: if a thread has few or no comments, then maybe it shouldn’t be impacted by the decay in the Hexbear algo. Or a higher number of comments could speed up the thread’s decay, while threads with longer spans between comments would not decay faster.
If people want to double-check my work, you can find it here: https://gist.github.com/The-RedWizard/d4567266537673ce4d2009c518951154
I think my implementations are correct.
I love the graphs. I love the enthusiasm you have.
I see a problem with your methodology. You are comparing the algorithm on posts which start with 1000 upvotes and never change. Posts start with 1 upvote and the number of upvotes grows based on visibility in the algorithm. The algorithm used influences the number of upvotes that a post will get.
A post with 0 comments cannot reach 1000 upvotes. Comparing the decay of 2 posts with the same number of upvotes does not make sense because the algorithm also changes the upvotes that a post will get. The lack of comments causes lack of visibility which causes diminishing upvote rate, which all compounds. A faster decay means that a post with no comments loses visibility even sooner.
From my experience, posts are primarily viewed in 2 ways, on ‘new’ sort and on ‘active’ sort. The front page of ‘new’ sort lasts around 1 hour typically and then users won’t upvote the post any more from ‘new’ sort. After 1 hour, the post only gains upvotes if it is visible in the ‘active’ algorithm. A post with 0 comments never even sees the front page of ‘active’ sort, therefore stops gaining upvotes at all after 1 hour. To compare the algorithms accurately, you would need to have a new function where a post only gains upvotes when the post is visible on the algorithm front page after the first hour.
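A rough Rust sketch of the kind of function described above: votes arrive freely in the first hour, and afterwards only while the post's active rank clears a front-page cutoff. The cutoff `front_page_rank`, the fixed `votes_per_hour`, and the hourly tick are all hypothetical, and the sketch ignores the 48-hour clamp, so treat it as an illustration and not a model of real traffic.

```rust
/// Hot rank formula from the post, taking elapsed hours directly.
fn hot_rank(score: f64, hours_diff: f64) -> i64 {
    if hours_diff > 0.0 {
        (10000.0 * (score + 3.0).max(1.0).log10() / (hours_diff + 2.0).powf(1.8)).floor() as i64
    } else {
        0
    }
}

/// Hypothetical hourly simulation of one post's score.
/// `comment_every` is the gap between comments in hours, or None for a
/// thread nobody comments on.
fn simulate_score(
    total_hours: u32,
    votes_per_hour: f64,
    front_page_rank: i64,
    comment_every: Option<u32>,
) -> f64 {
    let mut score = 1.0; // posts start with one upvote
    let mut last_comment_hour: Option<u32> = None;
    for hour in 1..=total_hours {
        // Active sort ages from the last comment if there is one,
        // otherwise from publication.
        let age = match last_comment_hour {
            Some(c) => (hour - c) as f64,
            None => hour as f64,
        };
        // First hour: visible on 'new'. Later: only if ranked high enough.
        let visible = hour == 1 || hot_rank(score, age) >= front_page_rank;
        if visible {
            score += votes_per_hour;
        }
        // Record this hour's comment after ranking, so the age is never zero.
        if let Some(gap) = comment_every {
            if gap > 0 && hour % gap == 0 {
                last_comment_hour = Some(hour);
            }
        }
    }
    score
}
```

Comparing `simulate_score(24, 5.0, 500, None)` with `simulate_score(24, 5.0, 500, Some(4))` shows the compounding effect: the silent thread stops gaining votes once it drops under the cutoff, while the commented thread keeps collecting them.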
I think it’s ineffectual to try to cull controversial threads using an algorithm. This purpose was originally served by downvotes. This function can likely only be replaced by active moderators, i.e. locking posts with excessive arguments.
The decay being exponential seems unnecessarily aggressive. Why can’t the decay be linear and still reach a value of 0 after 24 hours?
example: SCORE * ((24 - X) / 24), where X is the age of the post in hours.
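As a quick comparison, here is a small Rust sketch of that linear formula next to the age divisor from the hot_rank function quoted earlier in the thread. Clamping the linear version at zero after 24 hours and expressing the hot rank decay as a fraction of a brand-new post's value are assumptions added for illustration.

```rust
/// The linear decay proposed above: SCORE * ((24 - X) / 24).
/// Clamped at zero once the post is older than 24 hours (an assumption).
fn linear_rank(score: f64, age_hours: f64) -> f64 {
    score * ((24.0 - age_hours) / 24.0).max(0.0)
}

/// Fraction of rank that the hot_rank divisor (hours + 2)^1.8 leaves
/// at a given age, relative to a brand-new post: (2 / (h + 2))^1.8.
fn hot_decay_fraction(age_hours: f64) -> f64 {
    (2.0 / (age_hours + 2.0)).powf(1.8)
}
```

At six hours the linear version still keeps 75% of the score, while the hot rank divisor has already cut the rank to under a tenth of its starting value.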
Editing my post because I want to be precise and I want to be respectful, but some of my thoughts feel abstract. My complaint is that posts with important news are dying after 1-2 hours. My speculation is that exponential decay increases the chances that important news stories are disappeared into the void within 1-2 hours. Whether hexbear’s algorithm is giving a lower hypothetical score to controversial posts is a tangent.
In the second graph, each post starts with 1 vote (the default) and gains anywhere between 0 and 5 upvotes at random per hour. I could have a more robust simulation that makes the post get more votes the higher its rank is, but in testing I learned that rank in both algorithms is demonstrably impacted by the number of hours since posting, far more than score. This is because the reward for high score count is logarithmically scaled.
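The effect of the log scaling can be sketched in Rust by splitting the hot_rank formula into its score part and its age part. The function names here are hypothetical, and the rank is left un-floored so the two parts can be compared directly.

```rust
/// Score part of the hot rank formula: log10(max(1, score + 3)).
fn score_term(score: f64) -> f64 {
    (score + 3.0).max(1.0).log10()
}

/// Age part of the hot rank formula: (hours + 2)^1.8.
fn age_term(hours: f64) -> f64 {
    (hours + 2.0).powf(1.8)
}

/// Un-floored rank, so the two terms can be compared directly.
fn raw_rank(score: f64, hours: f64) -> f64 {
    10000.0 * score_term(score) / age_term(hours)
}

/// How many times larger the rank gets when the score goes from `low` to `high`.
fn score_gain(low: f64, high: f64) -> f64 {
    score_term(high) / score_term(low)
}

/// How many times smaller the rank gets when the post ages from `young` to `old` hours.
fn age_loss(young: f64, old: f64) -> f64 {
    age_term(old) / age_term(young)
}
```

Going from 10 to 1000 votes multiplies the rank by less than 3, while aging from 1 hour to 10 hours divides it by more than 12. Under this sketch, a 10-vote post that is an hour old outranks a 1000-vote post that is ten hours old.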
The Hexbear algorithm allows posts that get comments in the first couple of hours a sizable boost, but that boost is diminished over time. Even with sustained conversation, the thread still tapers off. Whereas in the default, the boost is the same for two days and then violently ends.
The benefit is that a post will slip down the ranks as time passes and not dominate the front page.
The flaw as you point out is that threads with no comments die within hours. That is true for either algorithm. If something feels important, people should comment on it.
That decay is part of the standard Hot_Rank algorithm that is used in both. The base decay could be too aggressive. Any changes made to that will result in a more stale feed as all posts take longer to drop off.
One way this could be combatted is by doubling the score of a post with no comments or even changing the decay from 1.8 to something lower, such as 1.4 so it decays slower until it gets a comment.
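Both ideas can be sketched in Rust on top of the hot_rank formula. The function names and the `comment_count` input are hypothetical, and nothing here reflects how Hexbear or Lemmy actually store or apply these values.

```rust
/// Hot rank formula with the decay exponent exposed as a parameter.
fn hot_rank_with(score: f64, hours_diff: f64, gravity: f64) -> i64 {
    if hours_diff > 0.0 {
        (10000.0 * (score + 3.0).max(1.0).log10() / (hours_diff + 2.0).powf(gravity)).floor() as i64
    } else {
        0
    }
}

/// Option 1: posts with no comments decay with exponent 1.4 instead of 1.8.
fn no_comment_gravity_rank(score: f64, hours_diff: f64, comment_count: u32) -> i64 {
    let gravity = if comment_count == 0 { 1.4 } else { 1.8 };
    hot_rank_with(score, hours_diff, gravity)
}

/// Option 2: posts with no comments are ranked as if their score were doubled.
fn no_comment_doubled_rank(score: f64, hours_diff: f64, comment_count: u32) -> i64 {
    let effective = if comment_count == 0 { score * 2.0 } else { score };
    hot_rank_with(effective, hours_diff, 1.8)
}
```

Because the score sits inside a logarithm, doubling it moves the rank much less than lowering the exponent does. For a 50-vote post at six hours with no comments, the exponent change gives the larger boost of the two in this sketch.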
This is why I want to build a simulation engine that can generate a feed of posts with growing upvotes and growing comment counts.
I’m not convinced my methods here are perfect. But I think my general conclusions seem true. Comments are very important. In the other conversations here you’ll see two reasons given for the current algorithm: minimizing struggle sessions and ensuring the mutual aid comm isn’t totally washed out.
There could be other ways to handle this, such as explicitly making the posts from the mutual aid community rank higher or decay slower or both. But it requires more testing.