This comment originated here: https://hexbear.net/comment/6055450. As always, I don’t intend to drive conversation to that thread. If you wish to continue this conversation, feel free to leave a comment here.
The inputs that decide what lives and dies are the same. A thread that lives under the current math would simply live longer under the old math with the same inputs. That’s what I’m trying to say. Here, I even graphed it out so we can see what I’m talking about: two posts with the same vote score (1000), each getting one new comment every four hours for a week.
This is already bad enough, but look at what happens when both posts are gaining votes. This simply adds a random number between 0 and 5 to the post score every hour.
The old algorithm pushes the score even higher. It makes the thread creep up and up. Sure, it decays pretty fast between comments, but people will be returning to the thread from their inbox as they reply to people replying to them. This keeps the thread pushed up, inviting more people to leave top-level comments.
Just to illustrate your point, here are two Hexbear threads, one that receives no comments, and one that gets a comment every 4 hours, both with 1000 upvotes.
A thread with comments every 4 hours will have a sub-250 rank value after about 16 hours. A thread with a similar score and no comments for the duration will have a sub-250 rank value after just 4 hours. So, that’s a long ass time.
The disparity between active and “inactive” threads is even worse in the default sort. A thread with a high enough score could maintain a top-level position in the rankings for two whole days, while one that gets no comments drops off in the first 5 hours.
This is, to my understanding, the most recent version of the “Hot Rank” that all other ranks are based on in Lemmy.
CREATE OR REPLACE FUNCTION hot_rank (score numeric, published timestamp without time zone)
    RETURNS integer
    AS $$
DECLARE
    hours_diff numeric := EXTRACT(EPOCH FROM (timezone('utc', now()) - published)) / 3600;
BEGIN
    IF (hours_diff > 0) THEN
        RETURN floor(10000 * log(greatest (1, score + 3)) / power((hours_diff + 2), 1.8))::integer;
    ELSE
        RETURN 0;
    END IF;
END;
$$
LANGUAGE plpgsql
IMMUTABLE PARALLEL SAFE;
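For anyone who wants to plug numbers in without a database, here is a minimal Python sketch of the same formula. It is my own port, not code from Lemmy: the name `hours_since` is made up for illustration, and it takes the elapsed hours directly instead of computing them from a timestamp. It assumes PostgreSQL's one-argument `log()` is base 10.

```python
import math


def hot_rank(score: float, hours_since: float) -> int:
    """Rough port of the SQL hot_rank above.

    hours_since is a hypothetical parameter: the hours elapsed since the
    timestamp that the SQL version receives as `published`.
    """
    if hours_since <= 0:
        # Mirrors the ELSE branch of the SQL function.
        return 0
    # Assumption: PostgreSQL's single-argument log() is base 10.
    numerator = 10000 * math.log10(max(1, score + 3))
    return math.floor(numerator / (hours_since + 2) ** 1.8)


if __name__ == "__main__":
    # Same score, increasing age: the rank only ever goes down.
    for hours in (1, 4, 16, 48):
        print(hours, hot_rank(1000, hours))
```

The score only enters through the logarithm, so a tenfold difference in votes adds a constant amount to the numerator, while the age term in the denominator keeps growing.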
What makes a Hot Rank a Hot Active Rank is whether you send the Hot Rank function the published date or the most recent comment timestamp (clamped at 48 hours; after that, it defaults to the published date):
diesel::update(post::table.find(post_id))
    .set((
        // Normal Hot Rank, uses Published date.
        post::hot_rank.eq(hot_rank(post::score, post::published)),
        // Active Hot Rank, uses newest_comment_time_necro date.
        post::hot_rank_active.eq(hot_rank(post::score, post::newest_comment_time_necro)),
        post::scaled_rank.eq(scaled_rank(
            post::score,
            post::published,
            interactions_month,
        )),
    ))
    .get_result::<Self>(conn)
    .await
So, for two days, threads on a Lemmy instance other than Hexbear can have their timestamp refreshed, as if they were just posted, with their current score, so long as someone is leaving a comment.
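To make that concrete, here is a rough Python sketch comparing the two timestamps. It is a toy model, not Lemmy's code: `rank_at`, `comment_every`, and `NECRO_LIMIT_HOURS` are names I made up, the comment schedule is perfectly regular, and the 48-hour fallback follows my description above rather than anything I have verified line by line in the Lemmy source.

```python
import math

# Assumption from the description above: after 48 hours, new comments no
# longer refresh the timestamp and the published date is used instead.
NECRO_LIMIT_HOURS = 48


def hot_rank(score, hours_since):
    """Port of the SQL hot_rank, taking elapsed hours directly."""
    if hours_since <= 0:
        return 0
    return math.floor(10000 * math.log10(max(1, score + 3)) / (hours_since + 2) ** 1.8)


def rank_at(score, age_hours, comment_every=None):
    """Rank of a post age_hours after it was published.

    comment_every=None models plain Hot (timed from publication).
    comment_every=4 models Active with a comment landing every 4 hours.
    """
    if comment_every is None or age_hours > NECRO_LIMIT_HOURS:
        return hot_rank(score, age_hours)
    # Hours since the most recent comment (or since publication if none yet).
    since_last_bump = age_hours - (age_hours // comment_every) * comment_every
    return hot_rank(score, since_last_bump)


if __name__ == "__main__":
    # Sample at odd hours so we never land exactly on a comment,
    # where the elapsed time would be 0 and the formula returns 0.
    for age in (1, 5, 13, 25, 47, 49):
        print(age, rank_at(1000, age), rank_at(1000, age, comment_every=4))
```

In this model the commented thread keeps getting timed from its latest comment until the 48-hour mark, then falls back to its real age, which is the two-day plateau I am describing.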
So, this leads me to a couple of points.
- Yes, Hexbear threads with comments have a higher rank than threads with no comments, and decay way slower.
- The Hexbear algo depresses the impact of a new comment over time, meaning the thread decays faster than on normal Lemmy.
- The normal Lemmy active sort results in front pages that can feel stagnant for two days at a time. This suppresses new threads, unless those threads get a lot of vote traction and sustained conversation.
- Every few hours, new threads are climbing to the top of the Hexbear front page.
- In both systems, “important news story posts with no comments” die faster because they have no comments. The default Lemmy algo may actually be better about this because the first comment isn’t subject to a time penalty like on Hexbear.
- All our “bump” bots are pretty useless because they comment nearly instantly after someone summons them, not really bumping the thread at all. Even if the comment were delayed, it would have a lesser impact because of the decay. Those bots were way more effective under the old algo.
You would need to come up with new or different inputs, or a whole new method of ranking that doesn’t leave threads with no comments in the dust. Comment count could be considered: if a thread has a low comment count, or no comments at all, then maybe it shouldn’t be impacted by the decay in the Hexbear algo. Or the number of comments could speed up the thread decay, while threads with longer spans between comments do not.
If people want to double-check my work, you can find it here: https://gist.github.com/The-RedWizard/d4567266537673ce4d2009c518951154
I think my implementations are correct.
Your understanding is correct, yes.
Including both sorting algorithms (or a novel one in addition to the existing one) would definitely complicate the picture, though I’m not really sure how significantly. What I did was fairly simple, whereas adding a new sorting option would require changes to the UI code in addition to back-end changes like I made. It’s not impossible, but it would require more work, and would increase the odds of updates to upstream Lemmy requiring more work to merge in (if they change something in any of the parts of the code we’ve modified, there will be a merge conflict that has to be resolved).
Regarding struggle sessions, there’s no true solution, but I think having the thread fall off the feed fairly soon helps to mitigate them to at least some degree. I don’t have any data to back that up, though, so I could be wrong.
As for implementing a novel algorithm as a replacement for the existing sort, I’m no better equipped than you to devise one. In fact, your graphs make clear that you’re more prepared to analyze algorithmic effects than I am. I just plugged a few values in to get a sense of what it was doing at various times after the post was made.
If you’re interested in working on something like that, it could be implemented on test.hexbear.net to see what effects it would have (if we refresh the data every so often, we can get at least some sense of how it affects post rankings). If you are, I suggest reaching out to an admin to get an invite to the dev chat. (I’m unfortunately not likely to have any time for working on Hexbear code anytime soon, though. Plus I don’t really know much of anything about Rust anyway.)
Ah, so not so much a breaking change as one that requires integration. Def a lot of work, and it wouldn’t be supported by 3rd party apps.
As for my aptitude, I’ve played around with numpy and pandas for other projects in the past. I’m not 100% sure if I graphed things accurately. I’m probably going to try and build a simulation that uses actual datetimes and implements the algorithms unchanged.
I’m curious, in the function you wrote, where did you come up with the 0.000012146493725346809 value? Is it simply arbitrary or is it something specific?
I didn’t write it; that was someone else back in the earliest days of Hexbear, back when it was still chapo.chat. I just took the SQL from the old codebase and ported it into the new one in the appropriate way to make it work.
As far as I know, it is arbitrary, though.