social.outsourcedmath.com

maegul mastodon (AP)
Graphs of the sizes of fediverse instances, how common they are, and where the most people are! 🧡

Data pulled from https://instances.social/ (by @TheKinrar) and excludes pawoo and baraag as they're heavily blocked for good reasons (it seems)

Breaking down instances by number of users into bins (roughly logarithmic, but rounded to human-friendly values), we see that the majority (55%) have 2-50 users, ~33% have 1 user, and almost all instances have fewer than 5,000 users.

@fediversenews

1/
Histogram of fediverse instances by account size ... see toot for description
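A minimal sketch of the binning described in this post, assuming you already have a plain list of per-instance user counts. Only the first few bin edges (0-2, 2-50, 50-1000, 1000-5000) are named in the thread; the remaining edges, the function name and the sample numbers are assumptions for illustration, not the instances.social data.

```python
from bisect import bisect_right

# First few edges follow the thread; the larger ones are assumed.
EDGES = [0, 2, 50, 1_000, 5_000, 10_000, 50_000, 100_000, 1_000_000]


def instance_share_by_bin(user_counts, edges=EDGES):
    """Percentage of instances whose user count falls in each half-open
    bin [edges[i], edges[i+1])."""
    tallies = [0] * (len(edges) - 1)
    for n in user_counts:
        i = bisect_right(edges, n) - 1
        # Clamp out-of-range values into the first or last bin.
        i = min(max(i, 0), len(tallies) - 1)
        tallies[i] += 1
    return [100 * t / len(user_counts) for t in tallies]


# Made-up user counts, purely to exercise the function.
sample = [1, 1, 1, 3, 12, 40, 45, 300, 2_500, 60_000]
shares = instance_share_by_bin(sample)
for low, high, pct in zip(EDGES, EDGES[1:], shares):
    print(f"{low:>9,} - {high:<9,} {pct:5.1f}% of instances")
```

Half-open bins mean an instance with exactly 2 users lands in the 2-50 bin, which is one way to resolve the ambiguity in labels like "0-2".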
maegul mastodon (AP)
Though most instances are very small (in user count), the large instances are very large by comparison. The result is that the 20-30 largest instances host around half of all the users of the fediverse.

This graph is a cumulative percentage of all users starting with the largest instance and descending. By the 20th largest, we've got 50% of all users. Mastodon.social hosts ~16%! The top 10 get you ~40%. Note that this includes 2 large Japanese instances (mstdn.jp+mastodon.cloud)
Line graph of the cumulative percentage of all users hosted by the largest instances ... see toot for description
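The cumulative curve described here can be sketched as follows. This is an illustration under the assumption that the input is a list of per-instance user counts; the function names and the toy numbers are hypothetical and are not the thread's data.

```python
from itertools import accumulate


def cumulative_share_from_largest(user_counts):
    """Sort instances largest first and return the running percentage of
    all users covered after each instance."""
    ordered = sorted(user_counts, reverse=True)
    total = sum(ordered)
    return [100 * running / total for running in accumulate(ordered)]


def instances_needed(user_counts, target_pct=50.0):
    """Smallest number of largest instances whose combined users reach
    target_pct percent of the total."""
    shares = cumulative_share_from_largest(user_counts)
    for rank, pct in enumerate(shares, start=1):
        if pct >= target_pct:
            return rank
    return len(user_counts)


# Toy numbers only.
toy = [500, 200, 100, 80, 60, 30, 20, 10]
print(cumulative_share_from_largest(toy))
print(instances_needed(toy, 80))
```

Applied to the real dataset, `instances_needed(counts, 50)` would be the figure the post reports as roughly 20.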
maegul mastodon (AP)
Which for me leads to the question of how many people are on instances of what size (eg, what percentage of all users are on instances with 10-20K users?)

Well, turns out it's pretty even (using the bins from above), from 1K to 1M users, with 10% of users in 1K-5K instances and ~13% in 50K-100K instances. Only below 1K user instances do you get a substantial drop off in the number of users on such instances.

Take away for me, plenty of people on 1K to 40K instances!

3/
Bar chart of the percentage of all users on instances of various sizes showing that most users are NOT on instances with fewer than 50 users, but above that, users are fairly evenly distributed across the bins of sizes (see toot text for more information / interpretation)
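The difference between this chart and the first histogram is only what gets added up per bin: users rather than instances. A hedged sketch, with the same caveats as before (bin edges beyond the ones named in the thread are assumed, and the sample counts are invented):

```python
from bisect import bisect_right

# First few edges follow the thread; the larger ones are assumed.
EDGES = [0, 2, 50, 1_000, 5_000, 10_000, 50_000, 100_000, 1_000_000]


def user_share_by_bin(user_counts, edges=EDGES):
    """Percentage of all users who sit on instances in each half-open
    size bin [edges[i], edges[i+1])."""
    totals = [0] * (len(edges) - 1)
    for n in user_counts:
        i = min(max(bisect_right(edges, n) - 1, 0), len(totals) - 1)
        totals[i] += n  # weight each instance by its user count
    all_users = sum(user_counts)
    return [100 * t / all_users for t in totals]


# Invented counts: four small instances and one of 9,000 users.
sample = [2, 8, 90, 900, 9_000]
user_shares = user_share_by_bin(sample)
for low, high, pct in zip(EDGES, EDGES[1:], user_shares):
    print(f"{low:>9,} - {high:<9,} {pct:5.1f}% of users")
```

In this toy input, four of the five instances have fewer than 1,000 users, yet the single 9,000-user instance holds 90% of the accounts, which is the same effect the thread describes at a larger scale.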
maegul mastodon (AP)
Taking the data from above, we can make a cumulative percentage graph (line chart) over the same bins as above.

We see that the halfway mark is ~50K users. So half of the fediverse is on instances with 50K or more users, half on instances with fewer.

Slightly more technically, this line is pretty straight (as users are roughly evenly spread out, highlighted above). Given that the bins are roughly logarithmic-ish, this hints that the distribution is a power law.

end/
Graph of the cumulative percentage of all fediverse users accounted for by instances of increasing size (where instances are aggregated into bins of roughly logarithmic ranges, eg, 0-2, 2-50, 50-1000, 1000-5000, etc).

The line is roughly straight from the 50-1000 users bin up to the 1M bin, passing 50% of all users at around the 40,000 to 50,000 users bin.
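One way to spell out the "straight line hints at a power law" step, as a sketch under a continuous approximation. The symbols are introduced here and are not from the thread: n(s) is the density of instances of size s, assumed to be C s^(-alpha), and U(a, b) is the number of users on instances with sizes between a and b.

```latex
U(a, b) = \int_a^b s \, n(s) \, ds
        = C \int_a^b s^{1-\alpha} \, ds
        = \begin{cases}
            C \ln(b/a), & \alpha = 2 \\[4pt]
            \dfrac{C}{2-\alpha} \left( b^{2-\alpha} - a^{2-\alpha} \right), & \alpha \neq 2
          \end{cases}
```

For alpha = 2 the number of users in a bin depends only on the ratio b/a, so bins of equal logarithmic width hold equal numbers of users and the cumulative curve is a straight line against log(s). Since the bins in the thread are only roughly logarithmic, this suggests an exponent somewhere near 2 rather than measuring one.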
maegul mastodon (AP)
Simple extra without a graph, on the power law thing. A log-log histogram by user count does show a very linear relationship. Graph not included because it'd probably be confusing and I don't know anything about logarithmic binning.

All I'll say is that on the log-log plot there was a small but clear bulge in the mid-sized instance range (1K-100K users), which may represent a certain "sweet spot" of instance size that people are attracted to ??
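For readers curious about the logarithmic binning mentioned here, a minimal standard-library sketch: bins have equal width in log10 space, each count is divided by its bin width to give a density, and a straight line is fitted on log-log axes. The function names, the `min_count` cut-off and the synthetic data are all assumptions for illustration. The data are random draws from a continuous power law with density proportional to s^-2, not the instances.social dataset.

```python
import math
import random


def log_binned_density(values, bins_per_decade=2, min_count=10):
    """Histogram positive values into bins of equal width in log10 space.
    Returns (bin centre, count / bin width) for bins holding at least
    min_count values, so sparse tail bins do not dominate a fit."""
    lo = math.floor(math.log10(min(values)))
    hi = math.ceil(math.log10(max(values)))
    n_bins = max(1, (hi - lo) * bins_per_decade)
    counts = [0] * n_bins
    for v in values:
        i = int((math.log10(v) - lo) * bins_per_decade)
        counts[min(i, n_bins - 1)] += 1
    points = []
    for i, c in enumerate(counts):
        if c < min_count:
            continue
        left = 10 ** (lo + i / bins_per_decade)
        right = 10 ** (lo + (i + 1) / bins_per_decade)
        centre = math.sqrt(left * right)  # geometric midpoint of the bin
        points.append((centre, c / (right - left)))
    return points


def loglog_slope(points):
    """Least-squares slope of log10(density) against log10(size).
    Needs at least two points with different sizes."""
    xs = [math.log10(x) for x, _ in points]
    ys = [math.log10(y) for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Synthetic sizes via inverse-transform sampling: density ~ s^-2 for s >= 1.
rng = random.Random(0)
sizes = [1 / (1 - rng.random()) for _ in range(50_000)]
points = log_binned_density(sizes)
slope = loglog_slope(points)
print(f"fitted log-log slope: {slope:.2f}")
```

On real data, a bulge like the one described would show up as a run of points sitting above the fitted line.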
Gabriele Pollara mastodon (AP)
lol, I was about to ask about the log-log plot, and as you say, that shows the bulge / exception, i.e. where observed > expected.

Interesting data. Question for me is how user experience matches to size of instance, i.e. where is the most enjoyable / rewarding home in the Fediverse? 🤷‍♂️
maegul mastodon (AP)
@gpollara
The secret might be that people vary in what they want and are looking for, thus the distribution we've got where any user is as likely to be on a large instance as they are to be on a small instance.
Gabriele Pollara mastodon (AP)
yes, i think that's right. In that respect, interesting that Fediverse may be disproportionately attracting people with niche communities / interests in the middle of distribution, though I may be over-speculating!
maegul mastodon (AP)
@gpollara I don't think so, there are probably very real reasons why people would end up on a 100-1K user instance rather than a 50 user instance, and for similar reasons, prefer a 50K instance over a 200K instance.
Evelyn mastodon (AP)
When I was instance-shopping upon people being asked to leave mastodon.social (my first instance) during the most frenzied part of the Twitter exodus, I chose 3K-30K users as my sweet spot. Less than 30K because I wanted to be able to go knock the moderators' virtual door down if they proved unresponsive to things in need of moderation; & more than 3K because my experience with bulletin boards, etc, was they could be claustrophobic or moribund at <1K users or so.
maegul mastodon (AP)
@gorfram
Yep, I take your point and I think you are right! See, eg, my extra post here: https://hachyderm.io/@maegul/110331536984068521 where I comment on how the data might indicate that people are indeed gravitating to instances in the size range you highlight.

So maybe the 40K-50K bin is just a kink, but yea, maybe you're totally right, and it's a valley between two kinds of users/instances.

Cities and their populations probably demonstrate a similar pattern??

MarjorieR mastodon (AP)
@gorfram most of us are sticky and don't change servers unless we get our choice completely wrong.
Some servers are also open to growing and others are not - scaling moderation is an issue here for some
So some users in a server of a particular size were in a smaller server when they first joined so what we see now may not fully reflect initial user choice.
PJ Coffey mastodon (AP)
Thank you for the fascinating thread on sizes of server. 50% of Mastodon being on centralised servers is a thing alright.

#Mastodon #MastodonServer #MastodonSocial #MastodonDesign
@Homebrewandhacking
And for the big instances: how active are the users? If I look at my server stats, I see that only about a third are active.
server
maegul mastodon (AP)
@redegelde @Homebrewandhacking
Yes that's also relevant. I'm unclear on the data I got on activity and how it compared to that in fedidb. So I didn't do any analysis on that. Maybe another time or someone else would be keen.
PJ Coffey mastodon (AP)
@redegelde
What are those activity numbers? Peak users? Median, modal, mean average?

Either way. Shockingly low to see 4 digits there.
@Homebrewandhacking
It's the data I see on my admin page. I can't put my finger on how it looks on other servers, or on what "active" means.
Jigme Datse mastodon (AP)
That line looks straight, but how do those bin sizes distribute in terms of ... OK. Got it...
You'd think users should self-load-balance and join instances that are not heavily loaded?
Benjamin mastodon (AP)
@blackburied possible solution: fediverse plinko. migrate everybody into random instances to balance things out. 😀
Evelyn mastodon (AP)
The dip between 40K & 50K on the % users vs. instance size chart is interesting to me. My takeaway is that many/most >50K instance users came to Mastodon, found an easy-to-find instance, and have seen no reason to change instances.
While <40K instance users found the "mega-instances" unwieldy, too much like corporate SM, &/or unresponsive to legit concerns; but still wanted the SM experience of a well populated instance.

*based on no specific knowledge or expertise
maegul mastodon (AP)
@gorfram Maybe ... most likely just a kink in the data and the binning I chose.
Evelyn mastodon (AP)
Entirely likely. Reading too much into a set of numbers is one of my failings/talents.
Gabriele Pollara mastodon (AP)
my real concern, which you cannot address from these analyses, is the interoperability & discoverability across these instances. I don't think I fully understand it, but I'm not sure it's 2 way across all instances (even without blocking). i.e. are all instances truly connected and discoverable to each other?
Tristan Harward mastodon (AP)
Genuinely cool that the biggest instance hosts "only" 16% of users. I would have thought higher. Good sign.
Luca Sironi pleroma (AP)
isn’t counter.social fully defederated ?
maegul mastodon (AP)
@luca Huh ... don't know.

How would one find out? Is there a single or good source of truth on these things?

Its numbers aren't large enough to change the shape of the data though.

More broadly though, I have no idea how federated any of the instances in the dataset are (apart from pawoo and baraag, which are very large which is why they were excluded).
Paul Rohr mastodon (AP)
@luca For any given instance, you can ask two different questions:

federation ... How many other instances has this one ever connected with?

defederation ... Who's currently blocking whom?

To the extent that apps + instances on the Fediverse support the relevant Mastodon APIs (many apparently do), it's possible to get decent answers to both.
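A hedged sketch of asking an instance those two questions. It relies on two public Mastodon REST endpoints, `/api/v1/instance/peers` (domains the instance is aware of) and `/api/v1/instance/domain_blocks` (the published block list). Admins can disable either one and non-Mastodon software may not implement them, so failures are reported as `None`. The function names, the injectable `fetch` parameter and the `.example` domains are assumptions made here so the example can run offline.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def federation_summary(domain, fetch=fetch_json):
    """Count what an instance publicly reports about its federation.
    Returns {'peers': int or None, 'blocks': int or None}."""
    base = f"https://{domain}/api/v1/instance"
    summary = {}
    for key, path in (("peers", "peers"), ("blocks", "domain_blocks")):
        try:
            summary[key] = len(fetch(f"{base}/{path}"))
        except (OSError, ValueError):
            # Network/HTTP errors are OSError; bad JSON is ValueError.
            summary[key] = None
    return summary


def fake_fetch(url):
    """Offline stand-in for illustration only."""
    if url.endswith("/peers"):
        return ["a.example", "b.example", "c.example"]
    raise OSError("endpoint disabled")


result = federation_summary("instance.example", fetch=fake_fetch)
print(result)
```

Against a live server you would call `federation_summary("some.domain")` with the default fetcher. As with the connections column mentioned in this thread, a long peers list says which domains were seen at some point, not which are currently reachable.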
Paul Rohr mastodon (AP)
@luca To get a rough sense of how widely federated a particular instance is, check out the connections column in instances.social's advanced view:

counter.social 0
gc2.jp 0
pravda.me 31

This sloppy metric counts the number of unique domains *ever* seen from that instance, so it tends to overcount *current* connectivity. (Also, some Masto admins use tootctl to manually prune spam domains after forkbomb attacks, but others don't.)
Paul Rohr mastodon (AP)
@luca As for blocks, see the top 50 lists here.

https://fba.ryona.agency/

For all the dramatic talk about defederation, it seems like most instances don't actually do that to each other. (However, I haven't been able to assess how complete the coverage of this dataset is, so YMMV.)
Drew Mochak mastodon (AP)
@pevohr @luca source code: git.kiwifarms.net/mint/fedi-block-api

Might not wanna use, or link to that tool ma dude.
Drew Mochak mastodon (AP)
@pevohr @luca @fedriversenews@venera.social Possible yes; however I've seen a lot of communities go to a shields up defcon2 situation pretty fast when you start building tools to really dig into who is blocking who and why. I also haven't seen much to compare the degree to which an instance is federated. Though in fairness, I have not really looked!
Alessia Visconti mastodon (AP)
are these active users?

My belief is that most of the users in very large instances are those that created an account to try out the Fediverse/Mastodon but then abandoned it, while smaller instances have a much higher retention rate -- so not strictly true that half the users are in the top 20 instances, but you know, no data to prove it 😬
Does the Pareto principle apply here? are 20% of instances home to 80% of users?
Cochise gotosocial (AP)
there is a bit more concentration. By the graphics, something like 15-80.
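A Pareto-style check like "20-80" or "15-80" can be computed directly from the list of per-instance user counts. A minimal sketch, with a hypothetical function name and invented numbers:

```python
def top_share(user_counts, top_fraction=0.20):
    """Percentage of all users hosted by the largest top_fraction of
    instances (rounded to the nearest whole instance, at least one)."""
    ordered = sorted(user_counts, reverse=True)
    k = max(1, round(top_fraction * len(ordered)))
    return 100 * sum(ordered[:k]) / sum(ordered)


# Toy numbers only: ten instances, 1,000 users in total.
toy = [800, 60, 40, 30, 20, 15, 15, 10, 5, 5]
print(top_share(toy, 0.20))
print(top_share(toy, 0.10))
```

On the real dataset, `top_share(counts, 0.15)` would be the number to compare with the "15-80" estimate read off the graphs.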
Jorge Stolfi mastodon (AP)
@spaetz
Awful article.

Where did the laws with s other than 1 come from?
Jorge Stolfi mastodon (AP)
@spaetz
In case you care, I did a general revision on that article. But there are still plenty of holes and rough spots...
old_tootbrute gotosocial (AP)
woo hoo. i'm in the 0-2 user level. 30% of us are like that? really? cool.
maegul mastodon (AP)
@elias 30% of instances, not users/accounts!

Majority of accounts are on much bigger instances.
old_tootbrute gotosocial (AP)
yes. still impressive. i wonder if these are the user counts on servers due to social factors? or technological factors?

i.e. is it easier to moderate a 2-50 user instance? or is it cheaper to run a 2-50 user instance?
old_tootbrute gotosocial (AP)
i know i would open my server to 0-10 users if i knew how to auto delete old posts on the server.

i'm on gotosocial, so waiting for the software to mature.
maegul mastodon (AP)
@elias
Good question! I got interested in this because of the moderation question. Many believe, and with some justification IMO, that large instances cannot be moderated effectively. I wanted to know how many people out there are in large instances. Turns out it's probably half the fediverse! But then the other half are on smaller instances, so it seems that a large v small instance fragmentation could easily occur (and may already be occurring).
Mark Prior mastodon (AP)
@elias by my count there are 3,733,962 accounts on the 10 largest instances out of 9,133,502, so about 40%. There are 919,589 in 11-20.
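The arithmetic behind these figures, written out using only the three numbers quoted in this post (the variable names are ours):

```python
top_10 = 3_733_962   # accounts on the 10 largest instances
next_10 = 919_589    # accounts on instances ranked 11-20
total = 9_133_502    # all accounts in the count

share_top_10 = 100 * top_10 / total
share_top_20 = 100 * (top_10 + next_10) / total
print(f"top 10: {share_top_10:.1f}% of accounts")
print(f"top 20: {share_top_20:.1f}% of accounts")
```

That works out to roughly 41% for the top 10 and roughly 51% for the top 20, in line with the "~40%" and "50% by the 20th largest" figures given earlier in the thread.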
maegul mastodon (AP)
@anant @elias
That's 30% of instances, not users. See the rest of the thread.
yup was considering instances only. Obviously it can't be 30% of users, although I would love it, but with massive instances hosting massive numbers of users up top that's unrealistic to expect. Good thread indeed
maegul mastodon (AP)
@gorfram
Don't know exactly, but it is a sub-field of science to some extent. Geoffrey West is a former physicist known for studying such things ... look him up and you might find some resources.

His general finding was that power laws (which seem to accurately describe the relationship between the number of instances and their size) pop up all over the place in biology and cities.
The graph x axis labels are confusing. I'm assuming 0 to 2 is actually 0-1 and the next bin is 2-49?
Joe Winter mastodon (AP)
@gorfram I believe they refer to it as a Zipf distribution.
Gabriele Pollara mastodon (AP)
@m

Very interesting, thank you for explaining. To clarify though, this only applies if you want to be discoverable in Federated timelines.

Presumably your followers will always see any new post from you in their personal timelines, independent of the instance you're on.

One further question though, how about if you use a hashtag someone follows. Will anyone in Fediverse see that, or would they only do so if their instance was connected to yours?

Thank you

@maegul @WestLawns @fediversenews
maegul mastodon (AP)
@dadalo_admin yea absolutely. I haven't done anything here to account for bots, just basic analysis of the raw data.

How are you determining whether an account is a bot on mastodon.social or other instances? I know some have moderation policies to distinguish them but is that sort of thing typically discoverable?
Drew Mochak mastodon (AP)
@m @gpollara Wow I never knew that's how this worked and I've been here for years! I always struggled to explain why the public timelines looked so different on different instances, that makes so much more sense now. Thank you!
West Lawns mastodon (AP)
@m @gpollara thanks - it is interesting to better understand these things.
Gabriele Pollara mastodon (AP)
@m @WestLawns awesome explanations to all this, thank you. Learnt something new but can also see how this can feel too much to get head around for the average member of public!
Quentin mastodon (AP)
Search is mastodon's biggest weakness. If I search for a hashtag, I would like it to find any post with that hashtag on any mastodon server. But I can't. I know many are aghast at the new onboarding process pushing everyone onto mastodon.social, but it is the only thing which will help new users stay. At least it gives people a chance to find others based on interest. I really wish this was a better experience
West Lawns mastodon (AP)
@Quentin @gpollara @m the absence of Musk (and hate speech) is one of the greatest strengths.
Jupiter Rowland hubzilla (AP)
@Quentin @Gabriele Pollara @m@thias.hellqui.st @maegul @West Lawns The "limitation" of #search to what's known to an instance isn't a political decision. It's a technical limitation that can't be overcome, at least not easily.

Even a search that covers literally 100% of all #Mastodon instances, only Mastodon, as in every last instance that exists the very moment you search for something is technologically impossible. If you search for the hashtag #HelloWorld, and you want to search every last Mastodon instance, you'd also have to be able to search a brand-new, single-user instance that has been started up for the first time two minutes earlier, and where the admin has just dropped a test toot that happens to contain #HelloWorld. At this point, the other Mastodon instances don't even know yet that this new instance even exists.

In order for this to work, every new Mastodon instance must actively push its existence and all of its content to each and every other Mastodon instance that exists at this point. Including one which, in turn, had just been started up two seconds earlier. How is a new Mastodon instance supposed to know about the existence of another instance that's only two seconds older?

But let's take this even further. If you only want to search Mastodon, you'll miss out on a great deal of posts in the #Fediverse. For #MastodonIsNotTheFediverse. The Fediverse isn't only Mastodon. There are many other projects out there, all of which communicate with one another and with Mastodon.

So instead of searching "any Mastodon server", you'd have to search "any Fediverse instance", no matter the project. This would have to include any hypothetical FoundKey instance that has just been spun up for the first time a few seconds ago. And if instances would have to advertise themselves to all others to make Fediverse-wide search possible, this new FoundKey instance would have to know right away about a (streams) instance that was just launched half a second earlier somewhere else, a Mitra instance that's one second older and a /kbin instance that's two seconds older.

I think it's obvious why this will never work. I hope so.
Quentin mastodon (AP)
@WestLawns @gpollara @m
Oh I'm not doubting that, I'm just saying it's harder than it could be to find the *nice* things you might be specifically looking for 😀
Gabriele Pollara mastodon (AP)
@jupiter_rowland @m interesting explanation, which then implies that basically discoverability on the Fediverse will always be an evolution, rather than instantaneous.

An analogy in my mind is that the Fediverse experience is more like an exploration / expedition through other Mastodon instances & other Fediverse services, along the way picking up people and hashtags of interest that will cast the net wider and wider for you over time.
Jupiter Rowland hubzilla (AP)
@Gabriele Pollara @m@thias.hellqui.st I think that fits it quite well.

People start out on the Mastodon instance they've been invited to, believing that this instance is all of Mastodon. Then they discover decentrality and other instances. Then they discover what a wealth of instances Mastodon has. They may actually move for the first time.

After a few months, they stumble upon parts of the Fediverse that aren't Mastodon, down to unexpectedly communicating with someone from entirely elsewhere. Once they follow @Fediverse News, they're really being enlightened, also because the most prolific poster is a driving force behind the success of CalcKey.
Quentin mastodon (AP)
@jupiter_rowland @m I gathered it was more technical than political - although, the limit of reach to only your friends' friends, does have the potential to form echo chambers. Ok, so on the web, if I get myself a brand new domain and create a brand new website, no-one knows about it yet, BUT, Google's and Bing's and whoever else's little bots will crawl around and find it eventually, and then anyone who searches for my name on those engines will find me. Could someone create a Fediverse search bot that crawls around and finds new servers, or however it works?
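A toy model of the crawler idea in this post, to show both what it can find and what it cannot. This is an illustration only: the peer lists and `.example` domains are invented, and `crawl` is a hypothetical helper, not an existing tool. The crawl is a breadth-first walk over "which other servers does this server know about".

```python
from collections import deque


def crawl(seeds, peers_of):
    """Breadth-first discovery: start from known instances and follow each
    one's peer list until nothing new turns up."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        domain = queue.popleft()
        for peer in peers_of(domain):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen


# Hypothetical peer lists; every domain here is invented.
PEERS = {
    "big.example": ["mid.example", "small.example"],
    "mid.example": ["big.example"],
    "small.example": ["big.example", "tiny.example"],
    "tiny.example": ["small.example"],
    "brand-new.example": [],  # just started, nobody has heard of it yet
}

found = crawl(["big.example"], lambda d: PEERS.get(d, []))
print(sorted(found))
print("brand-new.example" in found)
```

The crawler reaches everything linked from its seeds, but an instance that no known server lists as a peer stays invisible until it federates with someone, which is the brand-new-instance case raised earlier in the thread.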
Jupiter Rowland hubzilla (AP)
@Quentin @m@thias.hellqui.st Stuff like this has been attempted and promptly been rejected. There are two reasons against it.

One, it has to be centralised. As in, there would have to be exactly one central instance of this search engine. The whole Fediverse which is still decentralised would depend and rely on it. It basically could barely exist without it anymore. Not only Mastodon, but the whole Fediverse, all projects would depend on it.

But decentralised networks depending hard on central infrastructure is dangerous. If that one instance goes down, the whole search goes down. If Jack Dorsey managed to take over the search engine, he would have complete and barely avoidable exclusive control over the whole Fediverse. Ditto Elon Musk, Mark Zuckerberg, Jeff Bezos or whoever.

This is why decentralised networks that have decentrality as a main selling point must never rely on anything central.

Also, if you want it to be as powerful as Google, it can't be something created and run by someone in their spare time with zero budget. It would have to be a commercial start-up funded with venture capital which therefore has to find a way to make money because the investors expect a ROI of 10:1. So it'd either be pay-to-use or spam everyone's timelines with advertising or sell people's data to Google, Facebook and other data brokers without telling them, much less allowing them to opt out.

The other point is heavy resistance coming from the LGBTQIA+ and BIPOC communities who have fled Twitter. They reject Fediverse-wide full-text search altogether, decentralised or not.

Over on Twitter, there are extreme right-wing users who use Twitter's all-encompassing full-text search to find gays, lesbians, trans persons, POC etc. just to harass them. This actually happens. This is why these people have fled Twitter. And this is why they absolutely do NOT want to have this feature in the Fediverse.

For these two reasons, whenever someone tried to introduce some decentralised search engine for the Fediverse, it was promptly and massively fediblocked by countless instances and thus rendered useless.
Quentin mastodon (AP)
@m @jupiter_rowland Hey I just want to find other people who share my weird hobbies and interests - and mastodon makes that next to impossible unless said interest is something with several really large instances devoted entirely to it.
Jupiter Rowland hubzilla (AP)
@Quentin The Fediverse isn't Twitter. There is no secret-sauce algorithm that automagically (allegedly) presents to you what you're interested in on a silver platter.

The key to discoverability is #hashtags.

If you want to be found, use hashtags. If you want to find someone, look for hashtags. Hashtags in your profile, hashtags in your posts. Don't sit there and wait for Mastodon and the whole Fediverse to become like Twitter so you can continue to use it like Twitter.

Unfortunately, exactly that is what many Twitter refugees still do. They aren't used to using hashtags, so they don't. Instead, they expect some hidden magic to bring them together with like-minded people.

But the first step is to use hashtags yourself so that others can find you more easily.

There are also user directories which you can join if you want to and where you can find other users, for example fediverse.info which has the advantage of not only covering Mastodon.

Generally, in the Fediverse, you can't just consume. You have to partake actively. If you want to be found, do something for it.
Lee 🌏 mastodon (AP)
@Quentin @jupiter_rowland
Your server should know every other server in the Fediverse, so that is a lot of servers you are searching for a #tag
(I am pro discoverability BTW)
Quentin mastodon (AP)
That's all well and good, but if I search for a hashtag, I will still only see posts with that, by people on my server or that my server knows - that does not make discoverability easy.
Jupiter Rowland hubzilla (AP)
@Lee 🌏 @Quentin So you say if someone starts up a brand-new FoundKey instance for the first time right now, and two seconds later, someone else starts up a brand-new Hubzilla hub for the first time, these two instances should immediately know each other? And literally all other instances of all other projects in the Fediverse without even a single exception should know both literally instantaneously?

I'm curious about how this should work. Especially since the Hubzilla hub doesn't even have a single ActivityPub connector running yet (ActivityPub is optional per hub and then per channel).
Lee 🌏 mastodon (AP)
@jupiter_rowland
Hey Jupiter
I'm absolutely not the expert on this. I've been asking lots of questions and finding my way.
How Fediverse servers find each other is something I've been trying to learn. They obviously do, but how quickly and how successfully remains a mystery to me. I plan to install the ActivityPub WordPress plugin, and I am most curious as to how people here on Mastodon will find my #tags.
Kevin Davidson mastodon (AP)
@MrLee @jupiter_rowland It takes two things. Boosts and hashtags.
The boosts will get your posts into the timelines of whoever follows the person boosting your initial WP post. It also gets it into the Federated timeline of the instances all those followers are on.
Kevin Davidson mastodon (AP)
@MrLee @jupiter_rowland It's possible (but unlikely) that someone will stumble across the post by browsing the federated timeline, but on most instances that's just too noisy and is ignored.
Hashtags act as a filter, searching through the federated timeline (they can only match posts that have already made it to your instance through other means - boosting).
Kevin Davidson mastodon (AP)
@MrLee @jupiter_rowland So you want to get your post boosted by someone well connected with followers on lots of different instances and have a strong hashtag game.
Kevin Davidson mastodon (AP)
@MrLee @jupiter_rowland For simplicity, I've ignored relays. They may help your post propagate, but won't actually get it in front of anyone's face. It will help people searching for the webfinger address of your blog as their instance may at least have heard of it and cached it.
@MetalSamurai
If you're subscribed to a #hashtag and your server sees a tagged post through a #relay, it will land in your #feed.

@MrLee @jupiter_rowland
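The propagation described in the last few posts can be condensed into a toy model: a hashtag search only looks through posts an instance has already received, and posts arrive through follows, boosts or relays. Everything here (the `Instance` class, `deliver`, the domains and the post) is hypothetical and greatly simplified. It is not how any server software is actually implemented.

```python
class Instance:
    """Toy instance that can only search posts it has already received."""

    def __init__(self, name):
        self.name = name
        self.known_posts = []

    def receive(self, post):
        if post not in self.known_posts:
            self.known_posts.append(post)

    def search_tag(self, tag):
        return [p for p in self.known_posts if tag in p["tags"]]


def deliver(post, instances):
    """Model a follow, boost or relay pushing a post to some instances."""
    for inst in instances:
        inst.receive(post)


home = Instance("blog.example")
a = Instance("a.example")
b = Instance("b.example")
post = {"author": "@lee@blog.example", "text": "Hello", "tags": {"#HelloWorld"}}

deliver(post, [home])  # published: only the home instance has it
before = len(b.search_tag("#HelloWorld"))

deliver(post, [a])     # a follower on a.example receives it
deliver(post, [b])     # a boost (or a relay) carries it on to b.example
after = len(b.search_tag("#HelloWorld"))

print(before, after)
```

The search on b.example finds nothing until some delivery path brings the post there, which is why boosts, relays and hashtags work together rather than replacing one another.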
