Model collapse is a real phenomenon: blindly recycling machine-generated content into the training set of new models degrades model quality.
Why isn't this a problem in practice? Because we shouldn't be training LLMs or other generalist models directly on raw data from the net in the first place.
They should be trained on instruction-following patterns, so the data needs transformation in any case. Since an LLM is already rewriting the data in that step, quality improvement and refinement can be folded in at the same time.
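As a rough illustration of what that transformation step might look like, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `llm` stub stands in for whatever model you already run, and the prompt wording and JSON schema are not a prescribed recipe.

```python
import json

def llm(prompt: str) -> str:
    """Placeholder for a call to an existing model; swap in real inference here."""
    # The stub returns a fixed pair so the sketch runs end to end without a model.
    return json.dumps({"instruction": "Summarise the passage.", "response": "..."})

def to_instruction_pair(raw_document: str) -> dict:
    """Ask the model to recast one raw document as an instruction-following example."""
    prompt = (
        "Turn the following raw text into one instruction/response training pair. "
        "Answer as JSON with keys 'instruction' and 'response'.\n\n"
        f"{raw_document}"
    )
    return json.loads(llm(prompt))

if __name__ == "__main__":
    raw = "Model collapse happens when models are trained on their own unfiltered output."
    print(to_instruction_pair(raw))
```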
Quality improvement and refinement means ranking, filtering, combining, and mutating the raw data so that what comes out is of higher quality than what went in.
In principle you're applying compute and intelligence to raw data, which lets you analyse its trustworthiness, its implications, and other meta-level properties, in synthesis with everything the model already knows.
Machine-generated content in the raw data won't have a degrading effect if you use your existing models to bootstrap this refinement, extracting only the value from the data and not the noise that degrades performance.
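A minimal sketch of that bootstrapping step follows. All specifics here are assumptions made for illustration: the `llm` stub, the 0-10 rating prompt, and the 6.0 threshold would be replaced by whatever model and policy you actually use. The point is only the shape of the loop: the existing model ranks and filters candidates, and the provenance of the text never enters the decision.

```python
def llm(prompt: str) -> str:
    """Placeholder for a call to your existing model; swap in real inference here."""
    return "7"  # the stub pretends every document scores 7/10

def score_quality(text: str) -> float:
    """Have the existing model rate trustworthiness and usefulness on a 0-10 scale."""
    reply = llm(
        "Rate the quality of the following text from 0 to 10. "
        f"Reply with a number only.\n\n{text}"
    )
    try:
        return float(reply.strip())
    except ValueError:
        return 0.0  # unparseable ratings count as lowest quality

def curate(candidates: list[str], threshold: float = 6.0) -> list[str]:
    """Rank candidates by model-judged quality and keep only those above the bar,
    regardless of whether a human or a machine produced them."""
    scored = [(score_quality(text), text) for text in candidates]
    scored.sort(reverse=True)  # best-rated first
    return [text for score, text in scored if score >= threshold]

if __name__ == "__main__":
    pool = [
        "A careful human-written explanation of gradient descent.",
        "A machine-generated but accurate summary of the same topic.",
        "Low-effort spam, whatever its origin.",
    ]
    print(curate(pool))
```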
It's not like we have some golden human-sourced data being polluted by bad machine-generated data. Both human-sourced and machine-generated data contain valuable signal and degrading noise. Luckily we have the tools to extract the value from both.