Most of the pages are being excluded as "soft 404" or "crawl anomaly", even though none of them are "not found" or low-content pages, and there were no server outages that would explain the crawl anomalies.
Google seems to take pages with discussion threads full of unique content and declare them as "soft 404" or "crawl anomaly".
Then I added all my content URLs (threads/posts) to sitemaps and submitted them, and used GWT to start validation for those soft 404/crawl anomaly pages. Now the pages either go back into "failed" status as "soft 404" or "crawl anomaly", or simply increase the "excluded" count.
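For anyone wanting to replicate the sitemap step, here is a minimal sketch of generating a sitemap that lists only the content-bearing thread URLs. The domain and URL pattern are hypothetical placeholders, not my actual forum's structure:

```python
# Minimal sketch: build a sitemap containing only content-bearing thread
# URLs. example.com and the /threads/ pattern are hypothetical placeholders.
from xml.sax.saxutils import escape

def build_sitemap(urls):
    """Return sitemap XML (sitemaps.org protocol) for a list of absolute URLs."""
    entries = "\n".join(
        f"  <url><loc>{escape(u)}</loc></url>" for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>"
    )

thread_urls = [
    "https://example.com/threads/12345",  # placeholder thread URLs
    "https://example.com/threads/12346",
]
print(build_sitemap(thread_urls))
```

With ~900k URLs you would split this into multiple sitemap files (the protocol caps each file at 50k URLs) and tie them together with a sitemap index file.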
So now I have 140k pages in the index and 908k pages excluded (two months ago it was 750k pages excluded and 231k in the index). Of the excluded pages, 616k are "discovered - currently not indexed" and 214k are "crawled - currently not indexed".
There are other buckets too, like 53k "page with redirect" and 15k "blocked by robots.txt". I don't count all those auxiliary pages (user profiles, error pages, missing posts, etc.); I only want to get the content-bearing pages indexed.
I checked all the technical aspects. The server is working. I moved hosts recently, and the legacy crawl stats report shows time spent downloading dropped from 150ms on my old host to 30ms on the new one. Googlebot fetches about 30k pages per day (according to that same report). But Google stopped including content, and most of my content has been removed from the index.
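One more check worth doing: confirm what the crawler actually gets served. A sketch of spot-checking a thread URL with a Googlebot user agent is below; a non-200 status, a redirect to a thin page, or a near-empty body could explain "soft 404" classifications even when logged-in users see full content. The URL is a hypothetical placeholder:

```python
# Sketch: spot-check what a Googlebot-identified request is served for a
# thread URL. The thread URL in the usage note is a hypothetical placeholder.
import urllib.request

GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

def googlebot_request(url):
    """Build a request that identifies itself with Googlebot's user agent."""
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})

def check_url(url):
    """Return (HTTP status, body size in bytes) for the URL."""
    with urllib.request.urlopen(googlebot_request(url), timeout=10) as resp:
        return resp.status, len(resp.read())

# Usage (requires network):
#   status, size = check_url("https://example.com/threads/12345")
#   # expect status 200 and a non-trivial body size for a healthy thread page
```

Note this only tests user-agent-based behavior; real Googlebot also crawls from Google's IP ranges, so IP-based blocking or cloaking wouldn't show up here.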
In the past, I could pick any thread, copy a sentence, search for it in quotes, and see my page in the SERPs. Now only a tiny part of the forum content is indexed.
Any ideas on what to try to do about this?
from Search Engine Optimization: The Latest SEO News https://www.reddit.com/r/SEO/comments/dpga5o/i_have_an_established_forum_with_around_940k/