I was running a crawl of a friend's site with my favorite SEO tool, Screaming Frog, and got the dreaded 'connection refused' error. My first thought was: oh no, this site isn't indexable and the search engines probably don't even know it exists. Never a comforting feeling.
Fortunately, I found a workaround within Screaming Frog for crawling websites that block access to bots other than the major search engines. Generally, sites don't want unknown bots crawling them, since bot traffic can bloat their analytics. There's nothing worse than opening up your Google Analytics account and seeing a spike in traffic for the month, just to find out it's all bot traffic hitting your website.
The workaround is surprisingly simple and a little devious. Go into the configuration settings of Screaming Frog and change the user agent. I'd recommend changing the user agent to Googlebot so you can properly crawl and access the site.
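To make the trick concrete, here's a minimal sketch of what "changing the user agent" means under the hood: the crawler simply identifies itself with Googlebot's User-Agent string instead of its own. The URL and the function name are placeholders of mine, not anything from Screaming Frog:

```python
import urllib.request

# Googlebot's published desktop User-Agent string.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)

def fetch_as_googlebot(url):
    """Request a page while identifying as Googlebot.

    Sites that refuse unknown crawlers but allow search engines
    will often serve this request normally.
    """
    req = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status, resp.read()
```

Screaming Frog does the same thing for every request in the crawl once you flip that setting.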
I would also recommend running a quick search if you encounter this issue to see whether the site is currently indexed in Google. You can do this with a query like "site:domain.com" (remember to leave off the 'www.' and 'https://').
Unfortunately, when I tried this, there were no pages indexed in Google at the time. However, as of this writing, two pages have already been indexed since I submitted the sitemap in Search Console. If this happens to you, submit a sitemap immediately within Search Console and wait for Googlebot to come around and index the site properly. If you're worried the robots.txt might be blocking some pages, head to the robots.txt tester within Search Console.
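If the site doesn't have a sitemap to submit yet, the file itself is just a small XML document. A minimal sketch, where the domain and date are placeholders to swap for your own:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://domain.com/</loc>
    <lastmod>2024-01-01</lastmod>
  </url>
  <url>
    <loc>https://domain.com/about/</loc>
  </url>
</urlset>
```

Host it at something like domain.com/sitemap.xml, then paste that URL into the Sitemaps section of Search Console.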
Input a URL you're concerned about and see if your current robots.txt file is blocking anything that would keep Googlebot from rendering the page properly. Also be sure to check that there aren't any unwanted 'noindex' meta tags coded into the site keeping search engines from indexing it.
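You can also run both checks locally before you ever open Search Console. A sketch using only Python's standard library; the robots.txt rules, HTML snippet, and domain below are made-up stand-ins for a real site's files:

```python
from urllib import robotparser
from html.parser import HTMLParser

def googlebot_allowed(robots_txt, url):
    """Return True if the given robots.txt rules let Googlebot fetch url."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch("Googlebot", url)

class NoindexFinder(HTMLParser):
    """Sets .noindex when a <meta name="robots" content="...noindex..."> appears."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if (d.get("name", "").lower() == "robots"
                    and "noindex" in d.get("content", "").lower()):
                self.noindex = True

# Example robots.txt: everything allowed except /private/.
rules = "User-agent: *\nDisallow: /private/\n"
print(googlebot_allowed(rules, "https://domain.com/page"))       # True
print(googlebot_allowed(rules, "https://domain.com/private/x"))  # False

finder = NoindexFinder()
finder.feed('<meta name="robots" content="noindex, nofollow">')
print(finder.noindex)  # True -- this page is telling search engines to skip it
```

If the robots check comes back False for a page you want indexed, or the noindex flag comes back True, you've found your culprit.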
Good luck and have fun optimizing your site!