
Solved: How to check URLs in Linux

Posted: Tue Jan 31, 2012 3:23 pm
by mister_v
Hi,

I want to check my sites for broken or missing URLs.
Does anyone know an easy and fast way to check the links of a website?

I want to use it on Linux (Ubuntu) and build a script around it,
so I can run it automatically on a regular basis.

Thanks,

Re: How to check URLs in Linux

Posted: Wed Feb 01, 2012 9:18 am
by chris
You can use wget.
It is not the best tool, but it is probably already installed on your system.

Code: Select all

wget -r -nd --spider -o links.txt -np -p http://www.sitetocheck.com
The log is written to links.txt (that is what -o does);
just search it for the 404 errors.
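Since you want to run it on a regular basis, you can put that in a cron job.
Something like this (the schedule and paths are just an example, adjust them):

Code: Select all

# every night at 3:00, spider the site and write the log to links.txt
0 3 * * * wget -r -nd --spider -o /home/you/links.txt -np -p http://www.sitetocheck.com
Add it to your crontab with crontab -e.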

A better tool is linkchecker (http://linkchecker.sourceforge.net).
You can install it on Ubuntu/Kubuntu with:

Code: Select all

sudo aptitude install linkchecker
It checks for broken links, of course,
but it can also validate your HTML and CSS,
and it can even scan your site for viruses with ClamAV.
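If I remember right those extra checks have their own switches, something like
this (the exact option names may differ per version, check linkchecker --help):

Code: Select all

# check links, validate HTML/CSS and scan with ClamAV in one run
linkchecker --check-html --check-css --scan-virus http://www.sitetocheck.com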

There is also a GUI client for linkchecker.

Re: How to check URLs in Linux

Posted: Wed Feb 01, 2012 7:57 pm
by mister_v
Thanks,

I tried linkchecker and it works really well.

But it lists everything, and I only want the errors.
It also checks the Amazon URLs, and for some reason those report errors too;
I'd like to leave them out.

Re: How to check URLs in Linux

Posted: Thu Feb 02, 2012 2:17 pm
by chris
Just use grep to pull the 404 errors out:

Code: Select all

grep -B 4 '404 Not Found' links.txt
-B 4 tells grep to also return the 4 lines before each match.
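If you put it in a script, grep's exit status tells you whether anything matched,
so you can, for example, only send a mail when something is actually broken.
A rough sketch (the mail command and the address are just placeholders):

Code: Select all

#!/bin/sh
# grep -q is quiet and exits 0 only when a match was found
if grep -q '404 Not Found' links.txt; then
    grep -B 4 '404 Not Found' links.txt | mail -s "broken links" you@example.com
fi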

If you don't want linkchecker to test the Amazon URLs,
you can exclude them:

Code: Select all

linkchecker --ignore-url="amazon" http://www.sitetotest.com > links.txt
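--ignore-url takes a regular expression, and if I remember right you can give it
more than once. There is also a --no-warnings switch that trims the output down
to real errors, which might save you the grep (again, check linkchecker --help
for your version):

Code: Select all

linkchecker --no-warnings --ignore-url="amazon" http://www.sitetotest.com > links.txt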

Re: How to check URLs in Linux

Posted: Mon Feb 06, 2012 7:18 pm
by mister_v
Many thanks,

I got what I needed.