I want to check my sites for broken or missing URLs.
Does anyone know an easy and fast way to check the links of a website?
I want to use it on Linux (Ubuntu) and build a script around it,
so I can run it automatically on a regular basis.
Solved: How to check urls in linux
Last edited by mister_v on Mon Feb 06, 2012 7:19 pm, edited 1 time in total.
Re: How to check urls in linux
You can use wget.
It is not the best tool, but it is probably already installed on your system.
Code: Select all
wget -r -nd --spider -o links.txt -np -p http://www.sitetocheck.com
It writes the results to links.txt;
just search that file for the 404 errors.
A better tool is linkchecker (http://linkchecker.sourceforge.net).
You can install it on Ubuntu/Kubuntu with:
Code: Select all
sudo aptitude install linkchecker
It checks for broken links,
but it can also validate your HTML and CSS.
It can even scan your site for viruses with ClamAV.
There is also a GUI client for linkchecker.
Re: How to check urls in linux
I used linkchecker and it works really well.
But it lists everything, and I only want the errors.
It also checks the Amazon URLs, and for some reason they give errors too.
I don't want those.
Re: How to check urls in linux
Just use grep to pull the 404 errors out of links.txt:
Code: Select all
grep -B 4 '404 Not Found' links.txt
-B 4 tells grep to also print the 4 lines before each match, so the requested URL is normally shown along with the error.
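If only the failing URLs are wanted, without the surrounding log lines, a small awk filter can do the same job as the grep above. This is just a sketch: it assumes wget's usual log layout, where every request starts with a line beginning with `--` and ending in the URL, and the function name `broken_urls` is made up for illustration.

```shell
#!/bin/sh
# broken_urls (hypothetical helper): print each URL that got "404 Not Found".
# Assumes every request in the wget log starts with a line such as
#   --2012-02-06 19:00:00--  http://www.sitetocheck.com/page.html
# The exact layout may differ between wget versions.
broken_urls() {
    awk '
        /^--/           { url = $NF }   # remember the most recent request URL
        /404 Not Found/ { print url }   # print it when that request ended in a 404
    ' "$1" | sort -u                    # list each broken URL only once
}
```

After sourcing the function, `broken_urls links.txt` should print one broken URL per line.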
If you don't want linkchecker to test the Amazon URLs,
you can exclude them:
Code: Select all
linkchecker --ignore-url="amazon" http://www.sitetotest.com > links.txt
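Since the original goal was a script that runs on a regular basis, the linkchecker call could be wrapped up roughly like this. It is only a sketch: the function name `check_site` and the `CHECKER` override are placeholders chosen for illustration, and it assumes linkchecker exits with a non-zero status when it finds problems, which is worth confirming for the installed version.

```shell
#!/bin/sh
# check_site (hypothetical helper): run linkchecker on one site and keep the
# report only when something was flagged.
# Assumption: linkchecker exits non-zero when it finds broken links.
check_site() {
    site=$1                           # URL to crawl
    report=$2                         # file that receives linkchecker's output
    checker=${CHECKER:-linkchecker}   # override allows a dry run without linkchecker

    if "$checker" --ignore-url="amazon" "$site" > "$report" 2>&1; then
        rm -f "$report"               # clean run: nothing to look at
        return 0
    fi
    echo "Problems found on $site, see $report" >&2
    return 1
}
```

Saved in a script that ends with a call such as `check_site http://www.sitetotest.com /tmp/links.txt`, this can be started from cron. When the job writes to stderr, cron normally mails that output to the owner of the crontab, so a message only arrives when something is broken.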
Re: How to check urls in linux
Many thanks.
I got what I needed.