grep all http urls - grep url from file - grep url regex
Scenario
I have a very large sitemap.xml file that contains many http/https URLs, and I want to print all of them.
Challenges:
We don't know how many URLs the sitemap.xml file contains, and extracting them manually would take a lot of time and leave room for human error.
Solution:
The grep command can help in this situation. (Leave a comment if you know another way to do this.)
Here I am on a Windows operating system with the MobaXterm SSH manager installed, and I am using its local terminal, so let's try.
1 - Download the sitemap.xml file
curl https://www.linuxtopic.com/sitemap.xml -o /tmp/sitemap.xml
2 - Once it is downloaded, use the grep command below to print all the http/https URLs
grep -o -E "https?://[][[:alnum:]._~:/?#@&'()*+,;%=-]+" /tmp/sitemap.xml
OR
grep -Eo "(http|https)://[a-zA-Z0-9./?=_%:-]*" /tmp/sitemap.xml
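The two patterns do not always return the same result, and the challenge above also asks how many URLs there are. The sketch below is only an illustration: it builds a small sample file (the /search?q=grep&amp;max=5 URL is made up for this example and is not taken from the real sitemap), runs both patterns, removes duplicates with sort -u, and counts the result with wc -l.

```shell
# Hypothetical sample file standing in for /tmp/sitemap.xml
sample=$(mktemp)
cat > "$sample" <<'EOF'
<url><loc>https://www.linuxtopic.com/</loc></url>
<url><loc>https://www.linuxtopic.com/search?q=grep&amp;max=5</loc></url>
<url><loc>https://www.linuxtopic.com/</loc></url>
EOF

# Pattern 1: wider character class, so '&' and ';' stay part of the match
wide=$(grep -o -E "https?://[][[:alnum:]._~:/?#@&'()*+,;%=-]+" "$sample" | sort -u)

# Pattern 2: narrower class, so the match stops at the first character
# outside the class (here the '&' in the query string)
narrow=$(grep -Eo "(http|https)://[a-zA-Z0-9./?=_%:-]*" "$sample" | sort -u)

# Count the unique URLs found by the first pattern
count=$(printf '%s\n' "$wide" | wc -l)

echo "Pattern 1:"; echo "$wide"
echo "Pattern 2:"; echo "$narrow"
echo "Unique URLs: $count"

rm -f "$sample"
```

In this sample the second pattern cuts the query string at the '&', so the first pattern is probably the safer choice when URLs carry several query parameters. Also note that a real sitemap.xml normally declares an xmlns namespace URL on its root element, and both patterns will print that URL too, so the count may be slightly higher than the number of pages.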
I hope this topic gave you all the information you needed. If you have any further questions or would like more detailed directions, feel free to contact us. We look forward to talking to you.