I'm trying to extract links from an HTML document using HTMLEditorKit, but it seems so fragile that it throws exceptions whenever the page is not well-formed. Is there a way to make it more tolerant of bad markup? Or is there another good way to do this, maybe a regex?
Do try a regex. A quick Google search turned up this one, from http://sastools.com/b2/index.php?m=2002&w=46:

(?:[hH][rR][eE][fF]\s*=)(?:[\s""']*)(?!#|[Mm]ailto|[lL]ocation.|[jJ]avascript|.*css|.*this\.)(.*?)(?:[\s>""'])
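For what it's worth, here is a rough sketch of how that pattern might be used from Java with java.util.regex. A few assumptions on my part: the doubled quotes ("") in the pattern look like string escaping from whatever language it was first posted in, so I have collapsed them to a single escaped quote; the class name LinkExtractor and method name extractLinks are made up for illustration; and I skip empty captures, because when the lookahead rejects a link (for example one starting with # or mailto) the pattern can still match with an empty group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class LinkExtractor {

    // The pattern quoted above, escaped for a Java string literal.
    // Assumption: the doubled quotes in the original stand for one '"'.
    private static final Pattern HREF = Pattern.compile(
        "(?:[hH][rR][eE][fF]\\s*=)(?:[\\s\"']*)"
        + "(?!#|[Mm]ailto|[lL]ocation.|[jJ]avascript|.*css|.*this\\.)"
        + "(.*?)(?:[\\s>\"'])");

    // Returns the non-empty href values the pattern captures, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String link = m.group(1);
            // Rejected links (#anchors, mailto, ...) can surface as empty captures.
            if (!link.isEmpty()) {
                links.add(link);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com/a\">x</a> "
                    + "<a href='#top'>t</a> <A HREF=page.html>p</A>";
        System.out.println(extractLinks(html));
    }
}
```

Unlike a parser, this never throws on malformed markup, but it is only a heuristic: note that the .*css part of the lookahead scans the rest of the line, so a link followed later on the same line by "css" gets skipped too.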