I am fairly new to Java, but have a fair amount of experience with
PHP and Perl regular expressions.
I am writing a program to parse an RTF file, and have developed the
following RegEx (with minor modifications):
insrsid\d*([^{}]*)?\\\\cell \\}
which works fine with preg_match (PHP, using Perl-compatible rules). Since
the Java regex engine is also modeled on the Perl one,
I just plugged the expression in. But I found out that it doesn't
work. I did find that if I change it to
^.*?insrsid\d*([^{}]*)?\\\\cell \\}.*$
(adding the line beginning and end anchors) it will return
true, but the matched text (VAR.substring(VAR_match.start(),
VAR_match.end())) is almost the entire file, and not the
small selection that it is supposed to be.
For example, assume I have the file below:
--------------------------------------
{\b\f2\fs20\insrsid4223016 TEXT 1\cell }\pard \ql \li0\ri0
\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0 {\f2
\fs20\insrsid4223016 \trowd \irow0\irowband0\ts11\trleft0\trftsWidth1
\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrnone \clbrdrb\brdrs\brdrw15
\clbrdrr\brdrnone \cltxlrtb\clftsWidth3\clwWidth18610\clshdrawnil
\cellx18610\row }\trowd \irow1\irowband1\ts11\trleft0\trftsWidth1
\clvertalt\clbrdrt\brdrs\brdrw15 \clbrdrl\brdrnone \clbrdrb\brdrnone
\clbrdrr\brdrnone \cltxlrtb\clftsWidth3\clwWidth9370\clshdrawnil
\cellx9370\clvertalt\clbrdrt\brdrs\brdrw15 \clbrdrl\brdrnone
\clbrdrb\brdrnone \clbrdrr\brdrnone \cltxlrtb\clftsWidth3\clwWidth9240
\clshdrawnil \cellx18610\pard \ql \li0\ri0\sb100
\widctlpar\intbl\faauto\adjustright\rin0\lin0 {\b\f2\fs20
\insrsid4223016 TEXT 2\cell }
--------------------------------------
If I run it through my PHP scripts, it returns 2 entries:
-----
TEXT 1
TEXT 2
-----
But when I run it with my Java program, I get this:
-----
TEXT 1\cell }\pard \ql \li0\ri0
\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0 {\f2
\fs20\insrsid4223016 \trowd \irow0\irowband0\ts11\trleft0\trftsWidth1
\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrnone \clbrdrb\brdrs\brdrw15
\clbrdrr\brdrnone \cltxlrtb\clftsWidth3\clwWidth18610\clshdrawnil
\cellx18610\row }\trowd \irow1\irowband1\ts11\trleft0\trftsWidth1
\clvertalt\clbrdrt\brdrs\brdrw15 \clbrdrl\brdrnone \clbrdrb\brdrnone
\clbrdrr\brdrnone \cltxlrtb\clftsWidth3\clwWidth9370\clshdrawnil
\cellx9370\clvertalt\clbrdrt\brdrs\brdrw15 \clbrdrl\brdrnone
\clbrdrb\brdrnone \clbrdrr\brdrnone \cltxlrtb\clftsWidth3\clwWidth9240
\clshdrawnil \cellx18610\pard \ql \li0\ri0\sb100
\widctlpar\intbl\faauto\adjustright\rin0\lin0 {\b\f2\fs20
\insrsid4223016 TEXT 2
-----
Can someone tell me what I am doing wrong, and why the Java RegEx
parser behaves in such a peculiar way?
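
One possible explanation, offered as a sketch rather than a confirmed diagnosis: Matcher.matches() succeeds only when the pattern covers the entire input, whereas preg_match searches for the pattern anywhere in the subject. The closer Java counterpart is Matcher.find() in a loop, with group(1) returning just the captured text instead of the whole match that start() and end() delimit. The class name RtfCellFinder, the method cellTexts, the trim() call, and the shortened sample string are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RtfCellFinder {
    // The original expression from the post, unchanged, as a Java string literal.
    private static final Pattern CELL =
            Pattern.compile("insrsid\\d*([^{}]*)?\\\\cell \\}");

    // Hypothetical helper: collects capture group 1 of every match in the input.
    static List<String> cellTexts(String rtf) {
        List<String> out = new ArrayList<>();
        Matcher m = CELL.matcher(rtf);
        // find() scans for the next occurrence, much like preg_match_all in PHP.
        while (m.find()) {
            String captured = m.group(1);
            // Guard against a group that did not take part in the match.
            out.add(captured == null ? "" : captured.trim());
        }
        return out;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the sample file (an assumption, not the real input).
        String rtf = "{\\b\\f2\\fs20\\insrsid4223016 TEXT 1\\cell }\\pard \\ql {\\f2\n"
                + "\\fs20\\insrsid4223016 \\trowd \\cellx18610\\row }\\trowd {\\b\\f2\\fs20\n"
                + "\\insrsid4223016 TEXT 2\\cell }";

        // matches() demands that the whole string fit the pattern.
        System.out.println(CELL.matcher(rtf).matches()); // prints false
        System.out.println(cellTexts(rtf));              // prints [TEXT 1, TEXT 2]
    }
}
```

The trim() call only removes the space that follows the insrsid digits; dropping it keeps the captured text exactly as the group matched it.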