T.R | Title | User | Personal Name | Date | Lines |
---|
838.1 | | COOKIE::WAHL | Dave Wahl: Database Systems AD, CX01-2/N22 | Sat Mar 05 1988 15:57 | 20 |
| As long as you stick with a regular expression description of the
pattern, the effort is measured by the number of times a recognizable
prefix is seen in the string to be matched. If we restrict the matches
to just * and % (and I'm not sure % throws in any interesting pathological
cases) then for patterns of the form
*ab
where a and b are strings of any length, the lexeme
[a^n]b
(where a^n is ascii for "a to the n", indicating n repeats of a) indicates that
effort n will be expended by the finite automaton which has to recognize
the string.
Seems like recognizers of patterns like *ab*bc*cd will have linear effort
of n+m+p matching lexemes of the form [a^n][b^m][c^n]d.
Dave
|
838.2 | it's recursive! | COMICS::DEMORGAN | Richard De Morgan, UK CSC/CS | Mon Mar 07 1988 07:00 | 2 |
| ... and (if it hasn't changed) the matching algorithm is recursive
(and very short - I cribbed it for a tool I built)
|
838.3 | | CLT::GILBERT | Builder | Mon Mar 07 1988 11:54 | 7 |
| >Seems like recognizers of patterns like *ab*bc*cd will have linear effort
To match such a pattern, find the "cd" at the end of the string, then
work with the remainder (of the pattern and the string). Find the
first "ab", then find the first "bc" following that. If any of these
'finds' fail, there is no match. Recursion and backtracking are not
needed for the "*" wildcard character.
|
838.4 | but... | AKQJ10::YARBROUGH | Why is computing so labor intensive? | Mon Mar 07 1988 12:09 | 7 |
| >
> ..... If any of these
> 'finds' fail, there is no match. Recursion and backtracking are not
> needed for the "*" wildcard character.
That's true if you want only to find the first match. Finding ALL the
matches is harder.
|
838.5 | | BEING::POSTPISCHIL | Always mount a scratch monkey. | Mon Mar 07 1988 13:02 | 9 |
| All regular expressions correspond to a finite state parser. Just
build the machine (which I believe is commonly done in Unix, at least
by the grep utilities if not the editors) and run it: Read a
character, change state, repeat. If you are in a terminal state when
there are no more characters to read, the string is accepted. This is
clearly linear on the length of string, after the machine is built.
-- edp
|
838.6 | | COOKIE::WAHL | Dave Wahl: Database Systems AD, CX01-2/N22 | Mon Mar 07 1988 14:52 | 17 |
| re: .-1
> This is
> clearly linear on the length of string, after the machine is built.
True. The base note seems to define "effort" as the number of recognized
partial matches which is not quite the same as the time complexity of the
FA recognizer.
re: switching the scan direction (a couple of replies ago)
The switch to a right-to-left scan doesn't buy you anything in the general
case, since you coul be required to recognize the inverse of the pattern
I proposed. It's still a regular-expression recognizer which should operate
without recursion on one-character lookahead.
Dave
|
838.7 | what I was trying to get at | VIDEO::OSMAN | type video::user$7:[osman]eric.vt240 | Mon Mar 07 1988 19:27 | 46 |
| What I was interested in finding out, is if we could present the
pattern matcher with some sort of pattern such as
\s1\s2\s3...\sn\ (the "\" stands for some combo of "%" and "*"
characters, the s1,s2,s3 stand for strings
without "%" or "*" in them)
and a nasty string S
such that the matcher is forced to go into gyrations such as:
o.k. Here's something that satisfies s1.
here's something to satisfy s2.
here's something to satisfy s3.
Whoops, nothing left to satisfy s4.
Hmmm. Oh, Here's another match for s3.
Still nothing left for s4.
Hmmm, oh here's yet another match for s3.
Still nothing for s4.
Nothing left for s3. Oh, here's another match for s2.
Here's something for s3.
Drat, nothing for s4.
o.k. here's something else for s3.
Still nothing for s4.
Nothing else for s3. Oh, here's another s2.
etc. etc. etc.
In other words, design a string and pattern that keeps sending
the pattern matcher down the garden path such that it keeps having to backup
and try another choice for previous matches.
/Eric
|
838.8 | | CLT::GILBERT | Builder | Mon Mar 07 1988 20:12 | 10 |
| re: .6
>re: switching the scan direction (a couple of replies ago)
That was apparently mine, in discussing the pattern *ab*bc*cd.
Given a pattern <s1>*<s2>*...*<sn>, where the <si> are substrings
that don't contain '*', you want to first match <s1> and <sn>,
and then the rest of the matching can be done in 'one pass'. You
don't have to keep redetermining that the string to be matched
doesn't end with <sn>.
|