Ticket #225 (closed defect: duplicate)
Regexp engine broken when a string contains non-ASCII characters
| Reported by: | mattaimonetti@… | Owned by: | lsansonetti@… |
|---|---|---|---|
| Priority: | critical | Milestone: | MacRuby 0.4 |
| Component: | MacRuby | Keywords: | regexp, bug |
| Cc: | | | |
Description
Here is sample code to reproduce the problem:
html = %{<p><a href="http://www.flickr.com/people/jeanelietrujillo/">jeanelietrujillo</a> posted a photo:</p>
<p><a href="http://www.flickr.com/photos/jeanelietrujillo/2211862262/" title="Galgani Décoration"><img src="http://farm3.static.flickr.com/2262/2211862262_2f08c343a3_m.jpg" width="240" height="240" alt="Galgani Décoration" /></a></p>}
html.scan(/<img\s+src="(.+?)"/)[0][0]
Ruby 1.9 returns:
=> "http://farm3.static.flickr.com/2262/2211862262_2f08c343a3_m.jpg"
MacRuby returns:
=> "ttp://farm3.static.flickr.com/2262/2211862262_2f08c343a3_m.jpg\""
Now let's replace the é in the title attribute with an e (the é in the alt attribute, which comes after the match, is left in place):
html = %{<p><a href="http://www.flickr.com/people/jeanelietrujillo/">jeanelietrujillo</a> posted a photo:</p>
<p><a href="http://www.flickr.com/photos/jeanelietrujillo/2211862262/" title="Galgani Decoration"><img src="http://farm3.static.flickr.com/2262/2211862262_2f08c343a3_m.jpg" width="240" height="240" alt="Galgani Décoration" /></a></p>}
html.scan(/<img\s+src="(.+?)"/)[0][0]
MacRuby now returns:
=> "http://farm3.static.flickr.com/2262/2211862262_2f08c343a3_m.jpg"
My guess is that the Unicode characters throw off the offset used to extract the matched string, so the returned substring starts one character too late and runs one character past the end of the match.
To test this hypothesis, here is another sample, this time with two "é" characters before the match:
html = %{<p><a href="http://www.flickr.com/people/jeanelietrujillo/">jeanelietrujillo</a>a posté une photo:</p>
<p><a href="http://www.flickr.com/photos/jeanelietrujillo/2211862262/" title="Galgani Décoration"><img src="http://farm3.static.flickr.com/2262/2211862262_2f08c343a3_m.jpg" width="240" height="240" alt="Galgani Décoration" /></a></p>}
html.scan(/<img\s+src="(.+?)"/)[0][0]
MacRuby returns:
=> "tp://farm3.static.flickr.com/2262/2211862262_2f08c343a3_m.jpg\" "
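The shift seen above (one character per "é" that precedes the match) is what you would get if a byte offset were used where a character index is expected, since "é" takes two bytes in UTF-8. The following is only a sketch of that idea in plain Ruby, not a confirmed diagnosis of MacRuby's internals. The helper `capture_with_byte_offset` and the `example.com` strings are hypothetical and exist only for illustration.

```ruby
# Hypothetical illustration: deliberately treat a UTF-8 byte offset as if it
# were a character index, then see how the extracted capture shifts.
def capture_with_byte_offset(str, regexp)
  m = str.match(regexp)
  return nil unless m

  char_start = m.begin(1)                   # character offset of group 1
  byte_start = str[0, char_start].bytesize  # byte offset of the same position
  str[byte_start, m[1].length]              # wrong on purpose: bytes used as chars
end

pattern = /<img\s+src="(.+?)"/

# "\u00E9" is "é"; the escape keeps this file independent of source encoding.
no_accent   = 'title="Decoration"><img src="http://example.com/a.jpg" />'
one_accent  = "title=\"D\u00E9coration\"><img src=\"http://example.com/a.jpg\" />"
two_accents = "post\u00E9 title=\"D\u00E9coration\"><img src=\"http://example.com/a.jpg\" />"

puts capture_with_byte_offset(no_accent, pattern)   # http://example.com/a.jpg
puts capture_with_byte_offset(one_accent, pattern)  # ttp://example.com/a.jpg"
puts capture_with_byte_offset(two_accents, pattern) # tp://example.com/a.jpg" (plus a trailing space)
```

The simulated results follow the same pattern as the MacRuby output in this ticket: the capture keeps its length but slides forward by one character for each two-byte character ahead of the match.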

