Javascript Regular Expression Odd Behavior?
10-01-2012, 01:17 AM
Post: #1
I'm trying to parse Twitter-style hashtags from a string in JavaScript (that is, tags of the form #thisIsAHashTag). The string is stored in the variable tweet.text, and the regular expression replacement is supposed to wrap every hashtag in a link that passes the hashtag data to a script called Search().

I figured a regular expression would be the easiest way to go, so I came up with this:

(Y!A cuts off long lines, so I had to put it in a text file for you): http://dl.dropbox.com/u/11662651/HashTag_RegExp.txt

I can break it down a bit to explain: the first parenthesized expression should capture the start of the string or a space; following this, it should capture the content of the #hashTagText, which is only letters, numbers, and _ characters; and the last expression just ensures there's a non-hashtag character or end-of-string after the hashtag.

This mostly works, but when multiple hashtags come in a row, it only wraps every other hashtag in a link. For example, if I pass it a string that contains this:

This #is #not #a #working #hashtag string.

Then #is, #a, and #hashtag will correctly be wrapped, but #not and #working will remain untouched.

Can someone help me spot the problem or come up with a better RegEx to wrap EVERY hashtag correctly?

Thanks :)
@Ratchetr: Thanks! It works, and it makes sense :) I don't know if there's any way to accept your answer (I don't see one) or if it has to go to a vote, but thanks anyway! :D

10-01-2012, 01:25 AM
Post: #2
 
I believe the problem is here:
(^|\s)

Each match ends by matching the space after the tag, since a space matches [^A-Za-z0-9_].

But that consumes the space! When the engine loops back around to find the next match, the next character is the #, which doesn't match (^|\s). So it keeps scanning until it finds a space, which skips over the next tag.
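
To see the skipping in action, here is a minimal sketch. The original pattern is only available in the linked text file, so the regex below is a hypothetical reconstruction from the description in the question (prefix group, tag group, trailing boundary group), not necessarily the exact expression that was posted.

```javascript
// Hypothetical reconstruction of the pattern described in the question.
// The real one is in an external file, so treat this as an assumption.
const consuming = /(^|\s)#([A-Za-z0-9_]+)([^A-Za-z0-9_]|$)/g;

const text = "This #is #not #a #working #hashtag string.";

// Collect the tag captured by each successive match.
const matched = [];
let m;
while ((m = consuming.exec(text)) !== null) {
  // m[3] is the space that was consumed as the trailing boundary,
  // so it is no longer available as the prefix of the next tag.
  matched.push(m[2]);
}

console.log(matched); // [ 'is', 'a', 'hashtag' ]
```

Every match swallows the space that the following tag would need as its prefix, which is why every other tag is missed.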

I believe if you change (^|\s) to (\s)* it will work for your example, but I'm not positive it catches all scenarios; you'll need to test a bit.
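
Another option, sketched here under assumptions, is to keep the (^|\s) prefix and turn the trailing check into a lookahead so that it no longer consumes the space. The pattern is rebuilt from the description in the question, and linkHashtags and the link markup are hypothetical; only the name Search() comes from the original post.

```javascript
// Assumed pattern: the trailing boundary is a lookahead (?=...), so the
// space after one tag stays available as the prefix of the next tag.
const hashtag = /(^|\s)#([A-Za-z0-9_]+)(?=[^A-Za-z0-9_]|$)/g;

// Hypothetical helper: wraps each hashtag in a link that calls Search().
// The exact markup is an illustration, not the asker's real output.
function linkHashtags(text) {
  return text.replace(hashtag, (match, prefix, tag) =>
    `${prefix}<a href="#" onclick="Search('${tag}'); return false;">#${tag}</a>`);
}

// List the tags the pattern finds in the example string from the question.
const sample = "This #is #not #a #working #hashtag string.";
const found = [...sample.matchAll(hashtag)].map((m) => m[2]);

console.log(found); // [ 'is', 'not', 'a', 'working', 'hashtag' ]
console.log(linkHashtags("a #b c"));
```

Unlike an optional prefix such as (\s)*, this version still requires start-of-string or whitespace before the #, so a # in the middle of a word (for example foo#bar) is left alone. Because the tag class is greedy, the lookahead is arguably redundant and could probably be dropped; it is kept here only to mirror the structure described in the question.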
