knowing is obsolete :: regular expression generator
What is this site?
I hate regular expressions. I come across them fairly regularly in my job and every time I think 'not again'. I find the syntax of regular expressions impossible to remember, and the surrounding code is completely different in each language too. Escaping characters is difficult. All in all, every time I need to use a regular expression, I have to steel myself for an hour or two of referencing manuals and wondering why it isn't working yet. So one day, when I was casting around for something to do, I imagined the way regular expressions should be.
So what does txt2re do?
This system acts as a regular expression generator. Instead of trying to build the regular expression, you start off with the string that you want to search. You paste this into the site, click submit and the site finds recognisable patterns in your string. You then select the patterns that you are interested in and it writes a fully fledged program that extracts those patterns from that string. You then copy the program into your editor or IDE and play with it to integrate it into your program.
That's appalling - where's the subtlety and art of crafting a beautiful regular expression in that?
There is none.
What about the readability of the generated code?
Generated expressions always take the same form. If you are looking for the second integer from the left, this system will output a regex that finds (1) non-integer text, which I call 'filler', (2) the first integer, (3) more non-integer text and (4) the second integer. Only the second integer will be bracketed for extraction. In the generated code, you can see it building this expression. See http://txt2re.com/index.php3?s=42%2043&2 to see what I mean.
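As a rough illustration of that form, here is a minimal Python sketch for the string "42 43". It is written by hand to mirror the description, not copied from the site's output, and the variable names are my own.

```python
import re

txt = "42 43"

filler = r".*?"        # non-greedy filler: text we are not interested in
first_int = r"\d+"     # the first integer, matched but not captured
second_int = r"(\d+)"  # the second integer, bracketed so it is extracted

# Assemble the pieces in order: filler, integer, filler, (integer).
pattern = re.compile(filler + first_int + filler + second_int, re.DOTALL)

match = pattern.search(txt)
if match:
    print(match.group(1))  # prints 43
```

Because every piece is built separately and joined in order, it is easy to see which part of the expression corresponds to which part of the string.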
How is this better than xyz tool?
All of the tools I have looked at start with the regular expression, and provide a graphical interface instead of a text-based interface to allow you to build it. I have found using these tools to be just as difficult as typing the regular expression into an editor. I've never seen the big advantage. txt2re, on the other hand, takes a fundamentally different approach - it starts with the string to be searched.
What if the program that the site generates isn't quite right for me - e.g. the patterns that your site finds in my string don't include an Armenian style date?
I can't include all possible patterns in the database and have tried to steer clear of patterns that will not be commonly used. If you think that your particular pattern warrants inclusion, let me know in the feedback. Even with an Armenian style date, remember that the sub-elements will be picked up individually - the year, the month and the day. You can match on each of these by clicking on each.
I still don't get it - give me an example.
Let's say you are reading lines from a file in Java. You want to extract a filename, an amount and a date from each line:
P DOYLE.,C WILLS.,
My old way (60 minutes):
Find an example of how Java Regex works.
Try a simple expression to extract the yellow windows file field.
Work out that the dots in the filename need to be escaped in the regular expression string.
Work out that the backslashes in the path need to be escaped also.
Work out that the backslashes need to be doubly escaped - once for Java and once for the regular expressions.
Match the blue amount column because on its left-hand side is an integer and on its right a date.
Match the green date column because on its left is the amount.
Work out that the forward slashes need to be escaped in the date expression.
Test, test, test.
With txt2re (6 minutes):
Paste a line from the file into txt2re.
Click on the 'winpath' pattern, the 2nd 'float' pattern and the second 'ddmmyyyy'.
Click on Java.
Integrate logic into program.
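For comparison, the hand-written route in the first list might end up somewhere like the following Python sketch. The sample line is a hypothetical stand-in (a Windows path, an integer, an amount and a date), not the line from the example above, and the pattern is my own rather than anything txt2re produced. Raw strings remove one of the two escaping layers that Java string literals need.

```python
import re

# Hypothetical stand-in line: Windows path, integer, amount, dd/mm/yyyy date.
line = r"C:\reports\march.txt 17 249.99 12/03/2009"

pattern = re.compile(
    r"([A-Za-z]:\\(?:[\w.-]+\\)*[\w.-]+)"  # path: each literal backslash is written \\
    r".*?\d+"                               # filler, then the integer we skip over
    r".*?(\d+\.\d+)"                        # filler, then the amount (the dot is escaped)
    r".*?(\d{2}/\d{2}/\d{4})"               # filler, then the date (/ needs no escape in Python)
)

m = pattern.search(line)
if m:
    path, amount, date = m.groups()
    print(path)    # C:\reports\march.txt
    print(amount)  # 249.99
    print(date)    # 12/03/2009
```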
Or say you are trying to match a backslash in a PHP program and are having some difficulty. You paste a backslash into the text area, click submit, select the backslash and select PHP. You then have a complete PHP program that correctly matches a backslash. Copy and paste!
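The backslash case is awkward because it can be escaped twice: once for the string literal and once for the regex engine. The sketch below shows the same idea in Python rather than PHP, using only the standard `re` module; it is an illustration of the escaping layers, not the program the site generates.

```python
import re

text = "a\\b"  # three characters: 'a', one backslash, 'b'

# Ordinary string literal: four backslashes in the source become the
# two-character regex \\ which matches one literal backslash.
plain = re.compile("\\\\")

# Raw string literal: two backslashes in the source give the same regex.
raw = re.compile(r"\\")

# re.escape builds the escaped form for you.
escaped = re.escape("\\")

print(len(text))                      # 3
print(plain.pattern == raw.pattern)   # True
print(escaped == raw.pattern)         # True
print(len(raw.findall(text)))         # 1
```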
How does the generated program work?
Say your string contains the number 168. This is matched both as an integer and as the literal number 168. The system displays both. If you click on the integer, the system determines how many integers are to the left of the 168 you clicked on. It then outputs a program that matches against each of these, and then performs an extraction on the one you clicked on. If you had clicked on the literal 168, it would instead find out how many literal 168s are to the left of the one you clicked on, and so on.
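The counting step can be sketched in a few lines of Python. This is a guess at the logic as described, not the site's actual code; `build_pattern`, `FILLER` and `INT` are hypothetical names, and the sample string is made up.

```python
import re

FILLER = r".*?"  # non-greedy filler between the interesting parts
INT = r"\d+"     # an integer

def build_pattern(text: str, clicked_start: int) -> str:
    """Build a pattern that extracts the integer starting at clicked_start."""
    # Count the integers that end before the clicked one begins.
    skipped = sum(1 for m in re.finditer(INT, text) if m.end() <= clicked_start)
    # One "filler + uncaptured integer" pair per skipped integer,
    # then filler and the bracketed integer to extract.
    return (FILLER + INT) * skipped + FILLER + "(" + INT + ")"

text = "order 7 shipped 168 units on day 12"
pattern = build_pattern(text, text.index("168"))
m = re.search(pattern, text, re.DOTALL)

print(pattern)     # .*?\d+.*?(\d+)
print(m.group(1))  # 168
```

Clicking on a literal instead of an integer would follow the same shape, with the literal text taking the place of `INT`.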
What about Unicode?
The state of Unicode support varies widely across the engines supported here. Some engines don't support it at all, some have poor or error-prone support, and some support it very well. Until this situation improves, it won't be supported here.
The interface is very confusing!
This system wasn't designed for newbies - it was designed for programmers who know what they are doing but couldn't be bothered doing it. If you understand the problem that is being solved, the interface is fine. If you don't, it's gibberish. Do your career a favor and learn how regular expressions work - you will meet problems that this site does not solve!
The interface could be better though.
I realise that a Web 2.0 version of this tool would look great, but I don't care. I spend my professional life caring hugely how things look, but not here. A half-decent programmer should have no problem with it. I am not trying to win a design award or to help weak students get through regex projects.
Why is this free?
It's free because I have been helped in my career so much by the programmers who created free systems like Linux, Apache, PHP and MySQL - this is the only free labour that I have ever given back to the community.
Any other random neuron firing?
It was interesting to see how regular expressions work in each of the supported languages - I hadn't used most of them before. I was quite surprised by just how similar the .NET languages are to each other (you probably knew that already). I thought that Python's use of significant whitespace is nice - programming aesthetics become meaningful!