Friday, 26 January 2018

Non-greedy (reluctant) regex matching in sed?



I'm trying to use sed to clean up lines of URLs to extract just the domain.



So from:



http://www.suepearson.co.uk/product/174/71/3816/


I want:




http://www.suepearson.co.uk/


(either with or without the trailing slash, it doesn't matter)



I have tried:



 sed 's|\(http:\/\/.*?\/\).*|\1|'



and (escaping the non-greedy quantifier)



sed 's|\(http:\/\/.*\?\/\).*|\1|'


but I cannot seem to get the non-greedy quantifier (?) to work, so it always ends up matching the whole string.


Answer



Neither basic nor extended POSIX/GNU regex syntax recognizes the non-greedy quantifier; you need a more recent regex flavor. Fortunately, a Perl regex is easy to use in this context:




perl -pe 's|(http://.*?/).*|$1|'
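If you would rather stay with sed, a common workaround is to replace the non-greedy `.*?` with a negated character class, so the match stops at the first slash after the host. The following is only a sketch under that assumption: the function name `extract_base_url` is made up for illustration, and the sample URL is the one from the question.

```shell
# extract_base_url is a hypothetical helper name, not part of sed or any library.
# [^/]* matches every character up to (but not including) the next slash,
# so the capture group ends at the first slash after the host name and no
# non-greedy quantifier is needed. This is plain POSIX basic regex syntax.
extract_base_url() {
    sed 's|\(http://[^/]*/\).*|\1|'
}

# Sample input taken from the question.
result=$(printf '%s\n' 'http://www.suepearson.co.uk/product/174/71/3816/' | extract_base_url)
printf '%s\n' "$result"
```

This only handles `http://` URLs that have at least one slash after the host; a line such as `http://example.com` with no trailing slash would pass through unchanged.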
