Regular expressions (regex) for a variety of situations.
^at.*
remove all lines that begin with "at"
.*exception.*
remove all lines that contain the string "exception"
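As a rough illustration of the two line-removal patterns above, the sketch below applies them in TypeScript. The helper name removeMatchingLines and the sample log text are hypothetical. The patterns are presumably meant for an editor's find-and-replace, so this sketch tests each line separately and drops the ones that match.

```typescript
// Hypothetical helper: drop every line that matches the given pattern.
// The pattern should not carry the "g" flag, since test() would then keep state
// between calls.
function removeMatchingLines(text: string, pattern: RegExp): string {
  return text
    .split("\n")
    .filter((line) => !pattern.test(line))
    .join("\n");
}

// Made-up sample input for illustration only.
const sampleLog = [
  "java.lang.IllegalStateException: boom",
  "at com.example.Main.run(Main.java:10)",
  "request finished",
  "caught exception while parsing",
].join("\n");

// Remove lines that begin with "at".
const withoutAtLines = removeMatchingLines(sampleLog, /^at.*/);

// Remove lines that contain "exception". The match is case-sensitive, so the
// line with "IllegalStateException" (capital E) is kept.
const withoutExceptionLines = removeMatchingLines(sampleLog, /.*exception.*/);
```

Adding the i flag to the second pattern would also remove lines that contain "Exception".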
(target="basefrm" id="itemTextLink)[0-9]+"
$1"
remove the number sequence from the string
^ (.*)$
$1
return capture group without beginning space
^,(.*)$
$1
return capture group without beginning comma
^(.*),$
$1
return capture group without ending comma
^...(.*)
$1
return capture group without beginning three characters
^(.*)\?.*
$1
return capture group without the last ? and everything after it
^(.*)#[0-9].*
$1
return capture group without the hash, the number(s) and anything after them at the end of a string
^(.*)#[0-9]+;#.*
$1
return capture group without hash, number(s), semicolon and hash pattern in string
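The capture-group patterns above can be tried out with String.prototype.replace. This is a minimal sketch: the function names and sample inputs are made up for illustration, and the three-character pattern is written with three literal dots.

```typescript
// Each helper pairs one pattern from the list with the "$1" replacement.
const stripLeadingSpace = (s: string): string => s.replace(/^ (.*)$/, "$1");
const stripLeadingComma = (s: string): string => s.replace(/^,(.*)$/, "$1");
const stripTrailingComma = (s: string): string => s.replace(/^(.*),$/, "$1");
const stripFirstThree = (s: string): string => s.replace(/^...(.*)/, "$1");
const stripQueryString = (s: string): string => s.replace(/^(.*)\?.*/, "$1");
const stripHashNumberSuffix = (s: string): string =>
  s.replace(/^(.*)#[0-9]+;#.*/, "$1");

// Hypothetical sample inputs.
console.log(stripLeadingSpace(" abc"));               // abc
console.log(stripLeadingComma(",a,b"));               // a,b
console.log(stripTrailingComma("a,b,"));              // a,b
console.log(stripFirstThree("abcdef"));               // def
console.log(stripQueryString("page.html?x=1"));       // page.html
console.log(stripHashNumberSuffix("Title#12;#rest")); // Title
```

Because (.*) is greedy, stripQueryString cuts at the last ? in the input, not the first.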
Examples
The pattern ([^/?])$|/?([?].*) contains two capture groups, written with ( ), which the replacement refers to as backreferences $1 and $2. It also contains two character classes, written with [ ]. The replacement works even when one capture group is empty, because a non-participating capture group is filled with an empty string after a match. ([^/?])$ means match and capture into Group 1 any character other than / or ? at the end of the string. The | means or. The /? means an optional (0 or 1) forward slash. And ([?].*) means match and capture into Group 2 a literal ? followed by zero or more characters other than a newline.
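The text does not give the replacement string for this pattern. The sketch below assumes "$1/$2", which would make the pattern add a trailing slash before an optional query string. The function name ensureTrailingSlash and the example URLs are hypothetical too.

```typescript
// Pattern taken verbatim from the text. Building it from a string avoids
// having to escape the forward slashes.
const slashPattern = new RegExp("([^/?])$|/?([?].*)");

// Assumed replacement: "$1/$2". Whichever group did not take part in the
// match is substituted as an empty string.
function ensureTrailingSlash(url: string): string {
  return url.replace(slashPattern, "$1/$2");
}

console.log(ensureTrailingSlash("http://example.com/path"));      // http://example.com/path/
console.log(ensureTrailingSlash("http://example.com/path?x=1"));  // http://example.com/path/?x=1
console.log(ensureTrailingSlash("http://example.com/path/?x=1")); // http://example.com/path/?x=1
console.log(ensureTrailingSlash("http://example.com/path/"));     // http://example.com/path/
```

In the first call only Group 1 participates, and in the second and third only Group 2 does. The last URL does not match at all and is returned unchanged.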
Return a capture group that removes a bunch of HTML empty strings
.*(<body |body>|<div |div>|head>|html>|script>|style>|<table |table>|title>|<tr |tr>|(A {|BODY {|TD {|content=|initializeDocument|meta name)|<td valign=|</a>|</td>|.*src="Support.*|alt="mauyong">|.*alt=.*/>$|.*href='javascript:clickOnNode.*alt="|" target="basefrm").*
$1
Solr Query
Find and escape all special characters in a user query before it is sent to Solr. Queries may otherwise return zero or unexpected results, for example:
- wellness: my
- Heritage Metrics Model - 7380166 [1]
// term is the search query string
term = this.replaceSpecialCharacters(term)
// Find and escape all special characters in Solr query string
public static replaceSpecialCharacters(term: string): string {
  // Create a map of special characters with their replacements
  const pairMap: Record<string, string> = {
    '[': '\\[',
    ']': '\\]',
    '(': '\\(',
    ')': '\\)',
    '{': '\\{',
    '}': '\\}',
    '*': '\\*',
    '+': '\\+',
    '?': '\\?',
    '|': '\\|',
    '^': '\\^',
    '$': '\\$',
    '\\': '\\\\',
    '-': '\\-',
    '&': '\\&',
    '!': '\\!',
    '~': '\\~',
    ':': '\\:',
    ';': '\\;'
  }
  const findMap = Object.keys(pairMap)
  const replaceMap = Object.values(pairMap)
  const cleanRegexArray: string[] = []
  const matchMap: Record<string, string> = {}
  for (let i = 0, len = findMap.length; i < len; i++) {
    // Create an array of regex escaped characters using
    // a character class inside a capture group
    cleanRegexArray.push(findMap[i].replace(/([\[\](){}*+?|^$.\\])/g, '\\$1'))
    // Create an object with proper key:value property pairs for matching
    matchMap[findMap[i]] = replaceMap[i]
  }
  // Create regex-ready OR string of characters to find
  const cleanRegexString = cleanRegexArray.join('|')
  // Replace each matched character in term with its value from the match map
  term = term.replace(
    new RegExp(cleanRegexString, 'g'),
    function (matchKey) {
      return matchMap[matchKey]
    }
  )
  return term
}
The character class [ ] inside the capture group ( ) contains all the characters that need to be escaped before they can be used in a regular expression. These include:
- Brackets: [ ]
- Parentheses: ( )
- Curly braces: { }
- Operators: * + ? |
- Anchors: ^ $
- Others: . \
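Escaping guidelines for items in the character class:
- Escape [ ] and \ literals in the character class
- Avoid ^ at the beginning, where it would negate the class
- Place a literal - at the beginning or end of the class, or escape it, so that it is not read as a range
- Escape the backslash in the capture group replacement (written '\\$1' in a string literal)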
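The guidelines can be exercised with two small helpers. This is a sketch, not the method used in the listing above. escapeForRegex reuses the character class from that listing. escapeSolrTerm is a hypothetical shorter variant that puts the same 19 characters as pairMap into one character class and prefixes each match with a backslash through the "$&" replacement token.

```typescript
// Escape characters that have special meaning inside a regular expression.
// Same character class and replacement as in the listing above.
function escapeForRegex(literal: string): string {
  return literal.replace(/([\[\](){}*+?|^$.\\])/g, "\\$1");
}

// Hypothetical alternative to replaceSpecialCharacters: a single character
// class lists the characters to escape. "[", "]", "\" and "-" are escaped
// inside the class, and "^" is kept away from the first position.
function escapeSolrTerm(term: string): string {
  return term.replace(/[\[\](){}*+?|^$\\\-&!~:;]/g, "\\$&");
}

// The two example queries from the Solr Query section.
console.log(escapeSolrTerm("wellness: my"));
// wellness\: my
console.log(escapeSolrTerm("Heritage Metrics Model - 7380166 [1]"));
// Heritage Metrics Model \- 7380166 \[1\]
```

The map-based version is easier to extend when a character needs a replacement other than a backslash prefix. The character-class version is shorter when every character is escaped the same way.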
Fusion JavaScript Pipeline
Escape special characters in a Solr Partial Update Indexer ID field
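function(doc) {
  if (doc.getId() !== null) {
    // get the ID
    var new_id = doc.getId();
    // escape dashes; the backslash itself must be escaped in the string literal
    new_id = new_id.replace(/-/g, "\\-");
    // change the id field (assumes the ID is stored in a field named "id")
    doc.setField("id", new_id);
  }
  return doc;
}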
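The dash-escaping step can be checked outside Fusion with a small pure function. This is a sketch only: the name escapeDashes and the sample ID are hypothetical, and nothing here calls the Fusion document API.

```typescript
// Hypothetical helper that mirrors the dash-escaping step of the stage above.
function escapeDashes(id: string): string {
  // "\\-" is a backslash followed by a dash. A single backslash before the
  // dash in the string literal would be dropped, leaving the ID unchanged.
  return id.replace(/-/g, "\\-");
}

console.log(escapeDashes("doc-2019-001")); // doc\-2019\-001
```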
Regex Field Extraction Index Stage
- https://doc.lucidworks.com/fusion-server/4.2/reference-guides/index-pipeline-stages/regular-expression-extractor-index-stage.html
Lucidworks Fusion Javascript Index Stage
- https://doc.lucidworks.com/fusion-server/4.2/reference-guides/index-pipeline-stages/javascript-index-stage.html
Escaping Special Characters in Solr
- https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html#TheStandardQueryParser-EscapingSpecialCharacters
Resources
Regex Testers
Regex 101
Elasticsearch
GNU Regex
Java Regex
Urlrewrite for Java webservers