Malcolm 'Max' DeRungs

Regular expressions

Regular expressions (regex) for a variety of situations.

Pattern                                     Replacement   Description
^at.*                                       (empty)       remove all lines that begin with "at"
.*exception.*                               (empty)       remove all lines that contain the string "exception"
(target="basefrm" id="itemTextLink)[0-9]+   $1            remove the number sequence from the string

^ (.*)$                                     $1            return capture group without the leading space
^,(.*)$                                     $1            return capture group without the leading comma
^(.*),$                                     $1            return capture group without the trailing comma
^...(.*)                                    $1            return capture group without the first three characters
^(.*)\?.*                                   $1            return capture group without the ? and everything after it
^(.*)#[0-9].*                               $1            return capture group without the hash, number(s) and everything after them
^(.*)#[0-9]+;#.*                            $1            return capture group without the hash, number(s), semicolon, hash pattern and everything after it
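
As a rough illustration of how a few of the patterns above behave, here is a sketch using JavaScript's String.replace from TypeScript. The sample inputs are invented, and the variant with a trailing \n? is an addition that is not in the list; it is only there to show how to drop the line break as well.

```typescript
// Capture-group patterns: the replacement "$1" keeps only what group 1 matched.
const stripLeadingComma = ",alpha,beta".replace(/^,(.*)$/, "$1");   // alpha,beta
const stripTrailingComma = "alpha,beta,".replace(/^(.*),$/, "$1");  // alpha,beta
const stripQuery = "page.html?id=7".replace(/^(.*)\?.*/, "$1");     // page.html

// Invented sample text with two lines that begin with "at".
const stackTrace = "Error: boom\nat foo (a.js:1)\nat bar (b.js:2)\ndone";

// With the m flag, ^ anchors at each line start. "^at.*" empties the matching
// lines but leaves their line breaks in place.
const emptiedLines = stackTrace.replace(/^at.*/gm, "");

// Assumed variant (not in the list): also consume the line break so the
// line disappears completely.
const removedLines = stackTrace.replace(/^at.*\n?/gm, "");
```

Some editors apply the pattern line by line, so the m flag is only needed when the whole text is processed as a single string, as it is here.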

Examples

This regex pattern contains two capture groups: ([^/?])$|/?([?].*)
Each pair of parentheses ( ) is a capture group, referenced in the replacement by the backreferences $1 and $2. The pattern also contains two character classes [ ]. The replacement works even when one capture group is empty, because a non-participating capture group is filled with an empty string after a match. ([^/?])$ means match and capture into Group 1 any character other than / or ? at the end of the string. The | means or. The /? means an optional (1 or 0) forward slash. And ([?].*) means match and capture into Group 2 a literal ? followed by 0 or more characters other than a newline.
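
The behaviour of the non-participating group can be checked with a small TypeScript sketch. The text does not say which replacement string is used, so the "$1/$2" below is an assumption, as are the function name addSlash and the example URLs.

```typescript
// The pattern from the text, built from a string so "/" needs no escaping.
const slashPattern = new RegExp("([^/?])$|/?([?].*)");

// Assumed replacement (not given in the text): group 1, a slash, group 2.
// Whichever group did not take part in the match is substituted as "".
function addSlash(url: string): string {
    return url.replace(slashPattern, "$1/$2");
}

// First alternative matches, Group 2 is empty.
const plainUrl = addSlash("http://example.com/docs");
// Second alternative matches, Group 1 is empty.
const queryUrl = addSlash("http://example.com/docs?x=1");
// Second alternative matches including the optional slash.
const slashQueryUrl = addSlash("http://example.com/docs/?x=1");
// Neither alternative matches, so the string is returned unchanged.
const alreadySlashed = addSlash("http://example.com/docs/");
```

Under that assumed replacement, the pattern ensures there is exactly one slash before the end of the path or before the query string.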

Match any line that contains one of several HTML boilerplate fragments and replace the whole line with capture group 1 (the matched fragment)

.*(<body |body>|<div |div>|head>|html>|script>|style>|<table |table>|title>|<tr |tr>|(A {|BODY {|TD {|content=|initializeDocument|meta name)|<td valign=|</a>|</td>|.*src="Support.*|alt="mauyong">|.*alt=.*/>$|.*href='javascript:clickOnNode.*alt="|" target="basefrm").* $1

Solr Query

Find and escape all special characters in a user query before it is sent to Solr. Queries that might return zero or unexpected results without escaping:

  • wellness: my
  • Heritage Metrics Model - 7380166 [1]
let map = {
    '\\': '\\\\',
    '+': '\\+',
    '-': '\\-',
    '&': '\\&',
    '|': '\\|',
    '!': '\\!',
    '(': '\\(',
    ')': '\\)',
    '{': '\\{',
    '}': '\\}',
    '[': '\\[',
    ']': '\\]',
    '^': '\\^',
    '~': '\\~',
    '*': '\\*',
    '?': '\\?',
    ':': '\\:',
    '"': '\\"',
    ';': '\\;',
    '/': '\\/', // special since Solr 4.0, where it delimits regex queries
}

query = this.replaceBulk(query, map)

// Replace every key of obj found in qry with its mapped value, in a single pass
public static replaceBulk(qry: string, obj: Record<string, string>): string {
    // Escape regex metacharacters in each key so that it is matched literally
    const alternatives = Object.keys(obj).map((key) =>
        key.replace(/[-[\]{}()*+?.\\^$|#,]/g, '\\$&')
    )
    // One alternation of all keys, e.g. \\|\+|\-|&|...
    const RE = new RegExp(alternatives.join('|'), 'g')
    // Look up each matched character in the map
    return qry.replace(RE, (matched) => obj[matched])
}
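
To see the effect on the two sample queries, here is a standalone sketch. It uses a free function instead of the static class member, and an abridged map holding only the characters these queries need; the names solrEscapes, escapedA, escapedB and escapedC are made up for the example.

```typescript
// Abridged map: only the characters needed for the sample queries.
const solrEscapes: Record<string, string> = {
    "\\": "\\\\",
    "-": "\\-",
    "[": "\\[",
    "]": "\\]",
    ":": "\\:",
};

// Standalone equivalent of replaceBulk: build one alternation of all keys
// and replace every occurrence in a single pass over the query.
function replaceBulk(qry: string, obj: Record<string, string>): string {
    const alternatives = Object.keys(obj).map((key) =>
        // Escape regex metacharacters so each key is matched literally.
        key.replace(/[-[\]{}()*+?.\\^$|#,]/g, "\\$&")
    );
    const pattern = new RegExp(alternatives.join("|"), "g");
    return qry.replace(pattern, (matched) => obj[matched]);
}

// wellness\: my
const escapedA = replaceBulk("wellness: my", solrEscapes);
// Heritage Metrics Model \- 7380166 \[1\]
const escapedB = replaceBulk("Heritage Metrics Model - 7380166 [1]", solrEscapes);
// a\\b (the backslash is escaped once, not twice)
const escapedC = replaceBulk("a\\b", solrEscapes);
```

Doing the replacement in one pass matters: escaping the characters one after another would risk escaping the backslashes added by an earlier step a second time.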

Fusion JavaScript Pipeline

Escape special characters in a Solr Partial Update Indexer ID field

function(doc) {
	if (doc.getId() !== null) {

		// get the ID
		var new_id = doc.getId();

		// escape dashes (the backslash must itself be escaped in a JavaScript string)
		new_id = new_id.replace(/-/g, "\\-");

		// change the id field
		doc.setField("id", new_id);
	}
	return doc;
}
;
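
The dash-escaping step can be tried outside Fusion with a stand-in document object. The sketch below is not the Fusion API: DocLike and makeFakeDoc are hypothetical helpers that model only the getId and setField calls used by the stage. It also shows that the replacement has to be written "\\-", since "\-" in a JavaScript string literal is just "-".

```typescript
// Hypothetical stand-in for the pipeline document; only the two methods the
// stage calls are modelled. This is not the real Fusion class.
interface DocLike {
    getId(): string | null;
    setField(name: string, value: string): void;
}

// Builds a fake document and exposes the fields written to it.
function makeFakeDoc(id: string | null) {
    const fields: Record<string, string> = {};
    const doc: DocLike = {
        getId: () => id,
        setField: (name: string, value: string) => {
            fields[name] = value;
        },
    };
    return { doc, fields };
}

// Same logic as the stage: escape every dash in the ID with a backslash.
function escapeIdStage(doc: DocLike): DocLike {
    const id = doc.getId();
    if (id !== null) {
        // "\\-" is the two characters \- at runtime.
        doc.setField("id", id.replace(/-/g, "\\-"));
    }
    return doc;
}

const fake = makeFakeDoc("abc-123-x");
escapeIdStage(fake.doc);
const escapedId = fake.fields["id"]; // abc\-123\-x
```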

Regex Field Extraction Index Stage

Lucidworks Fusion JavaScript Index Stage

Escaping Special Characters in Solr

Resources

Regex Testers

Regex 101

Elasticsearch

GNU Regex

Java Regex

Urlrewrite for Java webservers