Regular expressions
03 March, 2019
A collection of notes on Regular Expressions in a variety of situations.
Notepad++
// Remove beginning comma
^,(.*)$
$1
// Remove beginning space
^ (.*)$
$1
// Remove beginning three characters
^...(.*)
$1
// Remove ending comma
^(.*),$
$1
// Remove everything after "?" in URL
^(.*)\?.*
$1
// Remove number(s) from end of URL
^(.*)#[0-9].*
$1
// Remove number sequence in string (find and replace with empty)
target="basefrm" id="itemTextLink[0-9]+"
// Remove "#31;#" from inside of string (find and replace with empty)
#[0-9]+;#
// Remove all lines that begin with "at" in a log file
^at.*
// Remove all lines that contain "mapMappableContainerException"
.*mapMappableContainerException.*
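The find/replace pairs above can also be tried outside Notepad++. A minimal sketch in JavaScript, assuming each input is a single line; the helper name applyRules and the sample strings are made up for illustration, and JavaScript's regex engine is not identical to the one in Notepad++, so treat the results as a rough check only.

```javascript
// Hypothetical helper: apply a list of [pattern, replacement] pairs to one line.
function applyRules(line, rules) {
  return rules.reduce(function (text, rule) {
    return text.replace(rule[0], rule[1]);
  }, line);
}

// A few of the rules above, written as JavaScript regex literals.
var removeLeadingComma = [[/^,(.*)$/, "$1"]];
var removeFirstThree = [[/^...(.*)/, "$1"]];
var removeTrailingComma = [[/^(.*),$/, "$1"]];
var removeQueryString = [[/^(.*)\?.*/, "$1"]];
var removeNumberMarker = [[/#[0-9]+;#/, ""]];

// Sample inputs are invented for the example.
console.log(applyRules(",alpha", removeLeadingComma));   // alpha
console.log(applyRules("abcdef", removeFirstThree));     // def
console.log(applyRules("alpha,", removeTrailingComma));  // alpha
console.log(applyRules("http://example.com/page?id=1", removeQueryString)); // http://example.com/page
console.log(applyRules("abc#31;#def", removeNumberMarker)); // abcdef
```

Note that (.*) is greedy, so the "?" rule cuts at the last "?" on the line, not the first.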
URLs
Add regex pattern from http://stackoverflow.com/questions/36444626/add-a-trailing-slash-to-urls-using-tuckeys-url-rewrite-filter and https://regex101.com/r/rF7nF6/2
Log Files
/opt/tomcat/logs/
catalina.out
Remove Smartlogic errors
.*CLASSES.*
.*parent not present.*
.*HierarchicalFacetProcessor.*
.*Unrecognized attribute.*
Remove error details
.*at sun.*
.*at java.*
.*at com.*
.*at org.*
.*at twigkit.*
Remove webapp restart
.*common frames omitted.*
.*GuiceComponentProviderFactory.*
^SEVERE: The web application.*
.*WebappClassLoader.*
^INFO: Binding twigkit.*
^INFO: Registering twigkit.*
.*NullPointerException.*
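The same cleanup can be scripted instead of run as repeated find/replace passes. A minimal sketch in JavaScript, assuming the log has already been read into a string; the helper name stripNoise and the sample lines are hypothetical, and only some of the patterns above are included.

```javascript
// Patterns taken from the lists above; a line matching any of them is dropped.
var noisePatterns = [
  /.*parent not present.*/,
  /.*at sun.*/,
  /.*at java.*/,
  /.*at com.*/,
  /.*at org.*/,
  /.*at twigkit.*/,
  /.*common frames omitted.*/,
  /^SEVERE: The web application.*/,
  /.*NullPointerException.*/
];

// Hypothetical helper: return the log text without the noisy lines.
function stripNoise(logText) {
  return logText
    .split(/\r?\n/)
    .filter(function (line) {
      return !noisePatterns.some(function (pattern) {
        return pattern.test(line);
      });
    })
    .join("\n");
}

// Made-up sample, not real catalina.out content.
var sample = [
  "INFO: Server startup in 1234 ms",
  "SEVERE: The web application [/search] appears to have started a thread",
  "java.lang.NullPointerException",
  "\tat java.lang.Thread.run(Thread.java:745)",
  "\tat org.example.Foo.bar(Foo.java:10)",
  "\t... 12 common frames omitted",
  "WARN: Query took 2500 ms"
].join("\n");

console.log(stripNoise(sample));
// INFO: Server startup in 1234 ms
// WARN: Query took 2500 ms
```

The unanchored patterns match anywhere in a line, so ".*at com.*" would also remove a line containing "format completed". Something like ^\s+at may be a tighter choice for stack frames.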
Search - Lucidworks Fusion
- http://www\.lucidworks\.com/.*
- .*/Relative-path/
- http://en\.wikipedia\.org/wiki/[^/?]+
- Escape backslashes in a JSON regex request (use \\d instead of \d for example)
- Use Include and Exclude regexes rather than extensions
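The backslash note is easiest to see by letting a JSON serializer do the escaping. A small sketch in JavaScript; the URL and the property name includeRegexes are placeholders chosen for the example, not confirmed Fusion field names.

```javascript
// The regex as it would be typed into a plain text field: single backslashes.
// String.raw keeps the backslashes exactly as written.
var pattern = String.raw`http://www\.example\.com/docs/\d+`;

// "includeRegexes" is a placeholder property name (an assumption).
var body = JSON.stringify({ includeRegexes: [pattern] });

console.log(pattern); // http://www\.example\.com/docs/\d+
console.log(body);    // {"includeRegexes":["http://www\\.example\\.com/docs/\\d+"]}

// Parsing the JSON gives back the original single-backslash pattern.
var roundTrip = JSON.parse(body).includeRegexes[0];
console.log(roundTrip === pattern); // true
```

Building the request body with a serializer avoids doubling the backslashes by hand.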
Escape special characters in Solr Partial Update Indexer ID field
function (doc) {
  if (doc.getId() !== null) {
    // get the ID
    var new_id = doc.getId();
    // escape dashes ("\\-" in source is a backslash followed by a dash)
    new_id = new_id.replace(/-/g, "\\-");
    // change the id field
    doc.setField("id", new_id);
  }
  return doc;
}
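To try the dash escaping outside Fusion, a stand-in document object is enough. The sketch below assumes only the two methods the stage calls, getId and setField; makeDoc is a hypothetical stub and escapeDashesInId is a name given here so the function can be called directly. Neither is part of the Fusion API.

```javascript
// The dash-escaping logic of the stage, given a name so it can be called directly.
function escapeDashesInId(doc) {
  if (doc.getId() !== null) {
    // "\\-" is a backslash followed by a dash
    var newId = String(doc.getId()).replace(/-/g, "\\-");
    doc.setField("id", newId);
  }
  return doc;
}

// Hypothetical stand-in for a pipeline document; only getId and setField are mimicked.
function makeDoc(id) {
  var fields = { id: id };
  return {
    getId: function () { return fields.id; },
    setField: function (name, value) { fields[name] = value; }
  };
}

var result = escapeDashesInId(makeDoc("abc-123-def"));
console.log(result.getId()); // abc\-123\-def
```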
Related
- Regex Field Extraction Index Stage
- Lucidworks Fusion Javascript Index Stage
- Solr Reference Guide 6.6 - Escaping Special Characters
ERStudio Web Macro
Create a CSV file of Data Models from HTML file
- Upload ERStudio Web content to server
- Open index.htm in the Chrome browser
- Enable Developer Tools -> Elements
- Expand Data Model View to show all Physical Data Models
- Select the <html> element in the frame containing name="treeframe" and Copy OuterHTML
- Paste into Notepad++ and format using Tidy2 plugin
- Remove HTML using Regex (create as macro)
// find and replace with empty
.*(<body |body>|<div |div>|head>|html>|script>|style>|<table |table>|title>|<tr |tr>|(A {|BODY {|TD {|content=|initializeDocument|meta name)|<td valign=|</a>|</td>|.*src="Support.*|alt="mauyong">|.*alt=.*/>$|.*href='javascript:clickOnNode.*alt="|" target="basefrm").*
// find and replace with empty
" target="basefrm"
// pagetitle
// find
">$
// replace with
,
// pageurl
// find
.*<a href="
// replace with
http://10.251.61.176/datamodels/
// find
htm$
// replace with
jpg,
// find
.htm id=.*$
// replace with
_img.htm
// remove empty lines
Edit -> Line Operations -> Remove Empty Lines
// collect items on same line
// find
,\R
// replace with
,
// remove extra lines
Title
Diagram
Logical
Main Model
Core Physical Data Model
// add column headings
pagetitle,pagethumbnail,pagelink
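The last two steps (collect items on the same line, add column headings) can be sketched in JavaScript. The input text is made up and assumes the earlier steps left each record on three lines, with the first two ending in a comma. JavaScript has no \R, so \r?\n is used in its place.

```javascript
// Made-up intermediate text: each record is spread over three lines and
// the first two lines of a record end with a comma.
var text = [
  "Customer Model,",
  "http://example.com/datamodels/customer.jpg,",
  "http://example.com/datamodels/customer_img.htm",
  "Order Model,",
  "http://example.com/datamodels/order.jpg,",
  "http://example.com/datamodels/order_img.htm"
].join("\n");

// Notepad++ find ",\R" and replace with ",": pull each record onto one line.
var joined = text.replace(/,\r?\n/g, ",");

// Add the column headings as the first line.
var csv = "pagetitle,pagethumbnail,pagelink\n" + joined;
console.log(csv);
```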
Resources
Tools
- Dan's Tools - https://www.regextester.com/
- RegExr - https://regexr.com/
- Regular Expressions 101 - https://regex101.com/
Search
- https://www.visidam.com/2015/09/04/how-to-use-a-regular-expression-in-the-solr-query-2/
- https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html#regexp-syntax