sfk ofind singleDirName "/searchtext/"
sfk ofind singleFileName "/searchtext/" [options]
sfk ofind -dir mydir -file .docx .xlsx -text "/from/[totext/]"
search in office files like .docx .xlsx .ods .odt
and in plain text files using wildcards * and ?
as well as SFK Simple Expressions in brackets [].
the search text must be surrounded by a delimiter like / or _
or any other character not part of the search text.
by default, full text lines containing hits are shown.
use option -pure to show only the found text.
search text can be followed by a totext to reformat output.
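
The delimiter rule above can be sketched in a POSIX shell. sfk_pattern is a hypothetical helper, not part of sfk, and the candidate delimiters % # and , are just an assumed choice. It does not escape the wildcard characters * ? \ [ ], so it only suits plain search text.

```shell
#!/bin/sh
# sfk_pattern is a hypothetical helper, not an sfk command.
# It wraps a search text in the first candidate delimiter
# that does not occur in the text itself.
sfk_pattern() {
    text=$1
    for delim in / _ % '#' ,; do
        case $text in
            *"$delim"*) ;;    # delimiter occurs in the text, try the next one
            *) printf '%s%s%s\n' "$delim" "$text" "$delim"; return 0 ;;
        esac
    done
    return 1                  # every candidate delimiter occurs in the text
}

sfk_pattern "foo"    # prints /foo/
sfk_pattern "a/b"    # prints _a/b_

# Run a real search only if sfk is installed and the
# example folder "mydir" (an assumption) exists.
if command -v sfk >/dev/null 2>&1 && [ -d mydir ]; then
    sfk ofind mydir "$(sfk_pattern "a/b")"
fi
```

Both /foo/ and _foo_ search for the same text, so the helper only has to avoid a clash between the delimiter and the search text.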
subdirectories are included by default
the sfk default for most commands is to process the given directories,
as well as all subdirs within them. specify -nosub to disable this.
options
-nosub do not include files in subdirectories.
-verbose always show which file is currently read.
-justoffice search only in office files, not in plain text etc.
-case case-sensitive text comparison. default is insensitive.
for details type: sfk help nocase
-text starts a list of search patterns of the form /src/ or
/src/totext/ where / is the separator char, src the text
to search for, and totext a mask to reformat output.
any separator char can be used which is not part of the
search text, i.e. /foo/ or _foo_ both search "foo".
-text is not required if a single filename is given.
-pat the same as -text, starting a pattern list.
-bylist x.txt read search patterns from a file x.txt, supporting
multiple lines per pattern. (add -full for more.)
-bylinelist x read /from/to/ or just /from/ patterns from a file x
with one pattern per line. (add -full for more.)
-by(line)list does not support sfk variables.
to use variables in patterns create an sfk script
with patterns as parameters. "sfk script" for more.
-recsize set input record size for processing (default=100k).
xreplace, xfind and xhexfind extend this automatically
based on the largest search patterns.
-firsthit show only first found pattern match per file.
-utfin with -utfout only: search text is already given
as UTF-8, do not convert internally for search.
-tracesel tell in detail which files are searched or ignored.
-quiet do not show progress information.
-names list only names of files containing at least one hit.
-notnames list only names of files not containing any hit.
-justrc print no search results, just set return code on hits.
-full print full help text telling about -bylist pattern files,
special character case sensitivity and nested or repeated
replace behaviour.
output options
-utfout keep raw UTF-8 encoding on output, to use it
with further commands requiring UTF-8 data.
-conlines=n1 show n1 lines of context around search hits. by default
only text lines containing one or more hits are shown.
all context lines together cannot hold more than the
number of characters set by -conchars.
-conchars=n2 max. number of characters of all context lines together.
default is 240 or n1*160. cannot be larger than 32000.
-conresline show full result line but no further context (default)
-sep[arator] show "---" separator between hits within a file.
-septext s use separator text s (supports slash patterns \n etc.)
-nosep do not show "---" separator between hits within a file.
-indent=n set n chars of indentation for result display.
-pure extract only searched data, same as -context=0.
you may also set an environment variable:
set SFK_CONFIG=xfind:pure,xfindbin:pure
use -pure -tofile x to extract binary content as is.
-fill=c replace binary null and other unprintable characters
with character c. default is a dot "."
-hex print output as hex dump instead of plain text.
-showle highlight CR/LF line endings in hex dump output
-nofile do not insert :file header lines in output.
-crlf, -lf for file headers and default totext: force crlf or lf
line endings instead of system default
-filehead s file header to insert on every matching file.
only [file.name] surrounded by text can be used.
default is -filehead ":file [file.name]" unless a
single file is searched. cannot be used with xhexfind.
to get result and name in the same line use [file.name]
in the expression, like: sfk xfind -pure -nofile mydir
"/foo*bar/[file.name]: [all]\n/"
-sep s define separator s between hits in a file
-rawterm on output to terminal do not strip codes below 32.
null bytes are always stripped.
-to dir\$file write output files to given path. for details about
output file masks, type "sfk help opt" or "sfk run".
-tofile x write output data to a single output filename x
(which is not interpreted as a mask but taken as is).
+tofile x as last parameter (command chaining): write text as
displayed on terminal to a file x.
-more[n] pause output every 30 or n lines.
-showhits list matching and missing search patterns.
-showjusthit or -showmiss lists only matching or missing patterns.
return codes for batch files
0 = no matches, 1 = matches found, >1 = major error occurred.
see also "sfk help opt" on how to influence error processing.
quoted multi line parameters are supported in scripts
using full trim. type "sfk script" for details.
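
A minimal sketch of how a POSIX shell script might act on the return codes listed above. describe_rc is a hypothetical helper, and the folder "mydir" and the word "myword" are placeholders.

```shell
#!/bin/sh
# describe_rc is a hypothetical helper, not an sfk command.
# It maps the documented return codes to a short message:
# 0 = no matches, 1 = matches found, >1 = major error.
describe_rc() {
    case $1 in
        0) echo "no matches" ;;
        1) echo "matches found" ;;
        *) echo "error" ;;
    esac
}

# Run a real search only if sfk is installed and the
# example folder "mydir" (an assumption) exists.
if command -v sfk >/dev/null 2>&1 && [ -d mydir ]; then
    # -justrc prints no search results and only sets the return code.
    sfk ofind mydir "/myword/" -justrc
    describe_rc $?
fi
```

Note that a return code of 1 means hits were found, so a script that treats every non-zero code as a failure would misread a successful search.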
wildcards and SFK expressions
SFK Expressions are simple patterns containing literal text,
wildcards * and ? and character classes in square brackets [].
basically, the syntax provides extended wildcards but no
further logic and is not related to regular expressions.
search patterns are surrounded by a separator character which
can be anything not contained in the search text, like / or _
within a pattern /fromtext/totext/ the fromtext may contain:
* - 0 to 4000 characters in the same
text line or paragraph, i.e. all
bytes not being CR, LF or NULL.
4000 is just a default maximum
that can be changed by:
[0.100000 chars] - 0 to 100000 characters in the same
text line or paragraph, i.e. the
same as * but with a larger range.
? - one character.
????? - same as [5.5 chars] or [5 chars]
[bytes] - 0 to 4000 bytes (with CR,LF,NULL)
i.e. it collects stream text
across lines, even in binary data
** - the same as [bytes].
[0.100 bytes] - 0 to 100 bytes
[.100000 bytes] - up to 100000 bytes
[1.* bytes] - 1 to default maximum bytes
[2 chars] - exactly 2 chars
[30 bytes] - exactly 30 bytes
[byte of aeiou] - one vowel (a OR A OR e OR ...),
case insensitive by default.
"aeiou" is a character list.
[byte of \\\x2f] - a backslash \ or forward slash /
[bytes of \r\n \t] - whitespace incl. line ends
[bytes of (\r\n \t)] - the same, () are optional
[bytes not \r\n\0] - up to 4000 bytes as long as no
CR, LF or NULL byte appears
[chars] - the same as [bytes not \r\n\0],
i.e. collect text in a line
[char not ( \t)] - same as [byte not ( \r\n\0\t)],
everything not blanks and tabs
[char not )( \t] - not brackets, blanks and tabs,
same as not (\(\) \t)
[chars of a-z0-9] - means a-zA-Z0-9 as search is
case insensitive by default
[chars of \x61-\x7A] - search a-z but not A-Z, or use
option -case for case search
[eol] - end of line by characters:
CRLF or LF or CR
[white] = chars of (\t ) - 0 or more whitespaces
[xwhite] = bytes of (\t \r\n) - same but across lines
[1 white] = byte of (\t ) - 1 whitespace
[digit] = byte of (0-9) - 1 digit
[digits] = bytes of (0-9) - 0 or more digits
[hexdigit] = byte of (0-9a-f) - 1 hexadecimal digit
[hexdigits] = bytes of (0-9a-f) - 0 or more hex digits
special keywords that do not count as tokens:
[skip] - at the start of a pattern: skip such text
completely, do not count it as a search hit.
[keep] - search also the following text but keep it
in the input data, without consuming it.
[ortext] - foo[ortext]bar searches word foo or bar.
[ortext] is allowed only between literals.
anchors that have no length of their own:
[start] - start of file
[end] - end of file
[lstart] - line start, i.e. start or CRLF or CR or LF
[lend] - logical line end, i.e. eol or end of file.
to replace line ends use [eol] instead.
how to search or replace special characters:
- to search or replace text containing the literal characters
* ? \ [ ] then these must be escaped like \* \? \\ \[ \]
- ( ) are escaped only within character lists, like \( \)
- to search or replace the forward slash '/' type \x2f or use
another char around from/to text, e.g. _fromtext_totext_
- parameters with blanks and non trivial characters need double
quotes "", see also "about Shell Command Characters" below.
expansion priorities: (highest first)
if two search parts are side by side, and the same input
character matches both, then these priorities apply:
5: start, end, lstart, lend
4: literal text, eol
3: whitelist classes: byte of, bytes of
2: blacklist classes: chars not, bytes not
1: plain wildcards: ?, *, **, byte, bytes, chars
this means in "/[bytes]foo/" the [bytes] will stop collecting
characters as soon as "foo" is found, as "foo" is a literal.
on same or higher priority the right side stops the left side.
avoid overlapping character groups. for example, [chars][white]
cannot work, as space and tab are part of chars. to fix this
extend chars by relevant exclusions: [chars not ( \t)][white]
the totext may contain:
[part 1] use first text part of the fromtext.
e.g. the fromtext /*foo[.100 chars]bar*/
contains parts: 1 = *, 2 = foo, 3 = [.100 chars], 4 = bar, 5 = *
[part1] the same (blank is optional).
[parts 1,2,3] use parts 1, 2 and 3.
[parts 1-10] use parts 1 to 10.
[strip(part1,\0)] use part 1 but remove zero bytes.
only zero bytes "\0" can be removed.
[file.name] full input filename with path
[file.relname] input filename without path
[file.path] input file's path
[file.base] relname without last .extension
[file.ext] input filename extension
[all] use all parts from fromtext.
[setvar name]...[endvar] set variable "name" with data
between setvar and endvar.
[getvar name] fill in data from variable "name"
although anchors like lstart, lend count as a separate part
they need NOT be specified in the totext. this means that
/[lstart]foo[lend]/bar/ just changes the word "foo".
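
The part numbering used in the totext can be tried out with a short shell sketch. The variable names and the folder "mydir" are assumptions. The pattern is kept in single quotes so that the shell passes \n through unchanged.

```shell
#!/bin/sh
# The fromtext below has three parts: 1 = foo, 2 = [.100 chars], 3 = bar.
from='foo[.100 chars]bar'
# The totext keeps only part 2 and ends every hit with a line feed.
to='[part2]\n'
# Join both with the separator character /.
pattern="/$from/$to/"
printf '%s\n' "$pattern"    # prints /foo[.100 chars]bar/[part2]\n/

# Run a real search only if sfk is installed and the
# example folder "mydir" (an assumption) exists.
if command -v sfk >/dev/null 2>&1 && [ -d mydir ]; then
    # -pure shows only the found text, reformatted by the totext.
    sfk ofind -pure mydir "$pattern"
fi
```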
if replace loses line endings in output
- when using [eol] in most cases you should add [part...]
to the output pattern, to copy the actual found line
separators, or line endings may get lost.
supported slash patterns
\t = TAB
\r = CR
\n = LF
\x00 = one byte with code 00 hexadecimal
\0 = short form for \x00
\q = a double quote "
\\ = the backslash character \ itself
\[ = the bracket open character [
\] = the bracket close character ]
\* = the literal star character *
\? = the literal question mark ?
\- = to use literal "-" in a command
Within multi line -bylist files:
\ = slash+blank is changed to a single blank
Only within "char of" or "byte not" lists:
\( = to use literal character "("
\) = to use literal character ")"
SFK expression options
-showpart(s) print /from/ part numbers, range statistics
and expansion priority points per part.
done automatically if a required /to/ text
is not given with a command.
-showbest if a /from/ pattern finds nothing, use this to
see how many parts would match so far, and with
up to how many bytes per part. anchors like [lstart]
may show a non zero length when matching (CR)LF.
-showlist with -bylist, show the internal joined list if
commands are spread across multiple lines.
-showall show all of the above.
-xmaxlen=n set default maximum length for chars or bytes commands,
e.g. -xmaxlen=10000 means /foo*bar/ matches with up to
10000 characters between foo and bar. the default max
length without this option is 4000 characters.
performance notes
- always use a string literal, or single byte or char, at the start
of your search expressions, like in /foo*bar/ starting with 'f'.
Do not use a wildcard like * at the start, as in /*foobar/,
when searching huge input data, as your search will slow down by
a factor of 256. Use /[lstart]*foobar/ instead.
- the system may cache output file(s), writing to disk in background
after sfk has finished. subsequent batch commands may execute slower.
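
The first performance note can be sketched as a small POSIX shell helper. anchor_leading_star is a hypothetical name, not an sfk command. It only handles a single leading * and leaves a leading ** untouched, since the note above mentions only the * case.

```shell
#!/bin/sh
# anchor_leading_star is a hypothetical helper, not an sfk command.
# If a pattern starts with a single * wildcard, it inserts [lstart]
# in front of it, as suggested in the performance notes.
# Caution: [lstart] counts as a separate part, so [part N] numbers
# in an existing totext shift by one after the rewrite.
anchor_leading_star() {
    pat=$1
    delim=${pat%"${pat#?}"}    # the first character is the separator
    rest=${pat#?}              # everything after the separator
    case $rest in
        '**'*) printf '%s\n' "$pat" ;;                      # ** is left as is
        '*'*)  printf '%s[lstart]%s\n' "$delim" "$rest" ;;  # anchor the leading *
        *)     printf '%s\n' "$pat" ;;                      # starts with a literal
    esac
}

anchor_leading_star '/*foobar/'    # prints /[lstart]*foobar/
anchor_leading_star '/foo*bar/'    # prints /foo*bar/
```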
chaining support
sfk extract output can be sent only to +xed or +xex.
other commands require an xed conversion step like
sfk extract ... +xed +view
aliases
sfk xhexfind is the same as xfind -hex
to extract unmodified binary data you may use either
sfk xfind -pure ... -tofile or sfk extract ... -tofile
office file support
sfk ofind search in .xml text file contents of
office files like .docx .xlsx .ods .odt.
sfk help office for more information and options
see also
sfk xfind for more search pattern examples
examples
sfk ofind mydir "/myword/"
search office and plain text files in mydir
containing the word 'myword'.
sfk ofind mydir "/myword/" -names +copy out
same as above, but copy the found files
to a folder 'out'.
sfk ofind mydir "/foo*bar/"
search foo followed by bar in the same line.
sfk ofind -pure mydir "/foo**bar/[part2]\n/"
search text starting with foo, then several
text lines, then ending with bar. print
only the found text between foo and bar.



