How do I get a list of files in directory including subdirectories?
I'm aware of two ways to get a list of files in a directory:
set dir = "C:\temp\"
set rs = ##class(%File).FileSetFunc(dir, , , 1)
do rs.%Display()
and:
set dir = "C:\temp\"
set file = $ZSEARCH(dir_"*")
while file '= "" {
    write !, file
    set file = $ZSEARCH("")
}
Yet they both return only the files and directories in the current directory, not the files in subdirectories.
I suppose I could call one of these recursively, but maybe there's a better solution?
Unless you're after maximum performance, you could create a method that receives or creates a %SQL.Statement for the %File:FileSet query, checks whether "Type" is "D", and calls itself recursively, passing that statement instance along.
Here's a use case where I applied that pattern:

Method SearchExtraneousEntries(
	statement As %SQL.Statement = "",
	path As %String,
	ByRef files As %List = "")
{
	if statement = "" {
		set statement = ##class(%SQL.Statement).%New()
		$$$QuitOnError(statement.%PrepareClassQuery("%File", "FileSet"))
	}
	set dir = ##class(%File).NormalizeDirectory(path)
	set row = statement.%Execute(dir)
	set sc = $$$OK
	while row.%Next(.sc) {
		if $$$ISERR(sc) quit
		set type = row.%Get("Type")
		set fullPath = row.%Get("Name")
		if ..IsIgnored(fullPath) continue
		if type = "D" {
			set sc = ..SearchExtraneousEntries(statement, fullPath, .files)
			if $$$ISERR(sc) quit
		}
		if '..PathDependencies.IsDefined(fullPath) {
			set length = $case(files, "": 1, : $listlength(files)+1)
			set $list(files, length) = $listbuild(fullPath, type)
		}
	}
	quit sc
}
I'm voting for Rubens's solution as it is OS independent. Caché as a great sandbox in many cases, why not use its "middleware" capabilities? Developer's time costs much more than CPU cycles, and every piece of OS dependent code should be written and debugged separately for each OS that should be supported.
As to performance, in this particular case I doubt the recursion costs much compared to the system calls. In any case, it's not a big problem to replace it with iteration.
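For the record, an iterative variant is a straightforward rewrite: keep a work list of pending directories instead of using the call stack. A minimal sketch built on the same %File:FileSet query (the method and variable names here are illustrative, not from any post above):

```objectscript
ClassMethod ListFilesIterative(pRoot As %String, ByRef pFiles) As %Status
{
    set sc = $$$OK
    // seed the work list with the root directory
    set todo(##class(%File).NormalizeDirectory(pRoot)) = ""
    set dir = $order(todo(""))
    while dir '= "" {
        kill todo(dir)
        set rs = ##class(%ResultSet).%New("%Library.File:FileSet")
        set sc = rs.Execute(dir)
        quit:$$$ISERR(sc)
        while rs.Next(.sc) {
            if rs.Get("Type") = "D" {
                // queue the subdirectory instead of recursing
                set todo(##class(%File).NormalizeDirectory(rs.Get("Name"))) = ""
            } else {
                set pFiles(rs.Get("Name")) = ""
            }
        }
        set dir = $order(todo(""))
    }
    quit sc
}
```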
What are ..IsIgnored and ..PathDependencies? Could you post a complete example?
Nice solution; I haven't seen RunCommandViaCPIPE used before.
Looking at the documentation it says...
"Run a command using a CPIPE device. The first unused CPIPE device is allocated and returned in pDevice. Upon exit the device is open; it is up to the caller to close that device when done with it."
Does this example need to handle this?
Also, do you not worry that it's an internal class?
If you want something more regular you could pipe the output via OS:
This way you also get output larger than 32000 kb.
set errorLogDir = ##class(%File).TempFilename()
set outputLogDir = ##class(%File).TempFilename()
set command = "dir /A-D /B /S ""%1"" 2> ""%2"" > ""%3"""
quit $zf(-1, $$$FormatText(command, "C:\InterSystems\Cache", errorLogDir, outputLogDir))
Now simply open and read the logs.
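For instance, reading the output log back could look like this (a sketch, reusing the variable names from the snippet above; %Stream.FileCharacter is used for convenience):

```objectscript
set stream = ##class(%Stream.FileCharacter).%New()
set sc = stream.LinkToFile(outputLogDir)
if $$$ISOK(sc) {
    while 'stream.AtEnd {
        write !, stream.ReadLine()
    }
}
// clean up the temp files when done
do ##class(%File).Delete(outputLogDir)
do ##class(%File).Delete(errorLogDir)
```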
I don't recommend opening %ResultSet instances recursively.
It's more performant to open a single %SQL.Statement and reuse it.
This also saves you quite a few bytes of process memory as the recursion depth grows.
Initially the question was about alternative ways of solving the issue (in addition to recursive FileSet and $ZSEARCH).
I just proposed a third method, namely using the capabilities of the OS itself. Maybe someone here didn't know about it.
Which option to choose in the end is up to the developer.
Are we voting here for the best solution, or for the offered solutions in general?
If the former, then I'll pass.
I don't understand the difference between these two kinds of voting :) Which solution is best depends on many factors: if we need turbo performance, we'd take your approach; if not, the %ResultSet-based one. BTW, I guess that directory scanning is a small part of a bigger task: the files are processed after they have been found, and the processing takes much longer than the directory search.
Last 2c for the cross-platform approach: the main place where a COS developer faces problems is interfacing with 3rd-party software. As a German colleague once told me, "Caché is great for seamless integration".
E.g., I recently found that Caché for Linux forcibly resetting LD_LIBRARY_PATH may cause problems for some utilities on some Linux versions. Better to stop here; maybe I'll write about it separately.
There are a few comments:
Let's say the developers changed something in a new version of the DBMS. Is this a problem?
It is enough to check the Caché Recent Upgrade Checklists, which usually give a ready-made list of changes that may affect existing user code.
Note that this can apply to absolutely any member of a class, even one not marked [Internal]. Suffice it to recall the recent story with JSON support.
For classes the [Internal] flag does not exist at all; there the warning lives only at the level of comments.
As for other class members, according to the documentation this flag serves a different purpose, namely:
Vitaliy's is faster probably because Caché is delegating control to the OS's native API.
That is indeed the best approach, and it could be made cross-platform by using $$$isUNIX, $$$isWINDOWS and $$$isVMS.
Now I gotta say, I'm impressed by these results.
E.g., using find -iname %1 instead of dir.
Then what's the advantage of using %SQL.Statement over %ResultSet? The only reason I can think of is to use its metadata.
EDIT: Ah, there's a detail. FileSet is not a persisted table, nor does it use SQL. It's a custom query populated internally by $ZSEARCH.
Maybe for SQL based queries %SQL.Statement would be better.
I've removed the recycled resultset example; it is not working correctly. It might not work at all as a recycled approach; I will look at it further and run more timing tests if I get it working.
In the meantime, my original example without recycling the resultset, on a nest of folders with 10,000+ files, takes around 2 seconds, whereas the recycled SQL.Statement example takes around 14 seconds.
OK, I got the third example working; I needed to stash the dirs as they were getting lost.
Here are the timings...
Recursive ResultSet = 2.678719
Recycled ResultSet = 2.6759
Recursive SQL.Statement = 15.090297
Recycled SQL.Statement = 15.073955
I've tried it with shallow and deep folders with different file counts and the differential is about the same for all three.
The recycled objects surprisingly only shave off a small amount of time. I think this is because of bottlenecks elsewhere that overshadow the milliseconds saved.
SQL.Statement being 6-7x slower than ResultSet is a surprise, but then the underlying implementation is not doing a database query, which is where you would expect it to be the other way around.
The interesting thing now would be to benchmark one of the command line examples that have been given to compare.
Just for good measure, I benchmarked Vitaliy's last example and it completes the same test in 0.344022 seconds, so for out-and-out performance a solution built around this approach is going to be the quickest.
While it is true that Internal does not mean deprecated, it is still not recommended that you use such items in your application code. Internal means that this is for InterSystems internal use only. Anything with this flag can change or be removed with no warning.
What the ... ? wow.
I'll have to update my code if that's true in every case. Could you measure the execution time for both approaches?
Correct, of course, to close the device:
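Something along these lines, as a minimal sketch, assuming pDevice holds the device name that RunCommandViaCPIPE returned:

```objectscript
try {
    // read lines from the pipe until <ENDOFFILE> is thrown
    for {
        use pDevice
        read line
        use 0
        write !, line
    }
} catch e {
    use 0
}
// per the documentation, it is up to the caller to close the CPIPE device
close pDevice
```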
No, since for me internal ≠ deprecated.
However, given the openness of the source code, you can write your own similar method, thereby protecting yourself from possible issues in the future.
> I don't recommend opening %ResultSet instances recursively.
Agreed, but maybe splitting hairs if it's only used once per process.
> It's more performatic if you open a single %SQL.Statement and reuse that.
Actually, it's MUCH slower, not sure why. Just gave it a quick test, see for yourself...
Method GetFileTree(pFolder, pWildcards, ByRef oFiles, ByRef pState = "") As %Status
{
if pState="" set pState=##class(%SQL.Statement).%New()
set sc=pState.%PrepareClassQuery("%File", "FileSet")
set fileset=pState.%Execute(##class(%File).NormalizeDirectory(pFolder),pWildcards,,1)
while $$$ISOK(sc),fileset.%Next(.sc) {
if fileset.%Get("Type")="D" {
set sc=..GetFileTree(fileset.%Get("Name"),pWildcards,.oFiles,.pState)
} else {
set oFiles(fileset.%Get("Name"))=""
}
}
quit sc
}
** EDITED **
This example recycles the FileSet (see comments below regarding performance)
Method GetFileTree3(pFolder, pWildcards, ByRef oFiles, ByRef fileset = "") As %Status
{
if fileset="" set fileset=##class(%ResultSet).%New("%Library.File:FileSet")
set sc=fileset.Execute(##class(%File).NormalizeDirectory(pFolder),pWildcards,,1)
while $$$ISOK(sc),fileset.Next(.sc) {
if fileset.Get("Type")="D" {
set dirs(fileset.Get("Name"))=""
} else {
set oFiles(fileset.Get("Name"))=""
}
}
set dir=$order(dirs(""))
while dir'="" {
set sc=..GetFileTree3(dir,pWildcards,.oFiles,.fileset)
set dir=$order(dirs(dir))
}
quit sc
}
Here.
If you need an explanation, the method I used as an example is part of an algorithm that keeps the repository in sync with the project.
You're right, your approach is the best, even though it's not cross-platform. That issue could be solved by using $$$isUNIX and $$$isVMS though.
Something like:
if $$$isUNIX set command = "find %1"
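Expanding that idea into a full branch might look like this (a sketch; the concrete commands and the error message are illustrative):

```objectscript
if $$$isUNIX {
    set command = "find ""%1"" -type f"
} elseif $$$isWINDOWS {
    set command = "dir /A-D /B /S ""%1"""
} else {
    quit $$$ERROR($$$GeneralError, "unsupported platform")
}
// substitute the target path and hand the command to the OS
quit $zf(-1, $$$FormatText(command, path))
```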
Yup, but if you read the source code, it's limited to 32000 kb.
And if so?
Oh, yes. You could open the file directly too.
That is what the RunCommandViaZF method does.
Yes, of course.
E.g. (for Windows x64):
Method GetFileTree(pFolder, pWildcards, ByRef oFiles) As %Status
{
set fileset=##class(%ResultSet).%New("%Library.File:FileSet")
set sc=fileset.Execute(##class(%File).NormalizeDirectory(pFolder),pWildcards,,1)
while $$$ISOK(sc),fileset.Next(.sc) {
if fileset.Get("Type")="D" {
set sc=..GetFileTree(fileset.Get("Name"),pWildcards,.oFiles)
} else {
set oFiles(fileset.Get("Name"))=""
}
}
quit sc
}
Search All...
Search for specific file type...
Search for multiple file types...
How about this: call it with
DO ^GETTREE("/home/user/dir/*",.result)
and $ORDER() through result.
#INCLUDE %sySite
GETTREE(wild,result) ;
NEW (wild,result)
SET s=$SELECT($$$ISUNIX:"/",$$$ISWINDOWS:"\",1:1/0) ; separator
SET w=$SELECT($$$ISUNIX:"*",$$$ISWINDOWS:"*.*") ; wild-card
SET todo(wild)=""
FOR {
SET q=$ORDER(todo("")) QUIT:q="" KILL todo(q)
SET f=$ZSEARCH(q) WHILE f'="" {
SET t=$PIECE(f,s,$LENGTH(f,s)) ; last path component
IF t'=".",t'=".." { ; skip . and .. instead of aborting the listing
SET result(f)=""
SET todo(f_s_w)=""
}
SET f=$ZSEARCH("")
}
}
QUIT
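For completeness, traversing the collected result could look like:

```objectscript
DO ^GETTREE("/home/user/dir/*",.result)
SET f=""
FOR {
    SET f=$ORDER(result(f)) QUIT:f=""
    WRITE !,f
}
```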
Flaws:
On my Mac, I have some directories so deep that $ZSEARCH() fails.
It doesn't work on OpenVMS. As much as you might like to think you can convert dev:[dir]subdir.DIR;1 to dev:[dir.subdir]*.*;* and keep searching, there are too many weird cases to deal with on OpenVMS; better to just write a $ZF() interface to LIB$FIND_FILE() and LIB$FIND_FILE_END().