Hi Community,
This post introduces one of my first projects in COS. I created it when I started learning the language, and I have kept improving it ever since.
cosFaker (here on GitHub) is a pure COS library for generating fake data.
cosFaker vs Populate Utils
So why use cosFaker if Caché already has the populate data utility?
OK, the populate utility has great features, such as the SSN generator, but what do you do when a field holds a long product description? How do you check that a table will list the emails correctly, or that a calculated property will count the days since the last user interaction?
For me, cosFaker is the populate utils on steroids! You can use it together with %Populate to generate %Stream values, long strings, or random dates.
For example:
Class Sample.Product Extends (%Persistent, %Populate, %XML.Adaptor)
{

Property Type As %String;

Property Notes As %String(MAXLEN = 250, MINLEN = 10);

Property Name As %String;

Property Origin As %String;

Property LastInteraction As %TimeStamp;

Method OnPopulate() As %Status [ ServerOnly = 1 ]
{
    Set tSC = $$$OK
    Try {
        // cosFaker fills the fields that %Populate can't generate meaningfully
        Set ..Type = "Coffee"
        Set ..Name = ##class(cosFaker.Coffee).BlendName()
        Set ..LastInteraction = ##class(cosFaker.Dates).Backward($Random(80))
        Set ..Notes = ##class(cosFaker.Coffee).Notes()
        Set ..Origin = ##class(cosFaker.Coffee).Origin()
    }
    Catch tException {
        Set:$$$ISOK(tSC) tSC = tException.AsStatus()
    }
    Quit tSC
}

}
Do ##class(Sample.Product).Populate(10)
It's also great for writing unit tests like this one:
Method TestPersonLogin()
{
    // Build a person with fake but realistic-looking data
    Set person = ##class(Sample.DataModel.Person).%New()
    Set person.FirstName = ##class(cosFaker.Name).FirstName()
    Set person.LastName = ##class(cosFaker.Name).LastName()
    Set person.Email = ##class(cosFaker.Internet).Email(person.FirstName, person.LastName)
    Do $$$AssertStatusOK(person.%Save())

    // The generated e-mail must match a basic address pattern
    Set matcher = ##class(%Regex.Matcher).%New("\A([\w+\-].?)+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z")
    Set matcher.Text = person.Email
    Do $$$AssertTrue(matcher.Locate())

    // Reopening by ID returns the same in-memory object
    Do $$$AssertEquals(person, ##class(Sample.DataModel.Person).%OpenId(person.%Id()))
}
cosFaker is FUN!!
Yes, cosFaker is fun... Instead of names like "Gibbs, Zoe K.", you get "Goku" or "Piccolo" using
And lots of other fun generators: Pokémon names, Star Wars planets and droids, coffee, UFC fighter names, Lorem Ipsum, etc.
So, that's all, folks!
Cheers
Do you have any benchmark data comparing this to the populate utils?
Cheers,
Fab
Hi @Fabian.Haupt
Unfortunately, I haven't done a benchmark... but comparing the performance is an awesome idea. I'll do it and post the results here.
Thanks ;)
What benchmark? Populating the data, or retrieving the populated data? I don't think the speed of data population matters much when comparing the standard populate utility and cosFaker. What matters is the tool's ability to mimic real data as closely as possible (e.g., the distribution of values).
When I first saw this project, I thought it was based on the faker.js project (demo), but, unfortunately, it is an independent implementation. Faker.js, by the way, is quite a good project for populating data in JavaScript (frontend or backend, it doesn't matter); it supports many languages, even Russian and Czech, and lots of different data formats.
Part of testing with populated data is performance testing. If your data population utilities can't deliver high enough throughput, you can't really test your application under load.
And generating meaningfully large data sets takes a lot of time. For example, with the Caché populate utils it takes 7.891 seconds on my machine to create 1M pairs of
The same takes 0.39 s on my machine with a rudimentary Go implementation.
I very much disagree that performance doesn't matter.
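For anyone who wants to reproduce this kind of comparison, a minimal timing sketch might look like the one below. It assumes cosFaker is installed; the class Sample.Bench and its Compare method are hypothetical names. It times n calls of %PopulateUtils.FirstName() against n calls of cosFaker.Name.FirstName() using $ZHOROLOG, so it measures value generation only, not saving objects.

```objectscript
Class Sample.Bench
{

/// Hypothetical helper: times n calls of each name generator and writes the elapsed seconds.
ClassMethod Compare(n As %Integer = 1000000)
{
    // $ZHOROLOG = seconds (with fractional part) since the instance started,
    // so the difference of two readings is the wall-clock time in between
    Set start = $ZHOROLOG
    For i=1:1:n { Set name = ##class(%PopulateUtils).FirstName() }
    Write "%PopulateUtils.FirstName: ", $ZHOROLOG - start, " s", !

    Set start = $ZHOROLOG
    For i=1:1:n { Set name = ##class(cosFaker.Name).FirstName() }
    Write "cosFaker.Name.FirstName:  ", $ZHOROLOG - start, " s", !
}

}
```

Run it from the Terminal with something like Do ##class(Sample.Bench).Compare(100000). Timing single-field generators keeps storage and index costs out of the numbers; timing Populate(n) on a persistent class would measure the full insert path instead.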
In terms of an online service, you could do something like:
curl -H "Content-Type: application/json" -X POST --data '{"count":1000000,"headers":false,"fields":[{"name":"Name","type":"name"},{"name":"Age","type":"digits"}]}' http://data.panadadata.com -o data.json
(disclaimer, I run that service)
Also available on Open Exchange!
I really like this project. Utilities like this are very precious for testing and benchmarking.
Unfortunately, the container build breaks, so I could not test it, but I think this should be pursued. Good work!
My pull request is pending.
IMAGE=intersystemsdc/iris-community:2019.4.0.383.0-zpm
is broken because it starts with SHELL ["/irissession.sh"]
instead of SHELL ["/bin/bash", "-c"].
Newer images don't have that problem.