antipaucity

fighting the lack of good ideas

comparing unique anagrams?

How useful would it be to determine the similarity of words by their unique anagrams – the sorted set of distinct letters in each word? For example: “ROBERT” uniquely anagrams to “BEORT”; “BOBBY” and “BOOBY” both uniquely anagram to “BOY”.
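A quick Python sketch of that “unique anagram” signature (purely an illustration – the function name and word list are mine, not anything standardized):

def unique_anagram(word: str) -> str:
    # the "unique anagram": the distinct letters of the word, sorted
    return "".join(sorted(set(word.upper())))

print(unique_anagram("ROBERT"))  # BEORT
print(unique_anagram("BOBBY"))   # BOY
print(unique_anagram("BOOBY"))   # BOY

# two words are "similar" by this measure if their signatures are equal
print(unique_anagram("BOBBY") == unique_anagram("BOOBY"))  # True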

Is there already a comparison algorithm that uses something like this?

What potentially “interesting” discoveries might be made about vocabulary choices if you analyzed text corpora with this method?

splunk: match a field’s value in another field

A Splunk use-case presented itself today: needing to determine if the value of one field is found in another – specifically, deciding if a lookup table’s category name for a network endpoint is “the same” as the dest_category assigned by a Forescout CounterACT appliance.

We have “customer validated” names for network endpoints (and we all know how reliable that kind of data can be … the customer is always wrong).

These should be “identical” to the dest_category field assigned by CounterACT … but, as we all know, “should” is a funny word.

What I tried first (which does not work) was like():

| eval similar=if(like(A,'%B%') OR like(B,'%A%'), "yes", "no")

I tried a slew of variations around the theme of trying to get the value of the field to be in the match portion of the like().

What I ended-up doing (that does work) is this:

| eval similar=if((match(A,B) OR match(B,A)), "yes", "no")

That uses the value of the second field as the regular expression argument of the match() function.

Things you should do ahead of time:

  • match case between the fields (I did upper() .. lower() would work as well)
  • remove “unnecessary” characters – in my case, I yoinked all non-word characters with this replace() eval: | eval A=upper(replace(A,"\W",""))
  • know that there are limitations to this comparison method
    • “BOB” will ‘similar’ match to “BO”, but not “B OB” (hence removing non-word characters before the match())
    • “BOB” is not ‘similar’ to “ROB” – even though, in the vernacular, both might be an acceptable shortening of “ROBERT”
  • if you need more complex ‘similar’ matching, check out the JellyFisher add-on on Splunkbase
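To see the whole approach (the normalization plus the mutual match()) in one place, here’s a rough Python sketch of the same logic – note it approximates match()’s regex behavior with plain substring containment, and the sample values are made up:

import re

def normalize(value):
    # mirror the SPL prep: upper() plus stripping non-word characters
    return re.sub(r"\W", "", value.upper())

def similar(a, b):
    # mirror match(A,B) OR match(B,A): is either value found inside the other?
    a, b = normalize(a), normalize(b)
    return "yes" if (a in b or b in a) else "no"

print(similar("BOB", "BO"))    # yes
print(similar("BOB", "B OB"))  # yes - only because normalize() stripped the space first
print(similar("BOB", "ROB"))   # no - this method can't know both are short for ROBERT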

Thanks, also, to @trex and @The_Tick on the Splunk Usergroups Slack #search-help channel for working me towards a solution (even though what they suggested was not the direction I ended up going).

vampires *can* coexist with zombies

I made a mistake 4 years ago.

I said vampires and zombies couldn’t [long] coexist. Because they’d be competing for the same – dwindling – food source: the living (vs them both being undead).

But I was wrong.

If the universe in which they exist is a mash-up of that of Twilight and iZombie … it could work.

The iZombie universe has zombies that can avoid going “full Romero” by maintaining a steady supply of brains – and they don’t need to eat much to stay “normal”.

The Twilight universe has vampires that can survive on animal blood (or, one presumes, by hitting-up blood banks).

So if you were to have “brain banks” the way you have “blood banks” – I could see it working.

Now we just need some iZombie-Twilight hybrid vambie/zompire creatures running around.

how-to timechart [possibly] better than timechart in splunk

I recently had cause to do an extensive trellised timechart for a dashboard at $CUSTOMER in Splunk.

They have a couple hundred locations reporting networked devices.

I needed to report on how many devices they’ve reported every day over the last 90 days (I would have liked to go back further…but retention is only 90 days on this data).

My initial instinct was to do this:

index=ndx sourcetype=srctp site=* ip=* earliest=-90d
| timechart limit=0 span=1d dc(ip) by site

Except…that would take well over an hour to run – so the job gets terminated at the ~60-minute mark.

What possible other approaches could be made?

🤔

Well.

Here are a few that I thought about:

  1. Use multisearch, and group nine 10-day searches together to cover the 90 days.
    • I’ve done things like this before with good success. But it’s … ugly. Very, very ugly.
    • You can almost always accomplish what you want via stats, too – but it can be tricky.
  2. Pre-populate a lookup table with older data (a la option 1 above, but done “by hand”), and then just append “more recent” data onto the table in the future.
    • This would give the advantage of getting a longer history going forward
    • Ensuring “cleanliness” of the table would require some maintenance scheduled searches/reports … but it’s doable
  3. Something else … that “happens” to work like a timechart – but runs in an acceptable time frame.
  4. Try binning _time
    • Tried – didn’t work 🤨

So what did I do?

I asked for ideas.

If you’re regularly (or irregularly) using Splunk, you should join the Splunk Usergroups Slack.

Go join it now, if you’re not on it already.

Don’t worry – this blog post will be here when you get back.

You’ve joined? Good good. Look me up – I’m @Warren Myers. And I love to help when I can 🤠.

I asked in #search-help.

And within a couple minutes, somebody there suggested using the “hidden field” date_day and doing a | stats dc(ip) by date_day site. Unfortunately, this data source is JSON that comes-in via the HEC – so the date_* fields (date_day included) don’t get populated.

Poo.

Lo and behold!

I can “fake” date_day by using strftime!

Specifically, here’s the eval command:

| eval date=strftime(_time,"%Y-%m-%d")

This converts from the hidden _time field (in Unix epoch format) to yyyy-mm-dd.

This is the 🔑!

What does this line do? It lets me stats-out by day and site – just like timechart does – but it runs way faster (Why? I Don’t Know. He’s on third. And I Don’t Give a Darn! (Oh! That’s our shortstop!)).

How much faster?

At least twice as fast! It takes ~2200 seconds to complete – versus the timechart form, which was only about 70% done when it got nuked at 3600 seconds (so call it ~5100 seconds to finish) … this is better!

The final form for the search:

index=ndx sourcetype=srctp site=* ip=* earliest=-90d@ latest=-1d@
| table site ip _time
| eval date=strftime(_time,"%Y-%m-%d")
| stats dc(ip) as inventory by date site
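If it helps to see the idea outside of SPL, the eval + stats combination boils down to “bucket each event by calendar day, then count distinct IPs per (day, site)”. A rough Python sketch, with made-up events (and UTC instead of whatever timezone your Splunk search would use):

import time
from collections import defaultdict

# pretend these (epoch, site, ip) tuples came back from the base search
events = [
    (1696118400, "site-a", "10.0.0.1"),
    (1696122000, "site-a", "10.0.0.1"),   # same ip, same day - counted once
    (1696125600, "site-a", "10.0.0.2"),
    (1696129200, "site-b", "10.0.0.1"),
]

# eval date=strftime(_time,"%Y-%m-%d") ... | stats dc(ip) as inventory by date site
inventory = defaultdict(set)
for epoch, site, ip in events:
    date = time.strftime("%Y-%m-%d", time.gmtime(epoch))
    inventory[(date, site)].add(ip)

for (date, site), ips in sorted(inventory.items()):
    print(date, site, len(ips))   # len(ips) is the dc(ip) for that day+site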

I’ve got this in a daily-scheduled Report that I then draw-into Dashboard(s) as needed (no point in running more often, since it’s summary data that only “changes” (at most) once a day).

Hope this helps somebody – please leave a comment if it helps you!

following-up to my ubi mindwalk

I omitted something kinda big when I wrote my one-time UBI proposal last year.

I neglected to address welfare reform.

Welfare would have to be changed for UBI to even have a half a prayer of working.

The “easy” way to do this would be to phase in reduced welfare benefits, prorated against the one-time UBI payment you receive.

Surely there are many other ways to address welfare as part of the one-time universal basic income – suggest them below!

Do I have to participate?

And I missed a second point, too – this should be something you can opt-out of. Just like I wrote about Social Security lo those many moons ago.

No one should be forced to participate – though I strongly suspect most people would rather participate than not.

What about when the program starts?

A third missed point in last year’s thought experiment – a prorated one-time UBI for every citizen over 18 when the program starts. Take the average life expectancy of a USian of, say, 75 years. Subtract 18 to get 57 – that 57-year span is the basis for the “100%” one-time payment (an 18-year-old at program start gets the full amount).

There also needs to be a phase-out cap on one-time benefits at age 74 (ie, when you turn 75, you are no longer eligible to receive a payout).

Now take your age, subtract 18, divide by 57, and subtract that from 100% to get your prorated payment. Are you 27? (27-18)/57 = ~15.8%. 100%-15.8% = 84.2%.

84.2% of $100,000 is $84,200.

Same process if you’re 50: (50-18)/57 = ~56.1%. 100%-56.1% = 43.9%.

43.9% of $100,000 is $43,900.
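The whole calculation is one small formula; here’s a quick sketch of it (the $100,000 base and the 18/75 bounds are the numbers from above – the function itself is just my illustration):

def prorated_payment(age, base=100_000, start_age=18, span=57):
    # no payout under 18, and none once you turn 75 (18 + 57)
    if age < start_age or age >= start_age + span:
        return 0
    return base * (1 - (age - start_age) / span)

print(prorated_payment(27))  # ~84,210 - the ~$84,200 worked above (rounding)
print(prorated_payment(50))  # ~43,860 - the ~$43,900 worked above (rounding)
print(prorated_payment(74))  # ~1,754  - last year of eligibility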

What if you’re 80? Congratulations! You’ve outlived the average American!