Wednesday, July 30, 2014

Image Analysis & "The Illusion of Consciousness"

I just watched an interesting TED talk on "The Illusion of Consciousness". He basically discusses optical illusions, and how our brain fills in detail without us realizing it.




As a commentary on the talk itself, I think using optical illusions to study consciousness is interesting, but the speaker didn't really make a solid argument that consciousness is an illusion; rather, he simply seemed to say that our eyes interpolate detail during interpretation.

But that's not the focus of this post.

As an image analysis engineer, I have a different commentary. What's really important is the challenge that our brains' interpretation of our eyes' data presents for image analysis engineers.

This is a screenshot from the talk:





Our brain somehow knows that these pixels represent a face, and a specific, well-known face at that. But try training a computer to recognize that. The image only contains maybe a few hundred grey values in a specific order. We organize that information in our brain, interpret it as a face at a given angle, rotate it mentally, and perform a content-based image retrieval against a database of known faces in our heads.
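
As a rough illustration of that last step, here is a minimal sketch of content-based image retrieval as a nearest-neighbor search over raw pixel intensities. The variable names and the toy "database" are made up for illustration; real face recognition uses far more sophisticated features than this.

% Toy "database" of known faces, stored as low-resolution grayscale patches.
% Random matrices stand in for real images here.
gallery = {rand(16,16), rand(16,16), rand(16,16)};
query   = gallery{2} + 0.05*randn(16,16);   % a noisy, pixelated view of face #2

% Nearest-neighbor retrieval: pick the gallery entry with the smallest
% sum of squared differences to the query.
dists = cellfun(@(g) sum((g(:) - query(:)).^2), gallery);
[~, bestMatch] = min(dists)                 % index of the closest known face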

Understanding how we interpret our eyes' data will be instrumental in taking image analysis technology to the next level.

This is why image analysis hasn't replaced human observers quite yet. A pathologist looking at a cell slide can immediately interpret that slide as tumor or non-tumor. We are trying to train computers to follow the steps the pathologist mentally performs. But the pathologist doesn't necessarily know how he interprets the image, which frustrates us engineers who need to train a computer to replicate those precise steps.

When we look at an MRI brain scan, we can interpret it as a single object.



We can even tell that edges in different parts of the image represent the same concept.

 

However, different spots on those borders have very different shades of grey. Our brains know they represent the same type of object, but computers are not so good at that yet.




That's just one example of how computers are trying to catch up to our brains.

That's not to say there aren't clever methods, such as sophisticated edge detection algorithms, intensity normalization, curvature flow algorithms, etc. (both proprietary algorithms in companies' R&D departments and in the public academic literature). But a lot of image analysis research amounts to teaching computers to do things our brains already do naturally and quickly.
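
As a minimal sketch of two of those building blocks, here is what intensity normalization and a simple Sobel-based edge detector could look like in Matlab. The file name and variable names are assumptions, and this is nowhere near the sophistication of the methods mentioned above.

I = double(imread('brain_slice.png'));   % hypothetical grayscale MRI slice

% Intensity normalization: rescale grey values to the range [0, 1].
In = (I - min(I(:))) / (max(I(:)) - min(I(:)));

% Simple edge detection: gradient magnitude from Sobel kernels.
sx = [-1 0 1; -2 0 2; -1 0 1];
sy = sx';
gx = conv2(In, sx, 'same');
gy = conv2(In, sy, 'same');
edges = sqrt(gx.^2 + gy.^2);             % large values near tissue boundaries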

And more importantly, these tools simply try to replicate specific steps our brains take, whereas deciding which steps to take, in what order, and how to combine them so that analyzing an image actually yields a proper interpretation is another matter entirely. In that area, very little research even comes close to learning the proper steps. The closest we've come is Deep Learning, but interestingly enough, the most creative thing our most talented engineers at Google could come up with was to try to replicate our existing biological neurons. At its essence, Deep Learning is still just another way we are trying to copy nature.

Computer vision is an area of active research and will be for many years to come; it's something to which I have dedicated my career. But perhaps if we understood ourselves and how our minds work a bit better, it would pave the way for exciting new tools which could be used to better thwart terrorism or heal the sick (two of the most typical uses of image analysis technology).

Monday, July 28, 2014

Visualizing 5D

How would one go about visualizing the fifth dimension? Or, more specifically, how would our four dimensional universe look, from a higher dimension?

Let's start with a simple timeline.

In this example, the dark blue represents a one-dimensional timeline. The left represents the past, and the right represents the future.

What if we wanted to include a spatial dimension to this timeline?


As before, the left represents the past and the right represents the future. But we can now include a single spatial dimension along the vertical axis. Maybe that dimension is the distance from the earth.

Extending that idea further, let's add another spatial dimension.


For visualization purposes, if we pretend for a second that the universe only has two spatial dimensions, like a movie, then this is the entire history of the universe represented as a solid object. We've basically replaced a spatial third dimension with time.

Your lifetime would be represented as a section of this object.


A moment in time capturing the universe exactly as it stands right now would simply be a slice of this object.

That slice would represent the exact position of the galaxies, the earth, etc., like a photograph.
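
A toy way to play with this idea on a computer is to treat a two-spatial-dimension "universe" plus time as a 3D array, so one moment is just a 2D slice of that block. A minimal Matlab sketch follows; all names and sizes are made up.

nx = 64; ny = 64; nt = 100;
universe = rand(nx, ny, nt);      % stand-in for the full 2D + time history

t_now  = 50;                      % "right now"
moment = universe(:, :, t_now);   % a single photograph-like slice
imagesc(moment); axis image;      % visualize that slice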


What if we lived in the fifth dimension and could change the four dimensional universe?

We could push on one side of it.


We could shape it as we saw fit. One change to one section would change the entire set of past and future events. The position of the galaxies, the consequences of all of our decisions, could be completely different.



A slice taken from the future of this changed universe would look very different from the corresponding slice of the unchanged universe.

In fact, the universe itself could be sculpted as one saw fit. Each possible sculpture would represent a possible universe, from the big bang to the big crunch, at the will of a fifth dimension.

Final note for fun: an animated sculpture, with the universe morphing, could represent a multiverse.

Sunday, July 20, 2014

Proving the Existence of a Soul

I had a thought regarding how physicists or mathematicians could theoretically prove the existence of a soul, or "outside observer" to our life experiences.

Definition: Let “memory” be defined as a unique combination of photons hitting our eyes.

Definition: Let “neuronal configuration” be defined as a unique configuration of our neurons. That configuration is defined by the biochemical levels in each neuron, the precise synapses and their connections to and from each neuron, hormonal levels in the blood, etc. Essentially, a unique combination of atoms in our brain.

Definition: Let “S{neuronal configuration}” be the set of all possible neuronal configurations.

Definition: Let “S{memory}” be the set of all possible memories.

Assumption: Two memories of two different events (defined by photons hitting our eyes) are experienced differently.

Assumption: Light hitting our eyes can change our neurons based on our interpretation.

Null Hypothesis: Our memories are defined solely by a “neuronal configuration”.

If |S{memory}| < |S{neuronal configuration}| then each memory can possibly be defined by a unique neuronal configuration.

If |S{memory}| > |S{neuronal configuration}| then a given neuronal configuration cannot be used to completely define a memory. This is only possible if there is an outside observer and P( Null Hypothesis ) = 0. This would prove the existence of an outside observer.
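
To make the cardinality step explicit, here is the argument in pigeonhole form, using the definitions above. Since two different memories are experienced differently, no single neuronal configuration can encode both, so under the null hypothesis the map from memories to the configurations that encode them is an injection (the f below is my notation, not part of the original definitions).

\[
  \text{Null hypothesis} \;\Rightarrow\; \exists\, f : S_{\mathrm{memory}} \hookrightarrow S_{\mathrm{neuronal\ configuration}}
  \;\Rightarrow\; \lvert S_{\mathrm{memory}} \rvert \le \lvert S_{\mathrm{neuronal\ configuration}} \rvert
\]
\[
  \therefore \quad \lvert S_{\mathrm{memory}} \rvert > \lvert S_{\mathrm{neuronal\ configuration}} \rvert
  \;\Rightarrow\; \neg\, \text{Null hypothesis}
\]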



Back to Blogging

I was thinking of starting up blogging again. I have a lot of thoughts, and I think this would be a good medium to parse through them. If anything, at the risk of being morbid, at least it will be a record of my experiences and views on the world after I'm gone.

So since the last entry, I received my Ph.D. in Biomedical Engineering (Dr. Rob now!) and started a company, Toth Technology LLC. We'll see how things go, but I'm excited about the future.

Sunday, May 6, 2012

Initial Keto Thoughts

So I've been experimenting with a ketogenic diet. So far I have tried it for 25 days, and the results have been interesting. I've been keeping precise data on my energy requirements for approximately 8 months. A keto diet is one in which your carbohydrate intake is low enough that your body relies on fat, largely from adipose tissue, for energy, which produces ketones as a byproduct (detectable with a very cheap urine test you can buy at a drug store). The theory behind the popularized "Atkins" diet is that you can lose weight without decreasing how much you eat.

So far I've actually found that to be true, but I only have 25 samples and I need more data. The result of ketosis seems to be an increase in your maintenance metabolic rate, which is simply how many calories you have to consume to neither gain nor lose weight.

My theory is that the reason your maintenance requirement increases is that using ketones, rather than the glucose derived from carbohydrates, for energy is a fairly inefficient process. Inefficiency is good when it comes to weight loss, because it costs your body more energy to produce the same output. Sugars are absorbed very quickly and can be used for energy almost immediately, so they are a very efficient source. That's just my simple theory, though.

Without further ado, here is some of my data thus far:



2011-09-06 to 2011-11-11 (cut)
Days = 66
Change in Weight = -23.8 lbs
Calories / Pound = 3500 (assuming adipose) kCal / lbs
Caloric Deficit = 3500 * 23.8 = 83300 kCal
Average Calories Consumed = 1760 kCal / day
Maintenance Calories = 1760 + 83300 / 66 = 3022 kCal / day

2011-11-11 to 2012-02-01 (maintain)
Days = 83
Change in Weight = -4.6 lbs
Calories / Pound = 3500 (assuming adipose) kCal / lbs
Caloric Deficit = 3500 * 4.6 = 16100 kCal
Average Calories Consumed = 2250 kCal / day
Maintenance Calories = 2250 + 16100 / 83 = 2443 kCal / day

2012-02-01 to 2012-04-11 (bulk)
Days = 70
Change in Weight = +9.6 lbs
Calories / Pound = 3200 (assuming adipose + muscle) kCal / lbs
Caloric Deficit = 3200 * -9.6 = -30720 kCal
Average Calories Consumed = 2770 kCal / day
Maintenance Calories = 2770 - 30720 / 70 = 2331 kCal / day

2012-04-11 to 2012-05-06 (keto)

Days = 25
Change in Weight = -6.4 lbs
Calories / Pound = 3500 (assuming adipose) kCal / lbs
Caloric Deficit = 3500 * 6.4 = 22400 kCal
Average Calories Consumed = 2570 kCal / day
Maintenance Calories = 2570 + 22400 / 25 = 3466 kCal / day
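
For reference, here is a minimal Matlab sketch of the arithmetic used in each block above (the variable names are mine): each maintenance estimate is the average intake plus the total deficit spread over the days.

days      = [66    83    70    25  ];
dWeight   = [-23.8 -4.6   9.6  -6.4];   % change in weight, lbs
kcalPerLb = [3500  3500  3200  3500];   % kCal per lb (adipose vs. adipose + muscle)
avgKcal   = [1760  2250  2770  2570];   % average calories consumed per day

deficit     = -dWeight .* kcalPerLb;    % total caloric deficit (negative = surplus)
maintenance = avgKcal + deficit ./ days % estimated maintenance kCal / day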


These measurements have very high variance, so the accuracy is questionable. It's also not a static system: as your weight fluctuates, so do your maintenance needs. It's very difficult to control for all the variables, and honestly I'm not here to write a paper about it; I'm just seeing how things work for me, so I'm not being as scientific about this as I should be. But the trends are interesting, and it appears that my basic caloric needs can increase significantly just from ketosis. More data is needed.

Tuesday, April 24, 2012

Housing Price Distribution

So one of the issues with the crash of 2008, in regards to housing prices, was the inability to correctly calculate the risk of the mortgages (or, more specifically, the mortgage-backed securities). This is obviously an extreme simplification, but from the books I've read, it appears that much of the risk-management mathematics was based on the assumption of a normal distribution of housing price changes. In 2008, however, housing price changes veered too far negative.

So I downloaded the raw housing data from each quarter from 1970 to 2011 from http://www.jparsons.net/housingbubble/ and performed some very simplistic analysis using Matlab.

I calculated the change in inflation-adjusted housing prices from each quarter to the next, resulting in 167 values.

Here is a histogram of the change in housing prices from 1970 to 2007 with 15 bins (15 unique x-axis values):



Here is a histogram of the change in housing prices from 1970 to 2011 with 15 bins (15 unique x-axis values):



You can clearly see that from 2007 to 2011, the distribution looks more skewed.

I then performed a Chi-squared goodness-of-fit test. The test outputs a p-value, "p", which is the probability of seeing data at least this far from normal if the underlying distribution really were normal. But be careful here: you can only confidently say that the distribution is not normal when p < 0.05. Otherwise you can't really say much.

Here are the results, depending on the number of bins:

10 bins:
p (1970 to 2007): 0.2097
p (1970 to 2011): 0.0766


15 bins:
p (1970 to 2007): 0.5536
p (1970 to 2011): 0.0021


20 bins:
p (1970 to 2007): 0.2895
p (1970 to 2011): 0.1234


25 bins:
p (1970 to 2007): 0.6482
p (1970 to 2011): 0.2923


So, based on the data through 2007, it wasn't unreasonable to assume a normal distribution of housing price changes, although the full series through 2011 looks less normal. Ideally the results should be more granular, and it's a very limited dataset (only 167 values).

The code used to do this was:

[~, p] = chi2gof( quarterly_change(1:150), 'nbins', nbins )
[~, p] = chi2gof( quarterly_change(1:167), 'nbins', nbins )
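
Here, quarterly_change is the vector of 167 inflation-adjusted quarterly changes described above. A sketch of how it might be built is below; the file name and format are assumptions, and the actual data came from the link above.

prices = csvread('housing_prices.csv');              % one inflation-adjusted price per quarter, 1970-2011
quarterly_change = diff(prices) ./ prices(1:end-1);  % fractional change from each quarter to the next

nbins = 15;
[~, p_2007] = chi2gof(quarterly_change(1:150), 'nbins', nbins);  % 1970 to 2007
[~, p_2011] = chi2gof(quarterly_change(1:167), 'nbins', nbins);  % 1970 to 2011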

The lowest value was Q1 of 2008, in which housing prices dropped 7.8% in one quarter. If a normal distribution fit to 1970-2007 was used, then dropping 7.8% in one quarter was 5.4 standard deviations from the mean. Put another way, if we assumed a normal distribution, the chance of a drop in prices of 7.8% or greater was 0.00000002.
normcdf( -.0784,  mean(quarterly_change(1:150)),  std( quarterly_change(1:150) ) )
  ans = 1.9682e-008
Put another way, the chance of prices dropping between 0% and 7.8% was 39.8%.

  normcdf( 0,  mean(quarterly_change(1:150)),  std( quarterly_change(1:150) ) )  ...
  - normcdf( -.0784,  mean(quarterly_change(1:150)),  std( quarterly_change(1:150) ) )
  ans = 0.39804


Wednesday, February 8, 2012

Encryption Hash Algorithm Part 2

In 2009, I made a post about turning a 4-bit hash algorithm into an 8-bit hash algorithm.

My idea was:
"My Password" --> 1100 --> {"11", "00"} --> {0100, 1011} --> 01001011


However, after reading it over, I realized that there was a gaping flaw. The probability of a collision wouldn't change!


Let's say "Password a" and "Password b" both hash to a 4-bit value of 0101. They would both hash to the same 8-bit value in my implementation. This is a major issue, because in theory, an 8-bit hash algorithm should have significantly less collisions (2^4 / 2^8 = 1/16 less chance of a collision).


So how to avoid this issue? Simple. Don't use the 4-bit hash as the basis for the 8-bit hash. Rather, use permutations of the password as the basis for the 8-bit hash. This can include reversing the password, truncating the password, or rearranging the password. This will, in theory, decrease the number of collisions.  


How about an example?

Previous implementation with collision:


"Password a" --> 1110 --> {"11", "10"} --> {1100, 1011} --> 11001011
"Password b" --> 1110 --> {"11", "10"} --> {1100, 1011} --> 11001011


New implementation with the same possible collision:


"Password a" --> {"Password a", "a drowssaP"} --> {1110,0110} --> 11100110
"Password b" --> {"Password b", "b drowssaP"} --> {1110,1000} --> 11101000


The point is that, with a good hash function, just because one permutation of a password results in a collision with another password, there is no indication that another permutation will also result in a collision.
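
Here is a minimal Matlab sketch of this permutation-then-concatenate construction, using a toy 4-bit hash (sum of character codes mod 16) purely for illustration; a real implementation would of course use a proper hash function.

hash4 = @(s) mod(sum(double(s)), 16);   % toy 4-bit hash, illustration only

password = 'Password a';
h1 = hash4(password);                   % 4-bit hash of the password itself
h2 = hash4(fliplr(password));           % 4-bit hash of the reversed password

% Concatenate the two 4-bit values into a single 8-bit hash.
h8 = bitor(bitshift(uint8(h1), 4), uint8(h2));
dec2bin(h8, 8)                          % show the result as an 8-bit string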


Still, though, concatenation is problematic in that precomputing the hashes of unsalted, known, commonly used passwords can reveal half of the resulting hash. In the above example, the first four bits are 1110 in both passwords because of the collision. To overcome that issue, you can salt the passwords with the username, or file name, or whatever, which should really be done anyway.


New implementation with salting


"Password a" --> {"Username, Password a", "Username, a drowssaP"} --> {1111,0110} --11100110
"Password b" --> {"Username, Password b", "Username, b drowssaP"} --> {0110,1000} -->01101000


New implementation with salting
(assuming "Username, Password ..." results in collision)

"Password a" --> {"Username, Password a", "Username, a drowssaP"} --> {0110,0110} --> 11100110
"Password b" --> {"Username, Password b", "Username, b drowssaP"} --> {0110,1000} --> 11101000

The advantage of the salting is that, while an attacker could have precomputed that "Password a" hashes to 1110 if "Password a" is a commonly used password, that precomputation no longer helps once the passwords are salted.

Another "attack" on this idea is the fact that there might not actually be 2^8 unique possible hashes. Because both halves are derived from the same password, some 8-bit outputs might be impossible to construct from the underlying 4-bit hashes, and an attacker could significantly limit the search space by precomputing which outputs are implausible. This, however, would be very difficult to do and would take a bit more brain power to figure out than I have available right now, although I recognize the theoretical possibility.