Thursday, March 29, 2012

Android NDK loading resources and PNGs

After following the nice code sample from http://androgeek.info/?p=275 about how to load a PNG file by accessing the Android APK file directly using only the NDK, everything seemed awesome.

It uses libzip and libpng to manually load the APK, then load a png file right from the APK itself without using any Java or the Android SDK.

Why would you want to do this?  Well, aside from liking complicated stuff, it is useful for us because we can now put all of our "portable" resource loading code in C/C++ files.  That makes it easier to move to iOS, Windows, whatever, without being tied to the Android SDK and having to recode the loading logic over and over again for each target platform.

Everything worked well, until something didn't go quite right.  Some of my textures were corrupted and worse, some actually crashed my app!

It took 4 days to track down what exactly was happening.

Problems like these usually indicate that I was doing something wrong with OpenGL or with the textures themselves, but I had already checked the usual suspects:

* All my textures were perfect powers of two (64x64, 128x128, etc)
* All my textures were 32-bit RGBA files
* I fixed a warning and error from the given sample code to correctly read the libzip'd apk file
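
The power-of-two check from the first bullet can be done with a single bit test. This is only a small sketch; the function names are made up for illustration and are not from the sample code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when n is a power of two (1, 2, 4, 8, ...).
 * Zero is rejected explicitly because 0 & (0 - 1) is also 0. */
static bool is_power_of_two(uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper: both texture dimensions must pass the check. */
static bool texture_size_ok(uint32_t width, uint32_t height)
{
    return is_power_of_two(width) && is_power_of_two(height);
}
```

A 64x128 texture passes this check and a 64x100 one does not.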

So, it was time to dig deeper with what was going on.

First, I noticed that my glGenTextures calls were giving me very large OpenGL texture ids.  I thought this was a huge problem since the numbers were not consecutive, and in C, any time you see wild numbers being generated it usually means a bad pointer somewhere.  I spent about a day analyzing and cross-referencing code before I realized that the generated numbers were always the same (this usually rules out a memory corruption issue if the results are *always* the same).

I noticed that the generated opengl texture ids always followed this pattern on my Droid RAZR:
100271
315638026
534244737
1505553200
-1563003837
...
etc.

Yet, on my Android emulator, the texture id's were:
1
2
3
4
5
...
etc

I had found this thread on google groups that I thought was my problem since it sounded really similar, but it didn't actually help.

Knowing that glGenTextures hands back its ids as *unsigned* integers (GLuint), I realized that the negative number was just a large unsigned value being printed as a signed integer, and actually nothing to worry about.  In my debugging print code I needed to change "%d" to "%u" to see the actual unsigned value.
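
To make the "%d" versus "%u" point concrete, here is a minimal sketch.  It uses uint32_t as a stand-in for GLuint so that it builds without the GL headers, and the helper names are made up for illustration.  The -1563003837 in the list above is what the 32-bit value 2731963459 looks like when it is printed as a signed integer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: uint32_t stands in for GLuint so no GL headers are needed. */
typedef uint32_t texture_id;

/* Formats the id as a signed int, the way a "%d" debug print does.
 * Ids above INT32_MAX come out negative on two's-complement platforms. */
static int format_id_signed(char *buf, size_t size, texture_id id)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%d", (int)(int32_t)id);
}

/* Formats the id as unsigned, which matches the type GL actually uses. */
static int format_id_unsigned(char *buf, size_t size, texture_id id)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%u", (unsigned int)id);
}
```

Passing 2731963459u to the first helper gives a string that starts with a minus sign, while the second gives the plain unsigned digits.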

Since these numbers repeated on every subsequent run of the program, I decided to test the idea that maybe my code had nothing at all to do with the "corrupt" large texture ids.  Indeed, I made a program that literally only called glGenTextures about 100 times right at startup time and did nothing else.  The result?  The exact same large numbers showed up on my device, and the ids were "normal" on the emulator, as expected.  This left me with the conclusion that this is simply how my Motorola device picks unique texture IDs, and that it was normal.  Next.

Yet, this didn't answer my question as to why I was seeing an app crash, seemingly only when loading a certain texture or two.  My first order of business was to work out whether my other code was at fault or whether the crash was somehow related to ONLY those textures.  Strangely, dozens of tests indicated that only 3 specific textures were the problem: I added lots of additional textures, loaded some textures multiple times, and finally loaded ONLY one of those 3 problematic textures.  The app would crash only when loading any of the 3.

Frustrated, I ended up putting a ton of extra debugging code all over my apps trying to track down what in the world was wrong.  All indicators were pointing to the idea that a problem was with either libzip or libpng since libc.so itself was crashing in a call to a libpng function -- great, time to track down potential bugs in those libraries.

Now, let me put a disclaimer up here, it is almost never a good idea to assume a mature code library has a bug in it; especially one this severe, yet here I am with a consistent crash always calling a specific line of code that libpng was using leaving me to believe this could be the case.

First, I downloaded the most recent version of each library and replaced the old versions that were provided in the linked post above.  I was now using libzip 0.10.1 and libpng 1.5.9.  This had promise since each of those had a bit of fixes/enhancements along the way.  I was hopeful that this would magically fix my problems.

Well, it did - sort of.

Whatever wizardry was happening internally in those libraries helped indicate that libpng was crashing specifically on this line:

png_read_info(png_ptr, info_ptr);

Great, all this work only to reinforce the idea that libpng was broken for my special case.  Yet I still couldn't rule out that libzip had somehow corrupted my image data while decompressing it from the APK, in which case libpng would not be at fault for choking on corrupted data.  Worse, all error checking came up empty for all libzip and libpng calls!  I even checked my OpenGL calls for errors using glGetError -- NOTHING!

Well, the next step was to dive in to those libraries' source code and start tracking stuff down.  The last resort, I guess, was to find the problem in one of these libraries, fix it myself, and give the respective team a patch!

One of my testing tangents included making a stand alone C project to test very specific sets of code to completely rule out any Android Java JNI mysticism causing problems.  Everything was pointing to only those textures being a problem in the libpng loading code.  So I ran the libpng test program on those 3 test pngs and the only thing that sounded remotely threatening was:

Files aaaa.png and pngout.png are different
Was aaaa.png written with the same maximum IDAT chunk size (8192 bytes), filtering heuristic (libpng default), compression level (zlib default), and zlib version (1.2.5)?


It certainly sounded menacing and I was convinced that somehow my textures were corrupted.  I ended up regenerating those textures using the latest version of Gimp (2.6.11 at the time) -- plugged those in to my app in excitement; and they still crashed.

Strangely, the libpng test program generated pngout.png which was seemingly the same exact copy of my original texture, but a different filesize; something was different, and different was good at this point.  For giggles, I put that pngout.png file in to my Android's APK res/raw folder in place of my original texture and much to my surprise the texture loaded!!!  Ah-hah, the answer finally!

Unfortunately, things weren't over yet, the texture loaded completely corrupted.  Everything was misaligned, the colors were wrong and the texture was completely garbled with random colors all over it.  Basically, I've gone nowhere.

3 days in and I'm still struggling with what is wrong here.  I'm still baffled, why is it that my 10 other textures work fine, yet these 3 don't want to work at all?!  I decided to write my own texture loading function from scratch to ensure I understood exactly every step of logic that was taking place.  The same result happened.  I did all sorts of strange additional tests, hex dumps, you name it.  I was at the end of my rope on this one, I was getting angry.

Finally, I saw an interesting article somewhere that said I needed to rename my png files to mp3 files.  I thought to myself, that's the dumbest thing I ever heard - why would I ever want to do that?!  Well, if you're programming for Android, you might be crazy enough to hear this out.

Apparently, during the APK packing process *SOME* of your png textures are automagically compressed and others are not.  Renaming your png files tells the apk packing process to not mangle/compress that file, and my happy 3 textures were some of the lucky selected ones for this process; which completely broke the texture loading code.  In a move of desperation, I did rename my files to mp3 to verify this was the case, and they worked perfectly.  I was enraged and ecstatic at the same time.
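
One way to check whether a PNG pulled out of the APK is still in the format the loader expects is to look at its IHDR chunk before handing it to libpng.  The sketch below is plain C with made-up names (it is not part of libpng), and it rests on an assumption this post does not confirm: that whatever the packer did to the file shows up in the header, for example as a colour type other than RGBA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The IHDR fields that matter to a loader expecting 32-bit RGBA. */
struct ihdr_info {
    uint32_t width;
    uint32_t height;
    uint8_t  bit_depth;
    uint8_t  color_type; /* 0 gray, 2 RGB, 3 palette, 4 gray+alpha, 6 RGBA */
};

/* Reads a big-endian 32-bit value, the byte order PNG uses. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parses the signature and IHDR at the start of a PNG held in memory.
 * Returns 0 on success, -1 if the data does not start like a PNG. */
static int peek_png_header(const uint8_t *data, size_t len,
                           struct ihdr_info *out)
{
    static const uint8_t signature[8] =
        {0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    assert(data != NULL && out != NULL);
    /* 8 signature + 4 length + 4 type + 13 IHDR payload = 29 bytes */
    if (len < 29 || memcmp(data, signature, 8) != 0)
        return -1;
    if (read_be32(data + 8) != 13 || memcmp(data + 12, "IHDR", 4) != 0)
        return -1;

    out->width      = read_be32(data + 16);
    out->height     = read_be32(data + 20);
    out->bit_depth  = data[24];
    out->color_type = data[25];
    return 0;
}

/* True when the image is 8 bits per channel RGBA, i.e. 32-bit pixels. */
static int is_rgba8(const struct ihdr_info *h)
{
    return h->bit_depth == 8 && h->color_type == 6;
}
```

Logging the result of is_rgba8 for each texture as it comes out of the APK would show straight away whether the 3 problem files differ from the 10 that load fine.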

On the bright side I got a better understanding of both libpng and libzip (we got a little familiar with each other, if you know what I mean).

Alright, so the magic bullet answer is that I need to rename all my PNG files to MP3 (test.png.mp3 if you will) or simply dump them all in to a game data file of sorts (probably the better idea).

So, I make a quick resource packing program (using zlib) and I find out that during the APK packing process again, my GZ files are also modified!!!
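
Since an APK is a zip archive, one way to see how the packer treated a given entry is to read the compression method from that entry's local file header: 0 means the bytes were stored as-is and 8 means they were deflated.  The sketch below uses made-up names and assumes you have already located the start of the entry's local header inside the APK, which it does not do for you.  If you are going through libzip anyway, the comp_method field filled in by zip_stat should report the same value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compression method values defined by the zip format. */
enum { ZIP_METHOD_STORED = 0, ZIP_METHOD_DEFLATED = 8 };

/* Returns the compression method from a zip local file header that
 * starts at `entry`, or -1 if the bytes are not a local file header. */
static int local_header_method(const uint8_t *entry, size_t len)
{
    assert(entry != NULL);
    /* The fixed part of a local file header is 30 bytes long. */
    if (len < 30)
        return -1;
    /* Signature bytes: 'P', 'K', 0x03, 0x04 */
    if (entry[0] != 'P' || entry[1] != 'K' || entry[2] != 3 || entry[3] != 4)
        return -1;
    /* The method is a 16-bit little-endian value at offset 8. */
    return entry[8] | (entry[9] << 8);
}
```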

I quickly found this link explaining it a little bit.

I wonder what other goodies are in store for me to discover by accident!?

Hope this article is found by anyone else stuck in my situation and saves you all a couple of days of madness.

3 comments:

  1. What textures ids were on device after fix?

  2. The same texture IDs were on the device afterward as well, which I think is sort of strange, but apparently has something to do with the opengl drivers on the device itself? I wish I had a more solid answer on this one, but I can't really think of anything else.

    Strangely, my Samsung Galaxy tablet has the same odd high texture ids coming back just like the Motorola Razr, yet in the emulator it acts as expected (1, 2, 3, 4, 5...)

    I'm curious as to why actual devices have such crazy id numbers coming back - or maybe I just happen to have two devices with the same "feature"?

  3. The IDs may be because of Java's big-endianness, compared to GL, which is reversed?
