The roads have been nasty and it's been pretty cold. I'm tempted to ride into the office anyhow, because the charging stations are live!
While I was on vacation last week, they finished the physical installation of the charging stations. Looks like the contractors let a little snow get in their way, because they're still not wired up and operational. Hopefully we'll have some clear weather in the weeks to come. I can't wait to use them.
I've mentioned before that the facilities manager at my office is a true-blue believer in EVs. He's had a long career working with industrial electric motors and understands them to their very core. He's supported me and the Enertia from day one, even putting up with its charging fans blowing right outside of his office in our shipping and receiving area. He's dead set on getting a Nissan Leaf too, because it's got the range to suit his commuting needs.
He's been giving me progress reports on the company's initiative to install Coulomb Charging Stations at work. There have been some delays with the contractors, but I'm happy to say that they've broken ground this week. From the looks of it, we should have five posts serving ten spots with Level 1 and Level 2 charging.
They made a little more progress on day two. There are trenches behind the ledges and some electrical utility boxes installed. The boxes are kind of ugly, so I hope they do something to disguise them. The last thing I want to hear is people condemning them because they're ugly. As it is, the location is already taking up exterior spaces where the car-worshiping d-bags double-park their cars like it's some sort of Grease-era car show.
I can't wait to see them in operation. From what I've been told, they'll be open to the public too, so anyone with a ChargePass Card (like me) can use them. I'm not sure if that policy will be permanent, but I can't imagine that there will be too many non-employees using them. When they go online, hopefully they'll show up on Coulomb's Awesome Webapp.
Of course, when they do go online, it means the end of my indoor parking. Oh well.
I know this is a naive thing to say (well, type), but after finishing this program I kind of feel like I just implemented OpenGL minus shaders. My approach was to get the scene parsing implemented first and then to get the GL Preview feature working. This allowed me to very quickly set up my light and camera and see my goal. Then I started in on my raster pipeline.
Here's a quick list of the features for my Raster Pipeline:
- Moveable camera
- Moveable light
- Orthographic Projection
- Perspective Projection
- Per-Vertex Color
- Flat Shading
- Gouraud Shading
- Phong Shading
- Full Phong lighting with light intensity falloff
- Configurable (on/off) backface culling
- Configurable (on/off) cheap clipping
- Efficient span-based triangle fill
- z-buffer with epsilon for z-fighting resolution
- Timing instrumentation
- PNG output
And here are the features supported in the OpenGL Preview mode:
- Moveable camera
- Moveable light
- Orthographic Projection
- Perspective Projection
- Per-Vertex Color
- Flat Shading
- Smooth (Gouraud?) Shading
After doing two raytracing assignments, I really doubted that rasterizing would hold a candle to it in terms of aesthetics. I was stunned when I saw how good the OpenGL preview looked, so I really wanted to dive into shading. I ended up implementing wireframes, flat shading, Gouraud shading, and Phong shading.
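To make that concrete, here's a minimal sketch of a Blinn-Phong style lighting term of the kind I'm describing. The Vec3 type and the shade function are illustrative stand-ins rather than my actual pipeline code; evaluate it per pixel with an interpolated normal and you have Phong shading, evaluate it per vertex and interpolate the result and you have Gouraud shading.

#include <algorithm>
#include <cmath>

// Minimal vector helpers; member and function names are stand-ins, not my pipeline's.
struct Vec3 {
    float x, y, z;
    Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }
};
static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 normalize(const Vec3& v) {
    float len = std::sqrt(dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}

// One light's contribution to a single color channel.
float shade(Vec3 n, Vec3 toLight, Vec3 toEye,
            float ambient, float kDiffuse, float kSpecular, float phongExp) {
    n = normalize(n);                      // interpolated normals must be re-normalized
    Vec3 l = normalize(toLight);
    Vec3 e = normalize(toEye);
    Vec3 h = normalize(l + e);             // half viewing vector
    float diff = std::max(0.0f, dot(n, l));
    float spec = std::pow(std::max(0.0f, dot(n, h)), phongExp);
    return std::min(1.0f, ambient + kDiffuse * diff + kSpecular * spec);   // clamp to [0,1]
}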
I then started in on my own raster pipeline and stumbled through a myriad of problems with my transformations. In particular, the projection transformations were troublesome. I tried to implement them in a way similar to the class notes, but I wasn't getting the results that I was looking for...or any results at all. I kept segfaulting. I turned to the text and found that it did a great job explaining both the orthographic and perspective projection transformations.
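For anyone following along, these are the two matrices in question, written in the glOrtho/glFrustum convention (positive near/far distances along -z). This is just my summary of the standard construction, and the sign conventions may differ slightly from the class notes.

#include <array>

using Mat4 = std::array<std::array<float, 4>, 4>;   // row-major 4x4, zero-initialized below

// Orthographic: maps [l,r] x [b,t] and the depth range [n,f] into the canonical volume.
Mat4 orthographic(float l, float r, float b, float t, float n, float f) {
    Mat4 m{};
    m[0][0] = 2.0f / (r - l);   m[0][3] = -(r + l) / (r - l);
    m[1][1] = 2.0f / (t - b);   m[1][3] = -(t + b) / (t - b);
    m[2][2] = -2.0f / (f - n);  m[2][3] = -(f + n) / (f - n);
    m[3][3] = 1.0f;
    return m;
}

// Perspective: the frustum matrix; the divide by w afterwards does the actual foreshortening.
Mat4 perspective(float l, float r, float b, float t, float n, float f) {
    Mat4 m{};
    m[0][0] = 2.0f * n / (r - l);  m[0][2] = (r + l) / (r - l);
    m[1][1] = 2.0f * n / (t - b);  m[1][2] = (t + b) / (t - b);
    m[2][2] = -(f + n) / (f - n);  m[2][3] = -2.0f * f * n / (f - n);
    m[3][2] = -1.0f;
    return m;
}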
Clamping and normals were also a problem for me. Interestingly enough, once you fix one clamping or normalization bug, you tend to clamp and normalize everything. The clamping problem was worst in my color calculations. Specular highlights produce some very bright pixels, and I ended up bleeding past 1.0 on several of the channels, which caused rainbow effects. Additionally, when I was calculating barycentric coordinates, floating point errors led to scenarios where the coordinates came back outside of [0.0, 1.0]. Normally that would mean the point was off of the triangle, but I was calculating them for pixels that were known to be on the triangle.
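The fix I ended up with amounts to clamping color channels and tolerating a small epsilon on the barycentric coordinates. A sketch, with an illustrative epsilon value rather than the exact one I used:

#include <algorithm>

// Clamp a color channel into [0,1] so hot specular highlights can't wrap past white.
inline float clampChannel(float c) {
    return std::min(1.0f, std::max(0.0f, c));
}

// For a pixel already known to be on the triangle, floating point error can still push
// a barycentric coordinate slightly outside [0,1]; tolerate a small epsilon instead of
// rejecting or mis-shading the pixel.
inline bool insideWithTolerance(float alpha, float beta, float gamma, float eps = 1e-5f) {
    return alpha >= -eps && beta >= -eps && gamma >= -eps &&
           alpha <= 1.0f + eps && beta <= 1.0f + eps && gamma <= 1.0f + eps;
}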
Normals were by far the most difficult problem, or at least the toughest one I had to solve. My specular highlights were causing a grid pattern along the edges of triangles, and I fought it for two days. The problem resulted from normals interpolated between two vertices along the edges. They were not unit length, so they exaggerated the specular highlights when I calculated the dot product with the half viewing vector. Normalizing them fixed the problem.
Backface culling was a really straightforward optimization to make. To implement it, I added a check right before the viewing and projection transformations. The check computes the dot product of each of the triangle's vertex normals with the viewing vector. If none of those normals face the camera, the entire triangle is back facing and gets culled. It yielded a significant speedup on Andrew's dragon model.
rasterizer --projection persp -0.1 0.1 -0.0 0.2 3.0 7.0 --camera 0 0 5 0 1 0 --light 0.1 0.1 0.1 --nocull scenes/dragon.txt
Render scene: 1287.702000 ms
rasterizer --projection persp -0.1 0.1 -0.0 0.2 3.0 7.0 --camera 0 0 5 0 1 0 --light 0.1 0.1 0.1 scenes/dragon.txt
Render scene: 708.403000 ms
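For reference, the culling test itself is tiny. Here's a sketch of the per-triangle check described above, with stand-in names rather than my actual code:

struct Vec3 { float x, y, z; };
static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Cull a triangle only if all three vertex normals face away from the viewer.
// 'toEye' points from the triangle toward the camera; a positive dot product means
// that normal is at least partially facing the camera.
bool isBackFacing(const Vec3 normals[3], const Vec3& toEye) {
    for (int i = 0; i < 3; ++i) {
        if (dot(normals[i], toEye) > 0.0f) {
            return false;   // at least one vertex faces the camera: keep the triangle
        }
    }
    return true;            // all normals face away: cull before the transforms
}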
I really wanted to implement full clipping, but I found out that "cheap clipping" is pretty effective by itself. The first step is to check whether a pixel is in the viewport before calculating the color for it. Calculating color is pretty expensive, so this eliminated a lot of cost. The next step was to use Cohen-Sutherland clipping to determine when a line or triangle was completely outside of the viewport. I didn't do a thorough test there either: I just did the simple bit-wise AND on the outcodes for each point and rejected the triangle if the result was non-zero. This means that some corner cases were missed.
By cheating like this, I was able to reject a lot of triangles without having to clip individual triangles into separate polygons. It meant that I was still rasterizing parts of triangles that were outside of the viewport, but at least with the check above I wasn't calculating the color for them. The results were rather satisfactory, especially considering how little it cost to implement.
rasterizer --camera 0 0 5 0 1 0 --projection persp -0.1 0.1 -0.1 0.1 3.0 7.0 //zoom_in --noclip scenes/beethoven.txt
Render scene: 414.369000 ms
was reduced to
rasterizer --camera 0 0 5 0 1 0 --projection persp -0.1 0.1 -0.1 0.1 3.0 7.0 //zoom_in --output img/beethven_clipped.png scenes/beethoven.txt
Render scene: 310.444000 ms
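The trivial-reject half of this is just the classic Cohen-Sutherland outcode test. Here's a rough sketch; the enum values and helper names are illustrative:

#include <cstdint>

// Region codes for a point against the viewport.
enum : uint8_t { INSIDE = 0, LEFT = 1, RIGHT = 2, BOTTOM = 4, TOP = 8 };

uint8_t outcode(float x, float y, float xmin, float ymin, float xmax, float ymax) {
    uint8_t code = INSIDE;
    if (x < xmin) code |= LEFT;   else if (x > xmax) code |= RIGHT;
    if (y < ymin) code |= BOTTOM; else if (y > ymax) code |= TOP;
    return code;
}

// "Cheap clipping": if all three vertices share a nonzero outcode bit, the triangle is
// entirely on one side of the viewport and can be rejected outright. Triangles that
// straddle an edge are left alone; their out-of-viewport pixels are simply skipped
// before the expensive color calculation.
bool triviallyOutside(const float x[3], const float y[3],
                      float xmin, float ymin, float xmax, float ymax) {
    uint8_t c = outcode(x[0], y[0], xmin, ymin, xmax, ymax);
    c &= outcode(x[1], y[1], xmin, ymin, xmax, ymax);
    c &= outcode(x[2], y[2], xmin, ymin, xmax, ymax);
    return c != 0;
}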
Although a span-based triangle fill was pointed out as an opportunity for extra credit, it was really the most straightforward way to implement filling for triangles, since they're convex. At one point in my career, I did a lot of 2D raster graphics work for J2ME cellphones. Most of our displays were optimized to receive data in rows, so I attacked this problem the same way. I found the topmost pixel, then started drawing each leg using the midpoint line algorithm. Each time I placed a pixel that changed y, I added it to an edge list. When I reached the end of a leg, I switched to the third segment...unless that leg was already horizontal. I then went back and drew horizontal lines from one edge list to the other. Since this was the only triangle fill algorithm I used, I didn't get any timing numbers for comparison.
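To show the row-by-row idea without reproducing my whole edge-list code, here's a simplified sketch that interpolates along each edge instead of using the midpoint line algorithm; the names are illustrative:

#include <algorithm>
#include <climits>
#include <cmath>
#include <utility>
#include <vector>

// For each scanline between the triangle's vertical extremes, record the leftmost and
// rightmost x hit by any edge; each resulting span can then be filled in one pass.
void buildSpans(const float x[3], const float y[3],
                std::vector<std::pair<int, int>>& spans,   // (xStart, xEnd) per row
                int& yStart) {
    int yMin = static_cast<int>(std::ceil(std::min({y[0], y[1], y[2]})));
    int yMax = static_cast<int>(std::floor(std::max({y[0], y[1], y[2]})));
    yStart = yMin;
    spans.assign(std::max(0, yMax - yMin + 1), {INT_MAX, INT_MIN});

    for (int e = 0; e < 3; ++e) {                          // walk each of the three edges
        float x0 = x[e], y0 = y[e], x1 = x[(e + 1) % 3], y1 = y[(e + 1) % 3];
        if (y0 == y1) continue;                            // horizontal edges add no rows
        if (y0 > y1) { std::swap(x0, x1); std::swap(y0, y1); }
        for (int row = static_cast<int>(std::ceil(y0));
             row <= static_cast<int>(std::floor(y1)); ++row) {
            if (row < yMin || row > yMax) continue;
            float t = (row - y0) / (y1 - y0);
            int xi = static_cast<int>(std::round(x0 + t * (x1 - x0)));
            spans[row - yMin].first  = std::min(spans[row - yMin].first, xi);
            spans[row - yMin].second = std::max(spans[row - yMin].second, xi);
        }
    }
    // spans[i] now holds the inclusive x range to fill on row yStart + i.
}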
The use of a Z-Buffer to determine the rendering order is so genius in its simplicity that I didn't even consider any other way to implement it. So this is another scenario where I didn't implement an alternative method for comparison. However, I was able to throw in a small improvement that resolved the z-fighting example I threw at it. When determining whether to paint over another pixel, I checked that the new pixel was closer to the camera by a margin, epsilon. I set epsilon to 0.000001. It resolved my test model without causing any visible changes to the other models. My testing certainly wasn't extensive, so I'm sure it would fail in scenarios where a camera with a very narrow FOV caused massive magnification. Perhaps in that situation, I could use a dynamic epsilon calculated from the camera's FOV.
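The depth test with the margin looks roughly like this; the struct and the depth convention (larger value means closer) are illustrative, but the epsilon is the value I used:

#include <vector>

// Only overwrite a pixel if the new fragment is closer by more than epsilon,
// which suppresses z-fighting between nearly coplanar triangles.
struct DepthBuffer {
    int width;
    std::vector<float> z;   // one depth per pixel; larger == closer in this sketch

    bool testAndSet(int x, int y, float depth, float epsilon = 0.000001f) {
        float& current = z[y * width + x];
        if (depth > current + epsilon) {   // must win by a clear margin
            current = depth;
            return true;                   // caller may write the color
        }
        return false;
    }
};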
Here are the remaining renderings of the models provided, including Andrew's dragon model from the Stanford 3D Scan Repository.
For an Operating System / Window Manager Engineer, focus usually means the application in the foreground. The application with focus is receiving keyboard and mouse events. On some systems, only the application with focus can make sounds. Furthermore, the applications without focus may be running at a lower priority, thus receiving less compute time.
Modal / Full Screen UI
In the mobile space, this question of focus is rather straightforward. Displays are so small that the window manager displays the application with focus on the entire screen. Although I think it's a bit of a misnomer, more and more people are referring to such a scheme as a modal UI. These modal, or full-screen, UIs have been getting a lot of news lately. Steve Jobs announced that full-screen apps are going to play a more serious role in Mac OS X Lion.
I was a little apprehensive, fearing that he was going to dumb down my Mac desktop user experience. I gained more confidence in the idea when I thought about all of the [semi]-pro apps that I use on my Mac that already had full screen modes. I always figured that those apps were full screen to give creative professionals the maximum amount of real estate. Now I actually think it has more to do with minimizing distraction and allowing for better mental focus.
Full Screen Equals Full Mental Focus
This point hit me late last night. I bought an iPad yesterday. I bought it primarily for leisure computing. I found that my MacBook Pro was constantly in the middle of 2-3 school/geek projects. I tend to just leave things open when I'm in the middle of them. I feel it encourages me to pick back up more easily. What it actually does is stress me out and distract me. I couldn't even enjoy a cup of coffee and read RSS feeds without wanting to touch up some OpenCL. My idea for the iPad was to get away from a desk and relax a little. I could ignore all of those open projects and relax for a few minutes.
That lasted about an hour last night before I found myself downloading class notes and sitting at the kitchen table with a beer for some late night studying. It was really effective too. When you're working in a modal UI, all you can do is what's in focus. And if you turn off status updates, you won't even be bothered by incoming emails, tweets, calendar notifications, etc. I was easily able to stay on task, only briefly popping over to another browser window to look things up.
Apple Might Be On To Something
I'm definitely going to dwell on this some more and make some personal observations about my usage, but I think Steve might be on to something. We've long known that multi-tasking hits a point of diminishing returns after two or three tasks. I personally struggle with the constant context switching. Having a modal UI might help me focus on the task at hand, whether it's studying, coding, or relaxing.
BTW, Google Reader Play is an absolute joy on the iPad. Too bad it doesn't use my feeds.
Download Project Source/Scenes/Mac-Binaries: Program3.tar.gz
For the first part of the Ray Tracing project, I added quite a few extra features. One of them was the recursive calculation of Specular Reflection, Dielectric Reflection, and Dielectric Transmission. I considered myself pretty lucky, considering that this is one of the two features we were required to add for this second part. However, I wasn't quite in the clear. Let's just say that I had a looser understanding of ray tracing than I thought.
Many of the features of my Ray Tracer were implemented in the first part. That list can be seen on that project post. The following are new features:
- Mid-Axis Partitioning
- SAH Partitioning
- Cost-Based Termination of leaf nodes for both partitioning schemes
- Recursive KD Tree Traversal
- KD Tree Printing in Debug Builds
- Fixed Ray Tracing bugs
- Dramatically Improved Ray Tracing Performance
- Interactive Ray Shooting in Debug Builds
When you launch my Ray Tracer, the following are the defaults that are used unless you specify otherwise.
- Dimensions: 500x500
- Sampling: 4 x 4 Adaptive Jittered Supersampling
- Ray Casting: Blinn-Phong Lighting with an Ambient Factor
- Ray Tracing: Specular Reflections, Dielectric Reflections, and Dielectric Transmission supported through recursion terminated based on a contribution threshold
- Multi-Processing: Uses multiple CPU cores through OpenMP
- KD Tree: Built using SAH cost analysis to determine best split and when to terminate branches
The optics involved with refraction are not very intuitive to me. Initially, I thought that an image seen through a glass sphere would be reduced, but instead it's actually magnified. I had a rather serious bug when calculating refraction. I was refracting my rays with a dot product of the ray and surface normal with the wrong sign. Correcting that made a tremendous difference.
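For anyone fighting the same bug, here's the refraction calculation in vector form; the Vec3 helpers are stand-ins, and the leading minus sign on cosI is exactly the sign I had wrong:

#include <cmath>

struct Vec3 {
    float x, y, z;
    Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }
    Vec3 operator*(float s) const { return {x * s, y * s, z * s}; }
};
static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Refract a normalized incident direction d about a normalized surface normal n,
// going from a medium with index n1 into one with index n2. Returns false on
// total internal reflection.
bool refract(const Vec3& d, const Vec3& n, float n1, float n2, Vec3& tOut) {
    float eta   = n1 / n2;
    float cosI  = -dot(d, n);                   // the sign of this term was my bug
    float sinT2 = eta * eta * (1.0f - cosI * cosI);
    if (sinT2 > 1.0f) return false;             // total internal reflection
    float cosT  = std::sqrt(1.0f - sinT2);
    tOut = d * eta + n * (eta * cosI - cosT);   // transmitted direction
    return true;
}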
Once I started rendering the scene with 16 spheres, I realized that I had some serious additive errors in the transmissive component. There were two reasons for this. First, I was calculating the reflective and transmissive components inside of the loop that calculated the Phong shading for each light source. Fixing this corrected several of the bright spots and led to a significant speed improvement.
Second, I was calculating the Phong shading and reflective components for illumination points inside of a sphere. This scenario arose whenever I was calculating the color for a refracted ray transmitted through a sphere. The refracted ray would intersect the other side of the sphere from the inside, and at that point I should have only been calculating the transmissive component. Making this change also led to a dramatic speedup.
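The inside/outside test itself is tiny. A sketch, with illustrative names:

struct Vec3 { float x, y, z; };
static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// If the geometric normal points "with" the ray, the intersection is on the inside
// of a closed primitive (we're exiting it). At such points I skip the Phong and
// specular reflection terms and only trace the transmitted ray.
bool hitFromInside(const Vec3& rayDirection, const Vec3& surfaceNormal) {
    return dot(rayDirection, surfaceNormal) > 0.0f;
}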
My KD Tree was actually a deceleration structure for much of the project. I had several issues when creating the tree, as well as with the traversal. I started by creating a KD Tree that simply divided the space at the midpoint of the split axis.
The first issue I had was that I was creating a new bounding box around the primitives in each of the newly split subspaces. This was very problematic, because it created overlapping spaces whenever a single primitive was shared between both subspaces. Once I drew a clear delineation between space partitioning and bounding volume hierarchies, I was able to clean up my KD Tree and I saw fewer artifacts.
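The distinction boils down to how a node's bounds are produced when it is split. A sketch of the KD Tree version, with illustrative types:

#include <utility>

struct AABB {
    float min[3];
    float max[3];
};

// Split a node's box at 'splitPos' along 'axis' (0=x, 1=y, 2=z). The children
// partition the parent's space exactly; a primitive straddling the plane is simply
// referenced by both children. Re-fitting a new box around each child's primitives
// (the BVH-style move) is what produced my overlapping-space artifacts.
std::pair<AABB, AABB> splitBox(const AABB& parent, int axis, float splitPos) {
    AABB left = parent, right = parent;
    left.max[axis]  = splitPos;
    right.min[axis] = splitPos;
    return {left, right};
}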
My next hurdle was understanding how to traverse the tree correctly and fix the remaining artifacts. Initially, while I was trying to learn how KD Trees work, I only considered an algorithm where the ray always intersected the outermost bounding box. This was fundamentally flawed for several reasons. First off, the ground sphere made the bounding box really large, although that didn't create too many artifacts for me. The bigger problem was rays originating inside of the outermost bounding box, which is exactly what happens when calculating shadows, reflections, and transmissions. Through some digging, I found a very helpful post that illustrated the different cases that have to be handled when traversing a KD Tree.
His diagrams were very helpful. They were so implanted in my brain that I ended up adopting his algorithm completely from the code that he posted. He still missed two cases, though, which I initially had some trouble finding. I ended up implementing a special interactive debug feature that lets me use the mouse to point at a pixel in the viewing window and re-render just that single view ray. I would use it by rendering the entire scene, setting a breakpoint, then clicking on the pixel that I needed to test. This was invaluable in finding the remaining artifacts as well as some ray tracing issues.
At this point, my KD Tree still proved to be more of a deceleration structure. I fired up a profiler, and found a number of slowdowns related to mallocs when operating on C++ vectors. I reduced my use of vectors and passed them by reference throughout the KD Tree traversal. This brought significant gains, but my KD Tree was still slower.
The next step was to implement a smarter space partitioning scheme based on comparing the Surface Area Heuristic of each new subspace. This cost calculation was also critical to determining when to make a leaf node. I had a few bugs that led to excessive node duplication. Once I sorted those out, I finally got the gains I was hoping for. My resulting tree for the more complicated scene was 11 nodes deep, and contained several empty leaf nodes.
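For reference, the cost comparison looks roughly like this; the traversal and intersection cost constants are illustrative defaults, not my tuned values:

#include <cstddef>

// Surface area of an axis-aligned box.
float surfaceArea(const float min[3], const float max[3]) {
    float dx = max[0] - min[0], dy = max[1] - min[1], dz = max[2] - min[2];
    return 2.0f * (dx * dy + dy * dz + dz * dx);
}

// Classic SAH estimate for one candidate split: one traversal plus each child's
// intersection cost weighted by the chance (surface area ratio) that a ray entering
// the parent also enters that child. The split is only taken if the best candidate
// beats the cost of making the node a leaf (intersectCost * primitiveCount).
float sahCost(float parentArea,
              float leftArea, std::size_t leftCount,
              float rightArea, std::size_t rightCount,
              float traverseCost = 1.0f, float intersectCost = 4.0f) {
    return traverseCost +
           intersectCost * (leftArea / parentArea) * static_cast<float>(leftCount) +
           intersectCost * (rightArea / parentArea) * static_cast<float>(rightCount);
}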
16 Sphere Scene Without a KD Tree
Size: 500x500
Total Primitive Intersection Checks: 223084940
Total Node Traversals: 0
Total Render: 29.763917 seconds
16 Sphere Scene With KD Tree
Size: 500x500
Build KD Tree: 0.000580 seconds
Total Primitive Intersection Checks: 55731412
Total Node Traversals: 161224507
Total Render: 24.878577 seconds
I reduced the number of sphere intersection checks from roughly 223 million to 56 million, at the cost of about 161 million node traversals. The time savings wasn't as dramatic as I had hoped, likely because my algorithm still uses functional recursion instead of maintaining a small stack inside of a loop. For my final project, I'm likely to go stackless altogether since I'll be using the GPU too.
However, I really started to notice gains when I upped the complexity of the scene. I created a model of 140 reflective spheres arranged into a tightly-packed pyramid.
140 Sphere Scene Without KD Tree
Size: 500x500
Total Primitive Intersection Checks: 1016735174
Total Node Traversals: 0
Total Render: 103.413575 seconds
140 Sphere Scene With KD Tree
Size: 500x500
Build KD Tree: 0.091852 seconds
Total Primitive Intersection Checks: 230486448
Total Node Traversals: 161218665
Total Render: 38.800175 seconds
I haven't posted about my Enertia for a while. At first, I feared that the novelty had worn off. I really haven't been riding it much...until this last weekend. And with that fresh seat time, my enthusiasm for the Enertia picked right back up where it left off. Coincidentally, I passed the 2500 mile mark too.
Extra Leg on the Commute
A few changes in my circumstances have led to my lessened use of the Enertia. Firstly, I'm commuting from the office to school two days a week. Parking on campus is a nightmare. You basically have to park in a commuter lot and hop a bus in.
However...when I ride a motorcycle in, I can park right next to my building. This is exactly the time savings I was looking for to reduce my time away from the office, so I've been happily riding a motorcycle on those days. Unfortunately I haven't found a place to charge the Enertia on campus. Furthermore, in the spirit of saving time, I take the interstate. All of this means that I ride my V-Strom gasser instead of the Enertia.
I've reduced my Enertia riding on the weekend too, which is a shame, because the Enertia is perfect for running errands around home. I've got a roommate now, and we do a lot of things together. Unfortunately there's no room on the Enertia for a passenger.
My changed circumstances have highlighted the Enertia's range and capacity issues and affected its utility somewhat. At the same time, though, riding my bulky, stinky, and loud V-Strom has made me appreciate the Enertia even more. It's a bit of a conundrum.
Enter the Empulse. This bike will directly solve two of my three problems. I should easily be able to commute on it, even on days when I'm on campus. Even though I can't easily charge on campus, the extended range means that I likely won't need to. Furthermore, the liquid-cooled motor means that I'll be able to sustain highway speeds on the interstate and avoid taking a circuitous route at lower speeds. That will prove to be a huge time saver.
Unfortunately, there still isn't room for a passenger. But riding two-up is for old folks anyway...except for the time I took two laps at Jennings GP with Jason Pridmore. We definitely didn't lap like old folks. Although I nearly lost control of my bowels like a grandpa.
Download Project Source/Scenes/Mac-Binaries: Program2.tar.gz
I've read about ray tracing a few times in the past, but this assignment gave me a dramatically new perspective on the topic. Two things really struck me about ray tracing. First, what I understood to be ray tracing was actually just ray casting. I didn't know this while I was implementing the diffuse shaders (pure Lambertian, Blinn-Phong, and Blinn-Phong with Ambience), and so I was rather impressed with the results. However, as soon as I implemented specular reflection via recursion, I started to realize that ray tracing is indeed a much more significant step over ray casting in terms of realism.
My second dramatic realization was just how expensive ray tracing is. Every feature that I added would drive my render times up. And this was compounded by framebuffer resolution, sampling grid size, number of lights, and number of scene primitives. I found myself switching between implementing new features and then going back and implementing various optimizations just to make the render times tolerable.
For part one of this ray tracer program, I used the COMP 575 assignment as a guideline on features to add beyond the minimum in the COMP 770 assignment. I kept going, adding feature after feature, unaware of whether these "extra features" would actually be required for the second part of the assignment.
- Resizable View Rect Dimensions
- Fully Configurable Camera (position, rotate, FOV)
- Multiple, Colored Light Support
- Configurable Background Color
- Output to PNG
- Supports both Ray Casting and Ray Tracing
- Specular Reflection
- Dielectric Reflection and Transmission w/ Refraction
- Blinn-Phong with Ambient
- Configurable Sample Count for Regular, Jittered, and Adaptive
- Multi-Processing Support with OpenMP
- Simplified Scene Intersection Calculations with Normalized Direction Vectors
- Tracks recursive contribution of color calculations for early recursion termination
- Adaptive [Jittered] Sampling
- Timing Instrumentation
- Can build without OpenGL, OpenMP, and libpng for benchmarking on embedded systems
Building and Usage
I've provided a Makefile with the following targets. It has been tested on Mac OSX, Linux (Ubuntu) and Android for ARM.
NOTICE: By default, I build on a system with OpenGL, OpenMP (libgomp), and libpng. If you don't have those on your system, then use the NO_GL=1, NO_OMP=1, and/or NO_PNG=1 settings when running make.
- make - Builds release.
- make debug - Builds debug.
- make clean - Cleans the src directory and removes the objdir directories.
- make NO_OMP=1 - Won't attempt to compile with OpenMP. Handy if the system doesn't have support. Can be used with the other NO_* flags.
- make NO_PNG=1 - Won't attempt to compile with libpng. Handy if the system doesn't have support. Can be used with the other NO_* flags.
- make NO_GL=1 - Won't attempt to compile with OpenGL. Handy if the system doesn't have support. Can be used with the other NO_* flags.
Usage: raytracer [-shader <shader>] [-sampling <method>] [-samples <n>] [-background <0xRRGGBBAA>] [-window <width> <height>] [-timing] [-noparallel] [-norecursion] [-nodisplay] [-output <file>]
-shader <shader> - Sets the shader used. Each one builds upon the previous shader. Default = reflective
-sampling <method> - Chooses which sampling method to use. Default = adaptive
-samples <n> - Specifies an n x n grid of samples to collect. Ignored for basic sampling. Default = 5
-background <0xRRGGBBAA> - Sets the background color. Default = 0x000000ff
-window <width> <height> - Sets the window size to the specified width and height. Default = 500 x 500
-timing - Turns on timing output on the console. Default = off
-noparallel - Turns off multiprocessing. Default = on
-norecursion - Turns off recursive ray tracing, resulting in simple ray casting. Default = on
-nodisplay - Turns off output to the display. Default = on
-output <file> - Causes the ray tracer to output a PNG image of the rendered scene.
Once I started working on support for dielectrics, I wanted to create a reasonably familiar scene so that I could interpret the results better. I placed a large, mostly transparent, smoke-gray sphere in front of the camera. The ground sphere is still somewhat reflective, but the two colored spheres in the background are non-reflective. When viewing this scene, it's best to use a white background (-background 0xffffffff) in order to see the distortion at the perimeter of the sphere.
<scene>
  <!-- camera at (0,2,-8) looking at (0,2,0) -->
  <camera x="0.0" y="2.0" z="-8.0" fov="90.0"
          lookAtX="0.0" lookAtY="2.0" lookAtZ="0.0"
          upX="0.0" upY="1.0" upZ="0.0"/>
  <!-- smoked sphere -->
  <sphere radius="2.0" x="0.0" y="1.75" z="-3.0">
    <color r="0.3" g="0.3" b="0.3" a="0.3"/>
    <material reflectance="0.0" refraction="1.5" phongExponent="0.0"/>
  </sphere>
  <!-- blue sphere -->
  <sphere radius="1.25" x="-4.0" y="2.0" z="2.0">
    <color r="0.0" g="0.0" b="1.0" a="1.0"/>
    <material reflectance="0.0" refraction="1.0" phongExponent="16.0"/>
  </sphere>
  <!-- green sphere -->
  <sphere radius="1.25" x="4.0" y="2.0" z="2.0">
    <color r="0.0" g="1.0" b="0.0" a="1.0"/>
    <material reflectance="0.0" refraction="1.0" phongExponent="16.0"/>
  </sphere>
  <!-- white overhead light -->
  <light x="0.0" y="5.0" z="0.0" ambient="0.25">
    <color r="1.0" g="1.0" b="1.0" a="1.0"/>
  </light>
  <!-- "ground" sphere -->
  <sphere radius="50.0" x="0.0" y="-50.0" z="0.0">
    <color r="0.75" g="0.75" b="0.75" a="1.0"/>
    <material reflectance="0.3" refraction="1.5" phongExponent="32.0"/>
  </sphere>
</scene>
Of the various optimizations that I implemented, none provided results as immediate as the parallelism through OpenMP. I was definitely embarrassed that I didn't think of it before the COMP 575 professor mentioned it. I fully anticipated that I would have to refactor my code to be multi-threaded, so I was pleasantly surprised to find OpenMP. I had used a similar compiler extension on some Cell processor development years ago, but OpenMP is much further along in terms of ease-of-use and compiler support. I was so thrilled when I discovered it that I blogged about it here. With one line in my Makefile and two lines of code, I gained a nearly 75% increase in speed.
- Viewing Rect
- Floating Point Error When Intersecting Light Ray
- Transparency + Ray Casting = Does Not Compute
- Floating Point Error Part Deux
Overall, development went really smoothly on this project. I was definitely making a lot of hand gestures while trying to visualize where my cross products would be aiming as I was generating the viewing rect. I didn't know how to correlate the up vector of the camera with the vector from the camera position to the lookAt point, especially when they weren't perpendicular. Finally, I decided to use the up vector and assume that the camera was looking straight forward, along its z axis. Without this assumption, I felt like I would be dealing with an oblique projection, which I wasn't ready for.
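For what it's worth, the usual way to handle a non-perpendicular up vector is to square up the basis with two cross products. This isn't what I shipped (I stuck with the simpler assumption above), but here's a sketch with stand-in vector helpers:

#include <cmath>

struct Vec3 {
    float x, y, z;
    Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
};
static Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static Vec3 normalize(const Vec3& v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

// Build an orthonormal camera basis from eye, lookAt, and an approximate up vector.
// Even if 'up' isn't perpendicular to the view direction, the two cross products
// square it up, so the viewing rect stays rectangular and the projection non-oblique.
void cameraBasis(const Vec3& eye, const Vec3& lookAt, const Vec3& up,
                 Vec3& u, Vec3& v, Vec3& w) {
    w = normalize(eye - lookAt);   // points backwards, away from the scene
    u = normalize(cross(up, w));   // camera right
    v = cross(w, u);               // true up, perpendicular to both
}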
I struggled a little when trying to calculate intersections with the scene for the light vector from the visible point back to the light source. Initially I tried to throw out intersections with the primitive that the visible point was on. Of course, that didn't yield good results, so I finally settled on throwing out all intersections that were closer to the visible point than a particular threshold, lambda. In the next lecture, the same strategy was mentioned as a solution to that exact problem.
The next significant problem that I faced was how to deal with transparency. Again, I was unlucky enough to be a little early in implementing this feature. Two lectures later, we discussed ray tracing vs. ray casting. Recursive ray tracing makes reflection and transmission with refraction nearly trivial. For a while, I was a little confused between specular reflection and dielectric reflection, but I finally differentiated the two and accepted the fact that an object can be a dielectric and also have specular reflective material properties. The last part of ray tracing that was really challenging was the calculation of the "a" constant used to determine how light is filtered when it is transmitted through a dielectric. The textbook describes how the Beer-Lambert Law determines how much light is transmitted, but it says that a constant for each color channel is simply chosen and that the natural logs from the formula are rolled into it. It also mentions that developers often tune this parameter by eye. I settled on a calculation for "a" that took each color channel of the intersected primitive and multiplied it by (1 - alpha) for that color. Visually, I found the results to be satisfactory.
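Here's roughly what that attenuation looks like in code, following the scheme I just described; the Color struct and function name are illustrative:

#include <cmath>

struct Color { float r, g, b, a; };

// Beer-Lambert style attenuation of light transmitted through a dielectric: each
// channel decays exponentially with the distance traveled inside the object. The
// per-channel constant is the primitive's color channel scaled by (1 - alpha),
// and, as the textbook suggests, it's ultimately tuned by eye.
Color attenuate(const Color& incoming, const Color& primitive, float distance) {
    float ar = primitive.r * (1.0f - primitive.a);
    float ag = primitive.g * (1.0f - primitive.a);
    float ab = primitive.b * (1.0f - primitive.a);
    return {incoming.r * std::exp(-ar * distance),
            incoming.g * std::exp(-ag * distance),
            incoming.b * std::exp(-ab * distance),
            incoming.a};
}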
The last hurdle that I faced was again related to detecting intersections that are too close to the visible point. This time, the rays in question were the transmitted/refracted rays. I was still using the threshold from before to eliminate intersections that were too close to the ray origin, but the threshold value I was using was very small. I found that several of the refraction calculations accumulated floating point errors through the multiple calculations, and those errors were amplified by the recursion. I just relaxed the threshold and the noise was eliminated.
Like many people, I find myself doing more of my shopping on Amazon and other e-tailers. I just can't justify the brick-and-mortar tax at my local big-box retailers. Besides, who has time to go shopping anyhow?
Am I just being lazy? More importantly, what's the environmental impact of my laziness?
When goods are shipped to big-box retailers, their retail packages are boxed up on pallets. Hopefully they box them up better than this example. They're generally handled with care, so this seemingly crude boxing isn't usually a problem. However, when that same retail package is shipped to you through an e-tailer, it's repackaged into a larger box filled with packing material. Hopefully you have an option to recycle that corrugated cardboard and packing material, but regardless, it's still rather wasteful.
Lots of Extra Fuel
These pallets are loaded onto trucks, freight trains, and cargo ships with finite storage volume. Therefore, it's advantageous that the pallets are packed densely. They're transported to your local big-box, at which point you typically make small trips to pick up your favorite items.
In the online space, e-tailers receive these same pallets at their distribution centers. There are far fewer distribution centers, so for this leg on board semi-trucks, it saves on fuel. However, that's where it ends. The e-tailer must open the pallets and put the individual retail packages into their bigger boxes, and those big boxes are picked up by your favorite carrier services.
Each package is bundled tightly into something similar to a pallet, but at this point they are taking up way more space. More space means more trips by the carrier. Furthermore, a lot of these trips have long portions by air since we just can't seem to wait a few extra days.
Study Confirms My Intuition
I'm really starting to think that it's more efficient for you to shop at a big-box. And it looks like I'm not the only one. This study seems to confirm my intuition.
Shouldn't Shipping And Handling Cost More?
Unfortunately, shipping and handling is way too cheap, especially with services like Amazon Prime. These carriers are awfully efficient, but in a sense they're being subsidized in the form of cheap fuel. I'm not saying that they pay less for fuel (although they likely do, because they buy it in bulk). I'm saying that fuel in America is way too cheap for everyone. If it were more appropriately priced to account for the environmental damage of fossil fuels, then e-shopping would clearly be more expensive and we would all get our lazy selves to our favorite big-box.
So I've been working on this Ray Tracer for class. It's rather primitive, but it's starting to slow to a crawl as I add features (especially multi-sampling). I'm still only rendering three spheres, but a full screen render at 1920x1066 was taking 13+ seconds. The professor gave us a few tips today on speeding up the algorithms involved. I felt pretty dumb for not thinking of the most obvious recommendation that he made. And wouldn't you know it, it produced stunning results.
OpenMP - Multi-Processing Made Real Easy
Of course, Ray Tracing is ridiculously parallel. You have to do the same calculations on every ray, and that's at least one ray per pixel in your viewing plane. Ten minutes of searching the internet for multi-processing methodologies in C/C++ delivered me to the OpenMP Wikipedia page. It's a very friendly introduction. I had used something slightly similar years ago for IBM's Cell processor, but it wasn't nearly as easy as OpenMP.
I doubted that my gcc compiler on my MacBook Pro supported OpenMP, but digging through the OpenMP compiler page revealed that there has been OpenMP 2.5 support since gcc 4.2 and I'm running gcc 4.2.1.
There's a note on the same OpenMP compiler page that says "Compile with -fopenmp". At first I thought that I'd have to rebuild gcc with that flag. Dummy. All I needed to do was use that flag when compiling with my version of gcc.
So here it is. My three lines of code:
- -fopenmp added to the Makefile
- omp.h included in RayTracer.cpp
- #pragma omp parallel for added above my outermost for loop
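In context it looks roughly like this; the function name and the per-pixel call are placeholders, but the pragma is the real one:

#include <omp.h>

// Each outer-loop iteration (one row of the framebuffer) is an independent unit of
// work, so OpenMP hands rows out to however many cores are available. The loop body
// doesn't change at all.
void renderFrame(int width, int height /*, scene, framebuffer, ... */) {
    #pragma omp parallel for
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // traceRayForPixel(x, y);   // the existing per-pixel work, unchanged
        }
    }
}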
- Without OpenMP - Size: 1920x1066 - Render time: 13.178630 seconds
- With OpenMP - Size: 1920x1066 - Render time: 7.544398 seconds
Professionally, I work on a performance and optimization team in the embedded space. Usually when I see such gains with just a handful of lines of code, it's because I finally remembered that I was building DEBUG.