<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Paradise Studios</title>
	<atom:link href="http://www.paradise-studios.net/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.paradise-studios.net</link>
	<description>Another reality is possible</description>
	<lastBuildDate>Fri, 06 Apr 2012 09:40:59 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Efficient 2D DFT implementation in micro-threaded environments</title>
		<link>http://www.paradise-studios.net/2010/03/efficient-2d-dft-implementation-in-micro-threaded-environments/</link>
		<comments>http://www.paradise-studios.net/2010/03/efficient-2d-dft-implementation-in-micro-threaded-environments/#comments</comments>
		<pubDate>Thu, 04 Mar 2010 12:22:13 +0000</pubDate>
		<dc:creator>Xavyiy</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.paradise-studios.net/?p=29</guid>
		<description><![CDATA[Hi all from Madrid! These last few days I&#8217;ve been working on an application for testing the PThreading &#8211; Paradise Threading &#8211; library, the core of Paradise Engine and all our tools, on different systems by just downloading and running a small application &#8211; which will soon be available for download. But&#8230; what to test? Actually&#8230; [...]]]></description>
			<content:encoded><![CDATA[<p>Hi all from Madrid!</p>
<p>These last few days I&#8217;ve been working on an application for testing the PThreading &#8211; <span style="color: #999999;">Paradise Threading</span> &#8211; library, the core of Paradise Engine and all our tools, on different systems by just downloading and running a small application &#8211; <span style="color: #999999;">which will soon be available for download</span>. But&#8230; what to test? It might seem easy to decide, but I can swear that it isn&#8217;t! It has to be something parallelizable and useful to us in the future, so&#8230; what better than a complete, efficient and smart 2D FFT/DFT &#8211; <span style="color: #999999;">and inverses</span> &#8211; implementation? <img src='http://www.paradise-studios.net/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>Good, so&#8230; let&#8217;s go. In this post I&#8217;ll talk about the DFT implementation, not the FFT this time, but almost all &#8211; <span style="color: #999999;">actually, all</span> &#8211; of the results carry over to the FFT case, keeping in mind that the FFT is asymptotically much faster than the direct DFT calculation: O(N log N) operations for a length-N transform instead of O(N<sup>2</sup>).</p>
<p>In a practical implementation of the two-dimensional DFT, we exploit the fact that the 2D DFT of an MxN matrix can be calculated by performing the one-dimensional DFT of the M rows, and then the 1D DFT of the N columns of that result. For more information visit <a href="http://local.wasp.uwa.edu.au/~pbourke/miscellaneous/dft/">this</a> and <a href="http://www.dca.fee.unicamp.br/DIPcourse/html-dip/c5/s2/front-page.html">this</a> link.</p>
<p>Well, now that I&#8217;ve written a &#8211; <span style="color: #999999;">very </span>&#8211; short 2D DFT introduction, it&#8217;s time to talk about the practical implementation. In this post I&#8217;ll cover two methods, labeled from now on as <strong>A</strong> and <strong>B</strong>.<br />
I&#8217;ll assume that the MxN matrix is stored in a simple array of size M*N, where the one-dimensional index of each element A(m,n) is given by the formula Index = m*N+n. The elements of each row are therefore stored in contiguous memory addresses, and the last element of row K is contiguous to the first element of row K+1.<span id="more-29"></span></p>
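<p>To make the storage layout concrete, here is a minimal Python sketch of the indexing formula. The helper name <code>flat_index</code> and the 3x4 size are my own choices for illustration, not part of PThreading or Paradise Engine.</p>

```python
# Row-major storage: element A(m, n) of an M x N matrix lives at index m*N + n.
M, N = 3, 4  # hypothetical sizes, for illustration only

def flat_index(m, n, n_cols):
    """Map a (row, column) pair to its position in the flat array."""
    return m * n_cols + n

# Fill the flat array so that each slot stores its own (m, n) pair.
data = [None] * (M * N)
for m in range(M):
    for n in range(N):
        data[flat_index(m, n, N)] = (m, n)

# A row is one contiguous slice; a column is a strided walk with step N.
row = 1
row_1 = data[row * N:(row + 1) * N]
col_2 = data[2::N]
```

<p>Reading a row touches N neighbouring slots, while reading a column jumps N slots at a time, which is the access pattern that matters in the two methods below.</p>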
<h3>A: row-column decomposition:</h3>
<p>The row-column 2D DFT decomposition consists of calculating, in a first step, the 1D DFT of all rows and then, in a second step, the 1D DFT of all columns of the first step&#8217;s results.</p>
<p>In an ideal situation, this approach could be considered the fastest method to perform a 2D DFT, but since computers aren&#8217;t perfect, the fact is that even though this method is considerably faster than calculating the 2D DFT directly from the definition, it&#8217;s not the fastest one.</p>
<p>Why? Simply because access to contiguous memory addresses is considerably faster than access to random &#8211; <span style="color: #999999;">or distant, in this case</span> &#8211; memory locations.<br />
So, in this approach, the DFT calculation of each row is efficient, but the DFT calculation of each column is not, since only the elements of a row are located in contiguous memory: consecutive elements of a column are N positions apart. For more information visit <a href="http://www.scopus.com/record/display.url?eid=2-s2.0-0018782562&amp;origin=inward&amp;txGid=2WMJKwAuD9Z9CbRDnOT1v3W%3a1">this link</a>.</p>
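<p>As a rough illustration of method A, here is a small single-threaded Python sketch. It is not the Paradise Engine code: the function names are hypothetical, the 1D DFT is the direct definition, and a Python list only mimics the access pattern of a real float array. The reference function computes the 2D DFT straight from the definition so the two results can be compared.</p>

```python
import cmath

def dft_1d(values):
    """Direct O(n^2) 1D DFT of a sequence of numbers."""
    n = len(values)
    return [sum(values[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

def dft_2d_row_column(data, M, N):
    """Method A on a flat row-major list of M*N values; returns a new list."""
    out = list(data)
    # Step 1: each row is a contiguous slice of the array.
    for m in range(M):
        out[m * N:(m + 1) * N] = dft_1d(out[m * N:(m + 1) * N])
    # Step 2: each column is a strided walk (step N) through the array.
    for n in range(N):
        out[n::N] = dft_1d(out[n::N])
    return out

def dft_2d_definition(data, M, N):
    """2D DFT straight from the definition, used only as a reference."""
    return [sum(data[m * N + n] * cmath.exp(-2j * cmath.pi * (k * m / M + l * n / N))
                for m in range(M) for n in range(N))
            for k in range(M) for l in range(N)]
```

<p>For a small input such as <code>[1, 2, 3, 4, 5, 6]</code> with M=2 and N=3, both functions should agree up to floating-point rounding; the row-column version just gets there with far fewer operations.</p>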
<h3>B: row-transpose-row-transpose decomposition:</h3>
<p>Okay, so&#8230; now that I&#8217;ve talked about the row-column decomposition method, it&#8217;s time to talk about a more efficient way.<br />
The idea &#8211; <span style="color: #999999;">and, at first, the implementation</span> &#8211; is simple: calculate the 1D DFT of all rows, transpose the matrix, calculate the 1D DFT of all rows again (which are actually our columns) and finally transpose the resulting matrix to obtain our 2D DFT.</p>
<p>So, well, but&#8230; what&#8217;s the most efficient (fastest) implementation of this method in a micro-threading environment?<br />
After a lot of tests and implementation variants, the way I got the best performance is:</p>
<ol>
<li>Create two temporary MxN matrices (arrays of size M*N), which will be used in the following parallelizable tasks.</li>
<li>Perform the 1D DFT in packages of K rows in parallel &#8211; <span style="color: #999999;">in different &#8216;tasks&#8217;</span> &#8211;, using one of the temporary matrices to store the calculated values and finally, in the same task, copy the K calculated rows to the K corresponding columns of the other temporary matrix &#8211; <span style="color: #999999;">which is the &#8216;transposed&#8217; matrix</span>. <span style="color: #99cc00;">So now, once all tasks have been performed, we have the transpose of the 1D DFT of the original matrix rows stored in one of our temporary matrices, calculated in a smart, micro-threaded way.</span></li>
<li>Perform the 1D DFT in parallelizable packages of L rows &#8211; <span style="color: #999999;">as we did in the second point</span> &#8211; on the temporary matrix that stores the transposed results, using the other temporary matrix to store the calculated values and finally, in the same task, copy the L calculated rows to the L corresponding columns of the original matrix.</li>
</ol>
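<p>The three steps above can be sketched in Python roughly as follows. This is only an illustration of the task decomposition, not the PThreading implementation: it assumes the standard <code>concurrent.futures</code> thread pool as a stand-in for the micro-threading scheduler, all function names are hypothetical, and CPython threads will not give a real speed-up for this CPU-bound work.</p>

```python
import cmath
from concurrent.futures import ThreadPoolExecutor

def dft_1d(values):
    """Direct O(n^2) 1D DFT of a sequence of numbers."""
    n = len(values)
    return [sum(values[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

def rows_task(src, scratch, dst, rows, cols, first, count):
    """DFT a package of rows of src (rows x cols) and scatter them into columns of dst."""
    for r in range(first, min(first + count, rows)):
        result = dft_1d(src[r * cols:(r + 1) * cols])
        for c in range(cols):
            scratch[r * cols + c] = result[c]  # store the calculated row
            dst[c * rows + r] = result[c]      # copy it to column r of dst (cols x rows)

def run_pass(pool, src, scratch, dst, rows, cols, grain):
    """Submit one task per package of `grain` rows and wait for all of them."""
    futures = [pool.submit(rows_task, src, scratch, dst, rows, cols, first, grain)
               for first in range(0, rows, grain)]
    for future in futures:
        future.result()  # also re-raises any exception from the task

def dft_2d_transpose(data, M, N, K=2, L=2, workers=4):
    """Method B, in place on a flat row-major list of M*N numbers."""
    temp_a = [0j] * (M * N)
    temp_b = [0j] * (M * N)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Rows of the input go to temp_a, their transposed copy to temp_b (N x M).
        run_pass(pool, data, temp_a, temp_b, M, N, K)
        # Rows of temp_b (the original columns) go to temp_a, transposed back into data.
        run_pass(pool, temp_b, temp_a, data, N, M, L)
    return data
```

<p>Each task only writes to its own package of rows in the scratch matrix and to the matching columns of the destination, so tasks within one pass never write to the same element. K and L are the grain sizes of the first and second pass.</p>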
<p><strong>Some considerations:</strong></p>
<ul>
<li>The proposed method is fully parallelizable: no locks are needed, since the way the shared data &#8211; <span style="color: #999999;">the 3 shared matrices</span> &#8211; is used is thread safe: each task writes only to its own rows and to the matching columns of the destination matrix.</li>
<li>K and L define the grain size &#8211; <span style="color: #999999;">the number of 1D DFTs executed per task</span>.</li>
<li>The DFT and the inverse DFT &#8211; <span style="color: #999999;">IDFT</span> &#8211; implementations are almost equivalent.</li>
</ul>
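<p>On the last point, a minimal Python sketch of why the two transforms are almost equivalent: only the sign of the exponent and a scale factor change. The function name and the choice of putting the 1/n factor on the inverse are assumptions (it is the most common convention, but the original implementation may normalize differently).</p>

```python
import cmath

def transform_1d(values, inverse=False):
    """Direct 1D DFT, or its inverse when inverse=True."""
    n = len(values)
    sign = 1 if inverse else -1      # the only change in the kernel
    scale = 1 / n if inverse else 1  # assumed convention: the inverse divides by n
    return [scale * sum(values[t] * cmath.exp(sign * 2j * cmath.pi * t * k / n)
                        for t in range(n))
            for k in range(n)]

# Forward transform followed by the inverse should give back the input,
# up to floating-point rounding.
signal = [1.0, 2.0, 0.0, -1.0]
round_trip = transform_1d(transform_1d(signal), inverse=True)
```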
<h3><strong>And to finish, some numbers</strong>:</h3>
<p>Here are the DFT execution results for a 512&#215;512 RGB image with 4 bytes per channel (float data type), using the two methods described above:</p>
<table style="border: 1px solid #cccccc; background-color: #444455; height: 306px;" border="0" cellspacing="0" cellpadding="7" width="100%">
<tbody>
<tr>
<th style="text-align: left;" colspan="3" scope="col">Single-threaded DFT execution times</th>
<th colspan="3" scope="col">Micro-threaded DFT execution times</th>
</tr>
<tr>
<td style="border-bottom: 1px solid #CCC;">Method</td>
<td style="border-bottom: 1px solid #CCC;">Stage</td>
<td style="border-bottom: 1px solid #CCC;">Time (seconds)</td>
<td style="border-bottom: 1px solid #CCC;">Method</td>
<td style="border-bottom: 1px solid #CCC;">Stage</td>
<td style="border-bottom: 1px solid #CCC;">Time (seconds)</td>
</tr>
<tr>
<td>A</td>
<td>Rows</td>
<td>12.50 sec</td>
<td>A</td>
<td>Rows</td>
<td>6.69 sec</td>
</tr>
<tr>
<td></td>
<td style="border-bottom: 1px solid #CCC;">Columns</td>
<td style="border-bottom: 1px solid #CCC;">13.90 sec</td>
<td></td>
<td style="border-bottom: 1px solid #CCC;">Columns</td>
<td style="border-bottom: 1px solid #CCC;">7.54 sec</td>
</tr>
<tr>
<td style="border-bottom: 1px solid #CCC;"></td>
<td style="border-bottom: 1px solid #CCC;">Total</td>
<td style="border-bottom: 1px solid #CCC;">26.40 sec</td>
<td style="border-bottom: 1px solid #CCC;"></td>
<td style="border-bottom: 1px solid #CCC;">Total</td>
<td style="border-bottom: 1px solid #CCC;">14.23 sec</td>
</tr>
<tr>
<td>B</td>
<td>1st rows</td>
<td>12.48 sec</td>
<td>B</td>
<td>1st rows</td>
<td>6.70 sec</td>
</tr>
<tr>
<td></td>
<td style="border-bottom: 1px solid #CCC;">2nd rows</td>
<td style="border-bottom: 1px solid #CCC;">12.39 sec</td>
<td></td>
<td style="border-bottom: 1px solid #CCC;">2nd rows</td>
<td style="border-bottom: 1px solid #CCC;">6.64 sec</td>
</tr>
<tr>
<td></td>
<td>Total</td>
<td>24.87 sec</td>
<td></td>
<td>Total</td>
<td>13.34 sec</td>
</tr>
<tr>
<td></td>
<td style="color: #f30;">Diff(A-B)</td>
<td style="color: #f30;">1.53 sec</td>
<td></td>
<td style="color: #f30;">Diff(A-B)</td>
<td style="color: #f30;">0.89 sec</td>
</tr>
</tbody>
</table>
<h3>About the results</h3>
<ul>
<li>The micro-threading results are only indicative, since the tests were performed on an &#8211; <span style="color: #999999;">old</span> &#8211; Intel Core Duo laptop CPU.</li>
<li>Method A: the row DFTs are faster than the column DFTs (12.50 vs. 13.90 seconds single-threaded), as expected.</li>
<li>Method B: the first and second step times are almost equivalent, as expected &#8211; <span style="color: #999999;">ideally, they should be the same</span>.</li>
<li>Method B is faster than method A, as expected.</li>
<li>All results are the average times of 5 executions; no modifications have been made &#8211; <span style="color: #999999;">they are just rounded to two decimal places</span>.</li>
</ul>
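<p>For anyone who wants to play with the numbers, this short Python snippet derives the speed-up of the micro-threaded runs and the relative gain of method B from the totals in the table. The variable names are just for illustration; the figures are the ones reported above.</p>

```python
# Total times in seconds, copied from the results table.
single = {"A": 26.40, "B": 24.87}
micro = {"A": 14.23, "B": 13.34}

# Speed-up of the micro-threaded run over the single-threaded one, per method.
speedups = {method: single[method] / micro[method] for method in single}

# Percentage of time saved by method B relative to method A, per execution mode.
savings = {
    "single-threaded": 100 * (single["A"] - single["B"]) / single["A"],
    "micro-threaded": 100 * (micro["A"] - micro["B"]) / micro["A"],
}

for method, value in speedups.items():
    print(f"Method {method}: {value:.2f}x faster with micro-threading")
for mode, value in savings.items():
    print(f"{mode}: method B saves {value:.1f}% over method A")
```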
]]></content:encoded>
			<wfw:commentRss>http://www.paradise-studios.net/2010/03/efficient-2d-dft-implementation-in-micro-threaded-environments/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Welcome</title>
		<link>http://www.paradise-studios.net/2010/03/welcome/</link>
		<comments>http://www.paradise-studios.net/2010/03/welcome/#comments</comments>
		<pubDate>Mon, 01 Mar 2010 16:59:01 +0000</pubDate>
		<dc:creator>Xavyiy</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.paradise-studios.net/?p=14</guid>
		<description><![CDATA[Hi all and welcome to the Paradise Studios development blog! Just to make a short informal introduction in this first blog entry, the Paradise Studios team consists of 3 guys: a 3D artist, Marcus Feital (despadas), who is a passionate 3D artist (and sound designer) and has worked on a recent commercial title; an excellent [...]]]></description>
			<content:encoded><![CDATA[<p>Hi all and welcome to the Paradise Studios development blog!</p>
<p>Just to make a short informal introduction in this first blog entry, the Paradise Studios team consists of 3 guys: a 3D artist, <strong><a href="http://www.despadas.com/">Marcus Feital</a></strong> (despadas), who is a passionate 3D artist (and sound designer) and has worked on a recent commercial title; an excellent web designer and 2D artist, <strong><a href="http://www.emm-gfx.net">Josep Viciana</a></strong> (emmgfx), who has several years of experience in his field and works at a web development company; and finally me, <strong>Xavier Verguín</strong> (Xavyiy). Well, it&#8217;s hard for me to define myself, but I think of myself as someone who loves programming, especially 3D development, and who, like my teammates, has been developing some well-known projects in my field these last few years, like Hydrax and SkyX.</p>
<p>In this blog we&#8217;ll write about our current work in the Paradise Studios projects and miscellaneous development-related topics. We hope to write interesting stuff for developers and curious people, something that we would like to read too. <img src='http://www.paradise-studios.net/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>From Paradise Studios we want to announce that we&#8217;ll contribute to some Open Source projects, like Ogre3D, by releasing new and improved Hydrax and SkyX versions as Open Source, and maybe some other plug-ins in the future.</p>
<p>Thanks for visiting us,<br />
The  Paradise Studios team.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.paradise-studios.net/2010/03/welcome/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

