h1

Hadoop MapReduce – write output to multiple directories depending on the reduce key.

July 9, 2014
  If you have to write data into different files 
  depending on the reducer key, you have to use 
  org.apache.hadoop.mapreduce.lib.output.MultipleOutputs
 
1. Usage pattern for job submission:
Job job = new Job();
LazyOutputFormat.setOutputFormatClass(
                                 job,
                                 TextOutputFormat.class);
// if your data have to be compressed
TextOutputFormat.setCompressOutput(job, true);
TextOutputFormat.setOutputCompressorClass(job, 
                                         GzipCodec.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileOutputFormat.setOutputPath(job,outDir); 

2. In your reducer:
 
public class YourReducer extends 
              Reducer<Text, Text, NullWritable, Text> {
private MultipleOutputs<NullWritable,Text> mos;
     @Override
     protected void setup(Context context) 
                    throws IOException, InterruptedException {
       long start = System.currentTimeMillis();
       super.setup(context);
       mos = new MultipleOutputs<NullWritable,Text>
                                (context);
     }
    @Override
    protected void cleanup(Context context) 
                   throws IOException, InterruptedException {
        super.cleanup(context);
        mos.close();
    }
    @Override
    protected void reduce(Text key,Iterable<Text> value,Context context) 
                               throws IOException, InterruptedException {
         String fileName = generateFileName(key);
         for (Text val : value) {
           mos.write(NullWritable.get(), val, fileName);
         }
    // your method to generate file name based on the key
    private String generateFileName(Text key) {
         return key.toString()+Constants.FILE_NAME_PREFIX;
    }

 


h1

Notes from the trenches.

February 8, 2011

I have built an JNLP (Java Web Start) application to FTP files to different remote servers. Here are the problems I have encountered while building and launching my application.

  • JNLP Security.

Applications launched with Java Web Start will, by default, run in a restricted environment with limited access to files and network.To override this default behavior you have to sign JAR files. A signed application can request additional system privileges, such as access to a local disk. You do not need to sign all your jars, just the ones which operate outside your restricted environment (‘sandbox’). While signing JNLP application, you might encounter the following error:’jarsigner: unable to sign jar:…’.

In this case you can use ‘brute force’ – unjar jars which you want to sign, remove manifest file from these jars and jar them again. You will be able to sign these jars with your signature.

  • edtFTPj/PRO in your JNLP application

edtFTPj/PRO requires license information, which is supplied as either two strings (an owner string and license key), or as a license.jar file (for trial editions and for customers prior to 2.0.1). When you start your JNLP application, you might have ‘license.jar not found’ exception. This exception is misleading, it is thrown even if you have non-trial edition of edtFTPj/PRO.
You have to include license information in your main class:

    com.enterprisedt.util.liccense.License.setLicenseDetails
    (“yourownername”, “yourlicensekey”);

  • JNLP configuration:

In order to launch your JNLP application from jsp you have to include the following configuration in your jsp page:

    <?xml version=”1.0″ encoding=”utf-8″?>
    <jnlp spec=”1.5+” codebase=”<%=jnlpCodeBase %>”>
    <information>
    <title>Uploader Application</title>
    <vendor></vendor>
    <homepage href=”yourhomepage.otp”/>
    <description>Uploader Application</description>
    <offline-allowed/>
    </information>
    <security>
    <all-permissions/>
    </security>
    <resources>
    <j2se version=”1.5+” java-vm-args=”-Xnoclassgc”/>
    <!– your jnlp application jar –>
    <jar href=”<%=jnlpCodeBase%>/approot/app.jar” download=”eager” />
    <jar href=”<%=jnlpCodeBase%>/lib/edtftpj-pro.jar” download=”eager”/>
    <jar href=”<%=jnlpCodeBase%>/lib/JSAP-2.1.jar” download=”eager”/>
    <jar href=”<%=jnlpCodeBase%>/lib/swing-layout-1.0.jar” download=”eager”/>
    <jar href=”<%=jnlpCodeBase%>/lib/swing-worker.jar” download=”eager”/>
    </resources>
    <application-desc main-class=”yourpackage.Main”>
    <argument>–server</argument>
    <argument> <%=server %></argument>
    <argument>–user</argument>
    <argument&gt <%=userid %></argument>

    </application-desc>
    </jnlp>

Please note security section, it will allow your JNLP application to have full access to a client system if all its JAR files are signed.

h1

AspectJ with Spring applications.

May 21, 2010

Tracing and auditing is one of the most common ways to monitor an application.
If your application is Spring-based, you can use Spring AOP for auditing and monitoring(via interceptors). But what if you want to add custom annotations to each audited method and have an access to these custom annotation at run time? In order to do that you need Load Time Weaving (LTW) with AspectJ in the Spring framework.
Here is how to add AspectJ auditing to your Spring-based web application.
Note : In order to use ASPECTJ LTW with Spring based web application, you have to use release Spring 2.5.0RC2 or later.      

    1. Create custom annotation:        
package com.audit.annotations;
import java.lang.annotation.ElementType;  
import java.lang.annotation.Retention;    
import java.lang.annotation.RetentionPolicy;    
import java.lang.annotation.Target;    
 @Target(value=ElementType.METHOD)    
 @Retention(value=RetentionPolicy.RUNTIME)    
public @interface AppAudit {    
            public String stepName();    
            public String stepDescription(); 
}   

 

    2. Create an Aspect, which will include around advice and pointcut. 

 

package com.audit.aspects;
@Aspect
public class AuditAspect {

    @Around("methodsToBeProfiled(appAudit)")
    public Object profile(ProceedingJoinPoint pjp, AppAudit appAudit)
            throws Throwable {
            ….. //<-- you can access your annotation here
            return pjp.proceed();
    }

     @Pointcut(
         "execution(@com.audit.annotations.AppAudit* *(..)) &&
                        @annotation(appAudit)")
      public void methodsToBeProfiled(AppAudit appAudit){}

}

 

Please note, that ‘methodsToBeProfiled’ pointcut uses custom annotation AppAudit for selection. It selects all join points (methods) which are annotated with AppAudit.            

The following is an example of a join point method, which will be selected by this pointcut:            

@AppAudit(stepName="step name”, stepDescription="step description”)
public List getProjects(Long countryId) {
                      .......
 }

 

    3. Create weaverContext.xml config file and save it in your applications WEB-INF directory: 
<?xml version="1.0" encoding="UTF-8"?>
<beans
xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:context="http://www.springframework.org/schema/context"
xsi:schemaLocation="
http://www.springframework.org/schema/beans
      http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
http://www.springframework.org/schema/aop
     http://www.springframework.org/schema/aop/spring-aop-2.5.xsd
http://www.springframework.org/schema/context
     http://www.springframework.org/schema/context/spring-context-2.5.xsd"
>  

<context:load-time-weaver/>  

</beans>  
    4. Add weaverContext.xml to contextConfigLocation in web.xml:

 

<context-param>
<param-name>contextConfigLocation</param-name>
<param-value>
/WEB-INF/application-context.xml <-- your application context
/WEB-INF/weaverContext.xml
</param-value>
</context-param>

  

    5. Modify your application context file ( in our example application-context.xml) by adding AspectJ related context:

 

<?xml version="1.0" encoding="UTF-8"?>
<beans
xmlns="http://www.springframework.org/schema/beans"
xmlns:context="http://www.springframework.org/schema/context"
xsi:schemaLocation="
http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
http://www.springframework.org/schema/context
http://www.springframework.org/schema/context/spring-context-2.5.xsd">
…..   

<bean class=
"org.springframework.aop.aspectj.annotation.
                                    AnnotationAwareAspectJAutoProxyCreator"/>
<bean class=
"org.springframework.beans.factory.annotation.
                                       RequiredAnnotationBeanPostProcessor"/>
<!-- AspectJ related context -->
<context:spring-configured/>
<context:annotation-config/>
<aop:aspectj-autoproxy/>
<bean id="auditAspect" class="com.audit.aspects.AuditAspect"/>
……
</beans>

  

    6. Prepare Tomcat to use AspectJ LTW:

    6.1 Create aop.xml file and copy it to your application META-INF directory: 

          

<!DOCTYPE aspectj PUBLIC
“-//AspectJ//DTD//EN” “http://www.eclipse.org/aspectj/dtd/aspectj.dtd”>  

<aspectj>
<weaver options=”-showWeaveInfo
-XmessageHandlerClass:org.springframework.aop.aspectj.
AspectJWeaverMessageHandler”>
<!– only weave classes in our application-specific packages –>
<!–package which will be audited–>
<include within=”com.myorg.myapp*”/>
</weaver>  

<aspects>
<!– weave in just this aspect –>
<aspect name=
“com.audit.aspects.AuditAspect”/><–include your aspect
</aspects>
</aspectj>  

    6.2 Create context.xml file and copy it to your application META-INF directory: 

         

<Context path=”/mywebapp” reloadable=”false”> <– your application root
<Loader loaderClass=
“org.springframework.instrument.classloading.tomcat.
TomcatInstrumentableClassLoader”
useSystemClassLoaderAsParent=”false”/>
</Context>  

    6.3 Add spring-tomcat-weaver.jar to $CATALINA_HOME/lib directory.

         


Tomcat directory structure:              

$CATALINA_HOME/lib/
             |-annotations-api.jar
             |-spring-tomcat-weaver.jar
             |-.......
$CATALINA_HOME/webapps/
            |-mywebapp <-- your web application
                     |-META-INF/
                         |-aop.xml
                         |-context.xml
                      |-WEB-INF/
                         |-weaverContext.xml
                         |-.........
                         |-lib/
                               |-aspectjrt.jar
                               |-aspectjweaver.jar
                               |-spring-aop.jar
                               |-spring-aspects.jar
                               |-........

h1

Custom ThreadPoolExecutor

February 4, 2010

If your application is multithreaded and you use Java 6.0, most likely you will use ThreadPoolExecutor. The ThreadPoolExecutor class is a concrete implementation of the ExecutorService interface, which should be sufficient for most applications. It also provides useful hookup methods (like beforeExecute, afterExecute, etc) which can be overridden for customization purposes.

Here are the signatures:

protected void beforeExecute(Thread t, Runnable r);

protected void afterExecute(Runnable r, Throwable t);

As you can see, one of the parameters is Runnable. But what if you want to submit a Callable to your custom ThreadPoolExecutor?

Simply override these methods:

@Override
public void afterExecute(Runnable r, Throwable t) {
super.afterExecute(r, t);

if  (t==null && r instanceof  Future<?>) {

try {

String retValue =  (Future<?>)r.get(); // if your Callable returns a String

// add your code here

} catch (InterruptedException ie) { // handle it here}

catch (ExecutionException ee) { // handle it here}

catch(CancellationException ce) {// handle it here}

catch(Exception ex) { //handle it here}

}

You can inject an Observer into your custom ThreadPoolExecutor and either return retValue to the observer or notify it about any exceptions thrown. Make sure your observer is thread safe, especially methods you will invoke from your custom ThreadPoolExecutor. Happy threading!!!