Apache Flume Interceptors: Morphline

Although the main goal of Morphlines Solr is to allow loading Flume events into Solr for indexing, the Morphline interceptor gives you the flexibility to parse, add, transform, rename, remove and otherwise operate on the data within events without having to send them exclusively to the Solr sink. Unfortunately, using Morphline interceptors in […]

Apache Flume: Using Environment Variables

A very common practice in development teams is to use a VCS (version control system) such as Subversion, Git, etc. We cannot live without them. Now suppose that you have a Flume properties file that contains sensitive information, such as the keys used to access your Amazon Web Services stream. Commit files […]
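One way to keep such secrets out of a committed properties file is to write placeholders such as ${AWS_SECRET_KEY} in it and resolve them from environment variables when the agent starts. The Java sketch below only illustrates that substitution idea: the class name EnvPlaceholderResolver, the ${NAME} syntax it accepts and the property name a1.sinks.k1.secretKey are assumptions made up for the example, not Flume's own resolver or a real sink setting.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Flume): replaces ${NAME} placeholders
// in a property line with values taken from an environment map.
public class EnvPlaceholderResolver {

    // Matches ${NAME}, where NAME starts with a letter or underscore.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    public static String resolve(String value, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = env.get(m.group(1));
            if (replacement == null) {
                // Unknown variable: keep the placeholder so the problem stays visible.
                replacement = m.group(0);
            }
            // quoteReplacement stops '$' or '\' in secrets from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // A real agent would use System.getenv(); a fixed map keeps the demo reproducible.
        Map<String, String> env = Map.of("AWS_SECRET_KEY", "s3cr3t");
        String line = "a1.sinks.k1.secretKey = ${AWS_SECRET_KEY}"; // hypothetical property
        System.out.println(resolve(line, env)); // prints: a1.sinks.k1.secretKey = s3cr3t
    }
}
```

Only the file with placeholders goes into version control; the actual values live in the environment of the machine running the agent.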

Apache Flume: Chaining Agents in Tiers

Consider the following scenario: you have several Flume agents running on different nodes, and all of them write their events to HDFS. Due to an HDFS limitation, there cannot be more than one writer for a given file at the same time, so each agent will be writing its own file. […]

Apache Flume: Spooldir and Netcat Sources

The idea of this post is to start a proof of concept with a very basic agent that listens for events in a specific folder. In the first example, the agent will watch a local directory, wait for new files and simply log the content of […]

Flume Introduction: How Does It Work? Sources, Channels and Sinks

Apache Flume is designed to ingest high-volume, event-based data. Although its final destination is frequently HDFS storage, it can deliver to many other kinds of destinations (sinks, in Flume jargon), such as Amazon Kinesis Streams, Apache Kafka and so on. As a lot of big […]
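To make the source, channel and sink roles concrete, here is a toy Java model of a single agent: a source puts events into a bounded channel, and a sink later drains them in batches. Everything here (the ToyAgent class, its method names, and the rule that a full channel simply rejects the event) is a simplification invented for illustration; it is not Flume's API and does not reproduce its transactional semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy model of an agent: source -> channel -> sink. Not Flume's API.
public class ToyAgent {

    // The channel buffers events between the source and the sink.
    private final BlockingQueue<String> channel;
    // Stand-in for the destination (HDFS, Kafka, ...).
    private final List<String> delivered = new ArrayList<>();

    public ToyAgent(int capacity) {
        this.channel = new ArrayBlockingQueue<>(capacity);
    }

    // Source side: in this toy, a full channel rejects the event (returns false).
    public boolean sourcePut(String event) {
        return channel.offer(event);
    }

    // Sink side: takes up to batchSize events from the channel and "writes" them.
    public int sinkProcess(int batchSize) {
        int count = 0;
        String event;
        while (count < batchSize && (event = channel.poll()) != null) {
            delivered.add(event);
            count++;
        }
        return count;
    }

    public List<String> delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        ToyAgent agent = new ToyAgent(2);
        System.out.println(agent.sourcePut("e1")); // true
        System.out.println(agent.sourcePut("e2")); // true
        System.out.println(agent.sourcePut("e3")); // false: channel is full
        System.out.println(agent.sinkProcess(10)); // 2
        System.out.println(agent.delivered());     // [e1, e2]
    }
}
```

The point of the buffer in the middle is decoupling: the source can keep accepting events while the sink is slower or temporarily unavailable, up to the channel's capacity.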

WordPress: Powering Up Your Text Editor

WordPress comes with a very basic text editor that lets you write posts in a simple way, and this is fine… until you need more advanced features. What if you want to use a font other than Georgia, the default font in our posts? […]

Javascript: Getting mouse coordinates in javascript

Sometimes we need to know the current position of the mouse when it moves. One approach is to read the window.event.clientX and window.event.clientY properties in the onmousemove event provided by the DOM. The code would be: However, this approach has a weakness: it only works fine on Chrome and […]

HDFS: A brief introduction

What happens if you have too much data to fit on a single machine? Is it possible to distribute that data among several machines? The answer is yes! What you need is a distributed filesystem, which is in charge of handling data across a network. In terms of distributed filesystems, […]
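HDFS spreads a file across machines by splitting it into fixed-size blocks and storing copies of each block on several nodes. The Java sketch below only does the arithmetic of that split; the 128 MB block size and replication factor of 3 are assumed values (common defaults, but configurable per cluster), and BlockMath is a hypothetical name.

```java
// Sketch: how many blocks a file occupies, assuming fixed-size blocks.
public class BlockMath {
    static final long MB = 1024L * 1024L;

    // blocks = ceil(fileSize / blockSize); the last block may be only partly filled.
    public static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        if (blockSizeBytes <= 0) {
            throw new IllegalArgumentException("block size must be positive");
        }
        if (fileSizeBytes <= 0) {
            return 0;
        }
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long blockSize = 128 * MB; // assumed block size
        int replication = 3;       // assumed replication factor
        long blocks = blockCount(1000 * MB, blockSize);
        // prints: 8 blocks, 24 stored replicas
        System.out.println(blocks + " blocks, " + blocks * replication + " stored replicas");
    }
}
```

A 1000 MB file therefore becomes seven full blocks plus one partial block, and those blocks (and their copies) can live on different machines.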

Create a Spark Application with Scala using Maven on IntelliJ

In this article we’ll create a Spark application in Scala using Maven in the IntelliJ IDE. I will also show you how to fix the pom.xml generated from the archetype. So, let’s start. As prerequisites, we must have the following applications and plugins installed in our IDE: Spark, Scala (IntelliJ plugin) […]

Understanding a Java environment: What are JDK & JRE?

JRE and JDK can cause a bit of confusion when we take our first steps in the Java world. Let me explain the first concept: the JRE. This is the Java Runtime Environment, and it is made up of the JVM (Java Virtual Machine), which resides on our computer, main […]