OneAnalytix is seeking a data engineer/data scientist to build a big data analysis platform. OneAnalytix simplifies the deployment of state-of-the-art advanced analytics to allow clients to gain a deep understanding of their customers and predict what product or service they want next. Serving major banks and retailers, OneAnalytix provides a complete platform that supports real-time data integration and storage, advanced analytics algorithm generation, campaign management, and visualization.
Data Engineer
Data engineering focuses on the systems that store and retrieve data. A data engineer
is responsible for building and deploying storage systems that can handle the workload
at hand. Sometimes that workload is a fast, real-time incoming data stream; at other
times it is a very large number of reads of the data.
The main goal of a data engineer is to make sure the data is properly stored and available to the
data scientist and others who need access. A data engineer typically has stronger
software engineering and programming skills than a data scientist.
Summary
The ideal candidate should have experience in the following:
- Installation, configuration, validation, testing, and administration of a Hadoop cluster.
- Implementing ETL processes.
- Working with the configuration parameters and settings needed to handle and debug
real-time deployment issues.
Skills and Experience
- Proficiency with Hadoop v2, HDFS, and the Hadoop ecosystem (ZooKeeper, YARN).
- Experience with the programming languages Python, Java, and SQL; familiarity with
Linux systems and shell scripting.
- Experience with MapReduce and Spark.
- Ability to tune data for optimal query performance.
- Ability to write queries that aggregate multiple rows of data, calculate statistics,
filter and sort data, join datasets, and create indexes and tables.
- Experience with NoSQL databases, such as HBase and MongoDB.
- Proficient understanding of distributed computing principles.
- Knowledge of machine learning toolkits is desired.