RESPONSIBILITIES
Because we work on the cutting edge of a lot of technologies, we need someone who is a creative problem solver, resourceful in getting things done, and productive working independently or collaboratively. This person would also take on the following responsibilities:
1 Gather and process raw data at scale (including writing scripts, web scraping, calling APIs, writing SQL queries, etc.).
2 Work closely with our engineering team to integrate your amazing innovations and algorithms into our production systems.
3 Perform ETL on various datasets with complex data transformation logic, both in batch and in real time.
4 Build scalable search and indexing capabilities to navigate and retrieve data.
5 Process unstructured data into a form suitable for analysis – and then do the analysis.
6 Support business decisions with ad hoc analysis as needed.
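To give candidates a concrete sense of responsibilities 5 and 6, here is a minimal sketch of turning semi-structured text into rows suitable for analysis, then running a quick ad hoc count. The log format, field names, and data are purely illustrative, not a description of our actual systems:

```python
import re

# Hypothetical raw input: semi-structured log lines plus some noise.
RAW_LINES = [
    "2024-01-15 ERROR payment-service timeout after 30s",
    "2024-01-15 INFO  auth-service login ok",
    "garbage line with no timestamp",
]

# Expected layout: date, severity level, service name, free-text message.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(ERROR|WARN|INFO)\s+(\S+)\s+(.*)$")

def parse_lines(lines):
    """Keep only lines matching the expected layout; drop the rest."""
    rows = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            date, level, service, message = m.groups()
            rows.append({"date": date, "level": level,
                         "service": service, "message": message})
    return rows

rows = parse_lines(RAW_LINES)

# Ad hoc analysis: count records per severity level.
counts = {}
for row in rows:
    counts[row["level"]] = counts.get(row["level"], 0) + 1
print(counts)
```

In production this kind of logic would run inside a scheduled or streaming ETL job rather than a standalone script.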
DESIRED SKILLS
1 Programming experience, ideally in Python or Scala; we are open to other languages if you're willing to learn the ones we use.
2 Hands-on experience in data modeling and data model optimization.
3 Deep knowledge of and hands-on experience with ETL into and out of relational databases, preferably PostgreSQL and Oracle Database.
4 Experience with open-source ETL tools such as Pentaho and Talend is a plus.
5 Proficiency in writing SQL queries.
6 Experience processing large amounts of structured and unstructured data. Spark and MapReduce experience is a plus.
7 Strong programming skills for cleaning and scrubbing noisy datasets.
8 Deep knowledge of data mining, machine learning, natural language processing, or information retrieval is a plus.
9 Strong knowledge of and experience with statistics, and ideally other areas of advanced mathematics.
10 An excellent team player and communicator who works effectively with cross-functional teams.
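For a flavor of the SQL and RDBMS work listed above, here is a minimal sketch of loading rows and running an aggregate query. It uses Python's built-in SQLite as a stand-in for PostgreSQL or Oracle, and the schema and data are illustrative assumptions, not our actual tables:

```python
import sqlite3

# Illustrative schema and data; real work would target PostgreSQL or Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (service TEXT, level TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("payments", "ERROR", "2024-01-15"),
     ("payments", "INFO", "2024-01-15"),
     ("auth", "ERROR", "2024-01-16")],
)

# Ad hoc query: error counts per service, highest first.
query = """
    SELECT service, COUNT(*) AS errors
    FROM events
    WHERE level = 'ERROR'
    GROUP BY service
    ORDER BY errors DESC, service
"""
for service, errors in conn.execute(query):
    print(service, errors)
```

The same query would run largely unchanged on PostgreSQL; vendor-specific features (window functions, partitioning, bulk loaders) are where the deeper RDBMS experience comes in.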