Data Scientist (NLP)
Remote | United States | $120,000 - $145,000 per year
This is a direct-hire position for one of our clients. The position is fully remote, but candidates must work in the EST or CST time zone and must be able to work in the US without sponsorship.
We are seeking a Data Scientist to join our team. The ideal candidate will be responsible for integrating new data sources into our backend platform and for creating custom NLP and data science models from scratch using those data sources. The candidate should be comfortable building custom NLP and data science models on large XML datasets, financial models on structured data, and hybrid models that draw on both. The Data Scientist will also be expected to create new data processing pipelines and data science processes, including data preparation, feature selection, training, classification, and deployment workflows.
As the Data Scientist, you should be able to provision your own environments in SageMaker via Jupyter notebooks or EMR and use PySpark, deep learning libraries, NLP libraries, Pandas, and other relevant tools. The ideal candidate should not rely solely on off-the-shelf ML libraries and should be comfortable designing and customizing models. Your solutions should account for scalability and make efficient use of distributed computing frameworks like Spark. Additionally, you should have experience using AWS tools, services, and resources for processing large datasets before runtime, entity resolution between large datasets, and real-time processing in a scalable, distributed computing environment.
Responsibilities
- Integrate new data sources into the backend platform
- Create custom NLP and data science models from scratch using integrated data sources
- Build custom NLP and data science models based on large datasets in XML and structured data
- Create new data processing pipelines and various data science processes
- Prepare custom data science models for patent, financial, and people data
- Demonstrate a strong understanding of data and efficiently query and obtain data via SQL
- Assist the development team with processing and integrating data analysis
- Clearly document processes, methodologies, and tools used
Qualifications
- B.S. in a relevant technical field
- 3-5 years of experience building custom data science models
- Experience with Natural Language Processing
- Experience with financial modeling is a plus
- 3-5 years of experience writing complex SQL queries and analyzing data correlations
- 3-5 years of experience with the AWS ecosystem
- Project management skills, including the ability to scope timelines, methodologies, and deliverables for development, testing, and integration into the platform
- Excellent communication and storytelling skills, both written and verbal
Our Vetting Process
At Emergent Software, we work hard to find the software engineers who are the right fit for our clients. Here are the steps of our vetting process for this position:
- Application (5 minutes)
- Online Assessment & Short Algorithm Challenge (40-60 minutes)
- Initial Phone Interview (30-45 minutes)
- 2 Interviews with the Client
- 1 In-Person Challenge and Presentation
- Job Offer!