The Data Science and Analytics COE is responsible for leading the creation and development of the overall strategy and direction of data science and advanced analytics at CDW – including ensuring continuity and seamless extension of existing programs, the development of a short- and long-term vision and roadmap, and defining and institutionalizing the role that data and analytics play throughout the organization as the fuel that drives and shapes CDW’s priorities and serves as an accelerant for CDW’s progress.
The Sr. ML Data Engineer is a key player on the Data Science & Analytics team. This role is responsible for data engineering, testing, and management across the end-to-end ML and data pipeline, including data products. This role will leverage CDW’s AI labs environment to enable delivery in a common data lake and data products.
Reporting to the Sr. Manager, AI Engineering & Architecture of Data Science and Analytics, the Sr. ML Data Engineer must have data infrastructure, data engineering, and machine learning skills; a proven track record of leading and scaling data pipelines in a cloud, on-prem, or big data environment; and strong operational skills to drive efficiency and speed.
Key Areas of Responsibility
- Innovation: Continually research and evaluate emerging technologies.
- Innovation: Design and run experiments, research new methods, and find new ways of optimizing risk, profitability, and customer experience to consistently disrupt and deliver in a better way.
- Build and Ops: Develop and implement analytical solutions that the business (as your internal customer), external customers and partners can seamlessly integrate into decision making.
- Build and Ops: Solve problems and constantly explore customer needs to drive toward impactful business outcomes.
- Communication & Strategic Management: Partner with IT and Commercial teams to execute the data science roadmap.
- Communication & Strategic Management: Serve as a central knowledge center for relevant data science tools and techniques.
- Communication & Strategic Management: Keep current with technical and industry developments.
- Communication & Strategic Management: Present insights and recommendations to audiences at the desired levels of understanding.
- Responsible for building and managing end-to-end data pipelines and operations from ingestion and integration through delivery for the data products.
- Build cross-functional relationships with Business Stakeholders, Architects, Data Scientists, Product Managers and IT to understand data needs and deliver on those needs.
- Drive the design, building and launching of new data models and data pipelines in production.
- Manage the development of data resources and support new product launches.
- Lead discussion of product-oriented analysis in meetings with clients and partners; comfortable speaking to executives.
- Serve as the primary data liaison for stakeholders to drive transformation and democratize the use of data.
- Sunset multiple redundant warehouses and marts with significant cost savings and support new integration and modernization.
- Consolidate the fragmented data across the company and provide simplified access to data for the stakeholders, internal users as well as external partners.
- Support compliance and auditing through a single gateway for data exchange.
- Stay abreast of technology development in retail and other industries.
- Work with multiple complex and disparate datasets to enable data delivery through various means and APIs to evaluate performance and amalgamate information to derive strategic insights and recommendations.
- Contribute to and support the development of the overall data science and machine learning strategy and roadmap.
- Establish the core data foundation and common data lake to enable data driven decisions.
- Support delivery of scalable data products.
- Actively participate in the industry externally through internet research, white papers, or conferences.
Education and/or Experience Qualifications
- Bachelor’s degree in Computer Science, Information Systems or equivalent IT knowledge/experience.
- 5+ years of relevant work experience as a data scientist / Machine Learning expert.
- Experience working in data engineering and ETL teams and managing implementation projects that utilize big data, advanced analytics, and machine learning technologies.
- Experience with agile software development methodologies.
- Ability to work with onshore and offshore resources.
- Distributed architecture and SaaS experience.
- Hands-on experience building pipelines from a variety of sources, such as data warehouses and in-memory OLAP models, as well as experience with NoSQL/cloud.
- Strong understanding of data and information architecture, including experience with Big Data, Cloud, streaming and batch data processing.
- Strong experience building end-to-end data views with a focus on integration.
- Ability to effectively present information, interact with, and respond to questions from managers, employees, customers, and vendors.
- Demonstrated experience in teaching and/or mentoring professionals.
- Passion to evangelize data science and engineering, teach others and learn new techniques.
- Expert Level - Data Exploration and ETL: Alteryx, Talend, H2O, Informatica, Azure Data Explorer, Azure Data Factory, etc.
- Expert Level - Programming languages and tools: Spark, Python, R, Jupyter Notebooks, Java, Scala
- Expert Level - Data Warehouse Solutions: Redshift, Snowflake, Postgres
- Expert Level - Workflow management: Airflow, Oozie, Azkaban
- Advanced Level - Cloud storage: S3, GCS
- Expert Level - Big Data technologies: Azure, AWS, Hadoop, Spark, Hive, Kafka, Flume, NoSQL stores (HBase, Cassandra, DynamoDB, MongoDB)
- Advanced Level – GitHub, Maven, etc. – Modern code organizer and build process for about half of our applications
- Advanced Level – Jenkins – Modern build executor
- Advanced Level – Containers – Modern build with microservices
- Advanced Level – Swagger – Experience with modern features for the API including an automatically generated user interface
- Beginner Level - Data Visualization Solutions: MS Power BI, Looker, Tableau, Azure Stream Analytics, Data Lake Analytics, Azure Time Series Insights, Azure Synapse Analytics
- Beginner Level - Distributed logging systems: Pulsar, Kinesis, etc.
- Beginner Level - Data Science Workbenches: Cloudera, SAS, etc.
- Bachelor’s degree in Business, Math, Engineering, Statistics, Economics, Operations Research, Data Science, Computer Science, or a related quantitative field.
- Experience working for consumer or business-facing digital brands.
CDW is committed to maintaining a workplace that is free of known hazards and to ensuring the safety, health, and well-being of coworkers and candidates for employment and their families, as well as the community.
CDW requires all coworkers be fully vaccinated against COVID-19, with the only exceptions being a documented, legally required medical or religious accommodation. Prior to starting with CDW, successful candidates will be required to: (i) be fully vaccinated against COVID-19 and provide CDW with proof of full vaccination; or (ii) apply for and receive a medical or religious-based accommodation to be exempt from the mandatory vaccination policy.