
Abstract
Attracting and recruiting the right talent is a key differentiator for modern organisations. The recruitment process involves many data-driven, collaborative and knowledge-intensive steps to ensure the right fit for an organisation's talent requirements. In many current recruitment processes, all of these steps are carried out by a knowledge worker (a recruiter or talent acquisition expert).
One of the key challenges in the recruitment process is analysing resumes, making inferences about the named entities they contain and comparing these between candidates. Recruiters have limited time to evaluate candidates for a role. To narrow the gap created by making decisions on limited information, we propose a novel approach in which the relevant information about a candidate is presented contextually and consistently, irrespective of how the resume is articulated or written. Our approach addresses these challenges in two ways: first, the information relevant to a candidate is extracted and presented as a summary view; second, this summary is complemented with linked domain knowledge about the entities it mentions.
We adopt a typical recruitment scenario: finding the right fit for a Data Scientist role. In this scenario we extract information from resumes and link it with an entity domain knowledge base to generate relevance measures and a possible score that differentiates between candidates for the role.
Keywords Big data analytics – Text analytics – Feature engineering

Introduction
Competing for high-quality talent is becoming a prevalent issue for almost all organisational leaders (Ployhart 2006). At the same time, recruitment is a very expensive process for organisations. Recruiting the wrong fit can prove exceptionally costly, not only from a monetary perspective but also because of its serious consequences for employee morale and productivity (Shulman & Chiang 2007; McGraw 2011).
For an effective recruitment plan, an organisation needs to link its human resource strategy to its strategic business plan, in line with its vision and with input from its key stakeholders (Compton, Morrissey and Nankervis 2009). Below is an illustration of the HR processes aligned with an organisation's strategic business plan:

Source: (Compton, Morrissey and Nankervis 2009); Compton, R.-L., Morrissey, W. J. and Nankervis, A. R. (2014). Effective recruitment and selection practices (6th ed.).
The recruitment process starts with the requisition step, driven by the organisation's need for a specific skill under its strategic business plan. Identifying the business skill gap is a crucial step: it determines the required skills, on which the subsequent talent search is based. The succeeding steps in the recruitment cycle are more manual, such as competency or position analysis, developing the position description, posting on job boards and shortlisting potential candidates based on matching skills. The high-level recruitment process is highlighted below:

Adapted from (Compton, Morrissey and Nankervis 2009); Compton, R.-L., Morrissey, W. J. and Nankervis, A. R. (2014). Effective recruitment and selection practices (6th ed.).
Background
Screening the right candidates is not only time consuming but also resource intensive, since making correct selection decisions demands considerable knowledge. Recruiters spend on average 6.25 seconds on a resume before making a yes/no decision (TheLadders.com). This is very little time in which to make an informed decision about whether a candidate's profile addresses the relevant business skill gap. Moreover, recruiters mostly look at only six elements: name, current title/company, previous title/company, previous position's start/end dates, current position's start/end dates and education (TheLadders.com).
By and large, this raises a real question: is all the relevant information structured in the resume well enough for the recruiter to make the right decision in those six seconds? If candidates place their relevant information in different sections of the resume, recruiters may well miss it. Some recruitment techniques, such as psychometric tests, can potentially help organisations fine-tune the matching of candidates to the job (Melamed and Jackson, 1995). However, these techniques can be used or abused by organisations in equal measure (Kwiatkowski, 2003); moreover, they are costly and are often overused by untrained personnel owing to the availability of internet-based testing.

General-purpose featurized, semantic and contextualized items
Definition A Featurized-Item $FI$ is a high-level entity described by the attribute vector $\langle FI_{id}, Item, Feature\text{-}Set \rangle$, where $FI_{id}$ is a mandatory attribute whose value is the unique identity of the Featurized-Item; $Item$ is an Information-Item; and $Feature\text{-}Set$ is a set of features extracted from the $Item$, including:
– Lexical-based features. This category relates to the words or vocabulary of a language, such as keywords, topics, phrases, abbreviations, special characters (e.g., ‘#’ in a tweet), slang, informal language and spelling errors.
– Location-based features. This category relates to mentions of locations in the schema of the item (e.g., under Universities the text may contain ‘Sydney’, a city in Australia).
– Natural-Language-based features. This category relates to entities that can be extracted through the analysis and synthesis of natural language (NL) and speech.
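To make the Featurized-Item definition concrete, the Python sketch below builds the attribute vector <FI_id, Item, Feature-Set> from one line of resume text. It is a minimal illustration under stated assumptions, not the extraction pipeline used in this work: the `FeaturizedItem` class, the `featurize` function, the tiny `KNOWN_LOCATIONS` gazetteer and the stop-word list are all hypothetical, and a regular expression over capitalised words stands in for a proper named-entity recogniser.

```python
import re
from dataclasses import dataclass, field

# Hypothetical gazetteer and stop-word list; a real system would use a
# geographic knowledge source and a full stop-word list.
KNOWN_LOCATIONS = {"sydney", "melbourne", "australia"}
STOP_WORDS = {"a", "an", "and", "at", "in", "of", "the", "with"}

# Naive stand-in for NER: two or more capitalised words, optionally joined by "of".
ENTITY_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+(?:of\s+)?[A-Z][a-z]+)+\b")


@dataclass
class FeaturizedItem:
    """Attribute vector <FI_id, Item, Feature-Set>."""
    fi_id: str
    item: str
    feature_set: dict = field(default_factory=dict)


def featurize(fi_id: str, item: str) -> FeaturizedItem:
    tokens = re.findall(r"[#@]?\w+", item)
    plain = [t for t in tokens if t[0] not in "#@"]
    feature_set = {
        # Lexical-based: keywords and tokens carrying special characters.
        "keywords": sorted({t.lower() for t in plain if t.lower() not in STOP_WORDS}),
        "special_characters": sorted({t for t in tokens if t[0] in "#@"}),
        # Location-based: tokens found in the gazetteer.
        "locations": sorted({t for t in plain if t.lower() in KNOWN_LOCATIONS}),
        # Natural-Language-based: multi-word capitalised spans.
        "entities": ENTITY_PATTERN.findall(item),
    }
    return FeaturizedItem(fi_id, item, feature_set)


fi = featurize("cv-001", "Master of Data Science at University of Sydney, Sydney #python")
print(fi.feature_set["entities"])  # ['Master of Data Science', 'University of Sydney']
```

Here ‘Sydney’ becomes a location feature, ‘#python’ a special-character feature and ‘University of Sydney’ an entity candidate. A real system would replace the regular expression with an NER model.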

Definition A Semantic-Item $SI$ is a high-level entity described by the attribute vector $\langle FI_{id}, Enrichment\text{-}Set \rangle$, where $FI_{id}$ is the unique identity of the Featurized-Item and $Enrichment\text{-}Set$ is a set of annotations used to enrich the features extracted from an $Item$. We define a set of enrichment functions to enrich the extracted features. For example, if a tweet contains the keyword ‘Health’ (extracted using the ‘Keyword’ feature), the enrichment function ‘Synonym’ can enrich this keyword with its synonyms (e.g., ‘well-being’ and ‘haleness’) using knowledge sources such as Wikidata. The result (e.g., the set of synonyms) is stored in the $Enrichment\text{-}Set$. The proposed enrichment functions include:
– Schema-based Semantics. We use knowledge services such as Google Cloud Platform, AlchemyAPI, Microsoft Computer Vision API and Apache PredictionIO to extract various features from the properties of social items. For example, if a tweet contains an image, entities (e.g., people and objects) can be extracted from the image.
– Lexical-based Semantics. We leverage knowledge sources such as dictionaries and WordNet to enrich Lexical-based features with their synonyms, stems, hypernyms, hyponyms and more.
– NL-based Semantics. We leverage knowledge sources such as Wikidata, Google Knowledge Graph and DBpedia to enrich Natural-Language-based features with similar and related entities. For example, ‘Malcolm Turnbull’ is similar to ‘Tony Abbott’ (both served as Prime Minister of Australia), whereas ‘Malcolm Turnbull’ is related to ‘University of Sydney’ (the university he attended and graduated from).
We also use techniques such as coreference resolution [9] to enrich named entities with their mentions. For example, ‘Malcolm Turnbull’ is a named entity of type person whose mentions include ‘Malcolm Bligh Turnbull’, ‘Malcolm B. Turnbull’, ‘M. Turnbull’, ‘29th Prime Minister of Australia’ and more.
– Geo/Temporal-based Semantics. We leverage knowledge sources such as Wikidata and services (such as event and storyline mining) to enrich time- and location-based features with time and location events. For example, if a tweet is posted from Australia, we enrich it with the events in that location; if a tweet is posted on, say, ‘3 May 2017’, we enrich it with the events happening around that time frame. Combining both, a tweet posted on ‘3 May 2017’ from any location within Australia is enriched as related to the ‘Australian Budget’, since knowledge sources tell us that the Australian Treasurer traditionally hands down the Budget in May each year.
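The enrichment functions listed above can be sketched as lookups that populate the Enrichment-Set of a Semantic-Item. This is a minimal illustration under loud assumptions: the `SYNONYMS`, `MENTIONS` and `EVENTS` tables are small hand-written stand-ins for WordNet, coreference output and Wikidata queries, the function names are hypothetical, and the geo/temporal match uses an exact location string and a ±7-day window rather than real geographic containment or event mining.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical knowledge snippets standing in for WordNet / Wikidata lookups.
SYNONYMS = {"health": ["well-being", "haleness"]}
MENTIONS = {"Malcolm Turnbull": ["Malcolm Bligh Turnbull", "Malcolm B. Turnbull", "M. Turnbull"]}
# (event name, location, date): one illustrative entry; date assumed for this sketch.
EVENTS = [("Australian Budget", "Australia", date(2017, 5, 9))]


@dataclass
class SemanticItem:
    """Attribute vector <FI_id, Enrichment-Set>."""
    fi_id: str
    enrichment_set: dict


def synonym(keyword: str) -> list:
    """Lexical-based semantics: synonyms of a keyword."""
    return SYNONYMS.get(keyword.lower(), [])


def mentions(entity: str) -> list:
    """NL-based semantics: alternative mentions, as coreference resolution might yield."""
    return MENTIONS.get(entity, [])


def events_near(location: str, when: date, window_days: int = 7) -> list:
    """Geo/temporal semantics: events at the location within +/- window_days."""
    window = timedelta(days=window_days)
    return [name for name, where, on in EVENTS
            if where == location and abs(on - when) <= window]


def enrich(fi_id, keywords, entities, location=None, when=None) -> SemanticItem:
    enrichment_set = {
        "synonyms": {k: synonym(k) for k in keywords},
        "mentions": {e: mentions(e) for e in entities},
        "events": events_near(location, when) if location and when else [],
    }
    return SemanticItem(fi_id, enrichment_set)


si = enrich("tweet-1", ["Health"], ["Malcolm Turnbull"], "Australia", date(2017, 5, 3))
print(si.enrichment_set["events"])  # ['Australian Budget']
```

Keeping each enrichment function separate mirrors the list above, so a lookup table can later be swapped for a real knowledge service without touching the others.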
The extracted data needs to be linked with domain knowledge to establish its full context. A domain knowledge base is populated by entities, the relationships between entities and their semantics (DataSynapse: A Social Data Curation Foundry).
Definition A Domain Knowledge $DK$ is a knowledge base that consists of a set of concepts organised into a taxonomy, instances of each concept and relationships among the concepts (DataSynapse: A Social Data Curation Foundry).
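As a minimal sketch of this definition, the Python class below holds a concept taxonomy, instances per concept and relationship triples, with an `is_a` check that walks up the taxonomy. The class name, method names and the sample concepts are hypothetical; a production system would more likely store this in a graph database or an RDF store, and the sketch assumes the taxonomy is acyclic.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DomainKnowledge:
    """Concepts in a taxonomy, instances per concept, and relations between them."""
    taxonomy: dict = field(default_factory=dict)   # concept -> parent concept (None = root)
    instances: dict = field(default_factory=dict)  # entity -> concept
    relations: set = field(default_factory=set)    # (subject, predicate, object) triples

    def add_concept(self, concept: str, parent: Optional[str] = None) -> None:
        self.taxonomy[concept] = parent

    def add_instance(self, entity: str, concept: str) -> None:
        if concept not in self.taxonomy:
            raise KeyError(f"unknown concept: {concept}")
        self.instances[entity] = concept

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        self.relations.add((subject, predicate, obj))

    def is_a(self, entity: str, concept: str) -> bool:
        """True if the entity's concept is `concept` or a descendant of it (acyclic taxonomy assumed)."""
        current = self.instances.get(entity)
        while current is not None:
            if current == concept:
                return True
            current = self.taxonomy.get(current)
        return False


dk = DomainKnowledge()
dk.add_concept("Organisation")
dk.add_concept("Educational Institutions", parent="Organisation")
dk.add_concept("Location")
dk.add_instance("Macquarie University", "Educational Institutions")
dk.add_instance("Sydney", "Location")
dk.relate("Macquarie University", "located_in", "Sydney")
print(dk.is_a("Macquarie University", "Organisation"))  # True
```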
Definition A Contextualized-Item is a high-level entity described by the attribute vector $\langle SI_{id}, DK, Linking\text{-}Set \rangle$, where $SI_{id}$ is the unique identity of the Semantic-Item; $DK$ is the domain knowledge, consisting of a set of entities (where $DK_{entity}(X)$ is an entity in the domain knowledge, e.g., ‘Macquarie University’, an instance of the concept ‘Educational Institutions’); and $Linking\text{-}Set$ is a set of related pairs $\langle Item_j, DK_{entity}(Y) \rangle$, where $Item_j$ is the $j$-th extracted item in the Semantic-Item and $DK_{entity}(Y)$ is an entity (uniquely identified as $Y$) in the domain knowledge (DataSynapse: A Social Data Curation Foundry).
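The Linking-Set can be illustrated with a small Python sketch that pairs each extracted item with a domain-knowledge entity by case-insensitive exact or alias match. This is an assumption-laden simplification, not the linking method of this work: the `link` function, the `ContextualizedItem` class and the alias table (e.g. ‘USyd’) are hypothetical, the DK is reduced to an entity-to-concept dictionary, and real entity linking would also need disambiguation and fuzzy matching.

```python
from dataclasses import dataclass, field


@dataclass
class ContextualizedItem:
    """Attribute vector <SI_id, DK, Linking-Set>."""
    si_id: str
    dk: dict                                          # DK entity -> concept
    linking_set: list = field(default_factory=list)  # (Item_j, DK_entity(Y)) pairs


def link(si_id: str, items: list, dk: dict, aliases: dict) -> ContextualizedItem:
    """Pair each extracted item with a DK entity by exact or alias match (case-insensitive)."""
    index = {name.lower(): name for name in dk}
    for canonical, alternatives in aliases.items():
        if canonical in dk:  # only alias entities that exist in the DK
            for alt in alternatives:
                index[alt.lower()] = canonical
    pairs = [(item, index[item.lower()]) for item in items if item.lower() in index]
    return ContextualizedItem(si_id, dk, pairs)


dk = {"Macquarie University": "Educational Institutions",
      "University of Sydney": "Educational Institutions"}
aliases = {"University of Sydney": ["USyd", "Sydney University"]}
ci = link("si-1", ["USyd", "Macquarie University", "Python"], dk, aliases)
print(ci.linking_set)
# [('USyd', 'University of Sydney'), ('Macquarie University', 'Macquarie University')]
```

Items with no DK counterpart (here ‘Python’) are simply left unlinked, which keeps the Linking-Set limited to pairs the knowledge base can vouch for.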

Linking contextualized items to the domain knowledge
Our approach builds a knowledge base of the important entities, including organisations and their relevant information, universities with their locations and relevant information, countries and cities, and relevant skill entities. Our algorithm shortlists potential candidates according to the importance that the knowledge worker assigns to each feature, e.g., type/specialisation of university, location, industry background and skills.
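One way to read "shortlisting based on the importance of the features" is a weighted average of per-feature scores, with the weights set by the knowledge worker. The Python sketch below shows that reading only; the feature names, the assumption that each feature is already scored in [0, 1], the positive weights and the candidate data are all hypothetical, and the original algorithm may combine features differently.

```python
def score(features: dict, weights: dict) -> float:
    """Weighted average of feature scores in [0, 1]; missing features count as 0.
    Assumes at least one positive weight."""
    total = sum(weights.values())
    return sum(w * features.get(name, 0.0) for name, w in weights.items()) / total


def shortlist(candidates: dict, weights: dict, k: int) -> list:
    """Return the k candidate ids with the highest weighted score."""
    ranked = sorted(candidates, key=lambda c: score(candidates[c], weights), reverse=True)
    return ranked[:k]


# Hypothetical weights chosen by the knowledge worker for a Data Scientist role.
weights = {"university_specialisation": 2, "location": 1, "industry": 1, "skills": 4}
candidates = {
    "A": {"university_specialisation": 1.0, "location": 1.0, "industry": 0.0, "skills": 0.5},
    "B": {"university_specialisation": 0.0, "location": 0.0, "industry": 1.0, "skills": 0.75},
    "C": {"university_specialisation": 0.5, "location": 1.0, "industry": 0.5, "skills": 0.25},
}
print(shortlist(candidates, weights, k=2))  # ['A', 'B']
```

Normalising by the total weight keeps scores in [0, 1] whatever scale the knowledge worker uses for the weights, so changing priorities does not change the meaning of a score.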
Our algorithm reduces manual processing by providing cognitive assistance to HR knowledge workers, helping them assess candidates and quickly identify outliers. It also equips HR knowledge workers with a toolkit to extract knowledge from relevant talent documents and to ask questions about an individual's industry background and the academic relevance of their qualifications to the role.
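Pointing out outliers could be done in many ways; the Python sketch below uses a simple z-score over candidate scores as one swapped-in technique, not necessarily the one this work uses. The threshold of 1.5 standard deviations, the function name `flag_outliers` and the sample scores are assumptions for illustration.

```python
from statistics import mean, pstdev


def flag_outliers(scores: dict, threshold: float = 1.5) -> dict:
    """Return {candidate: z-score} for candidates whose |z| exceeds the threshold."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {}  # all candidates scored identically; nothing stands out
    return {name: (s - mu) / sigma for name, s in scores.items()
            if abs(s - mu) / sigma > threshold}


scores = {"A": 0.62, "B": 0.50, "C": 0.44, "D": 0.48, "E": 0.95}
print(sorted(flag_outliers(scores)))  # ['E']
```

A z-score flags candidates at both ends, so a knowledge worker sees unusually strong profiles as well as unusually weak ones.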
Below is an illustration of how our approach extracts important features from the position description/business requirement on one side and from the knowledge base built from the candidate's resume on the other.
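The two-sided extraction can be sketched in Python as pulling skills from both the position description and the resume against a shared vocabulary, then reporting matched and missing skills and the fraction of required skills covered. The `SKILLS` vocabulary, the function names and the whole-word matching rule are hypothetical simplifications; in the approach described here, the vocabulary would come from the skills entities in the knowledge base.

```python
import re

# Hypothetical skills vocabulary drawn from the knowledge base.
SKILLS = {"python", "sql", "machine learning", "statistics", "spark", "tableau"}


def extract_skills(text: str, vocabulary=SKILLS) -> set:
    """Skills from the vocabulary that appear as whole words or phrases in the text."""
    lowered = text.lower()
    return {s for s in vocabulary if re.search(r"\b" + re.escape(s) + r"\b", lowered)}


def match(position_description: str, resume: str) -> dict:
    required = extract_skills(position_description)
    offered = extract_skills(resume)
    matched = required & offered
    return {
        "matched": matched,
        "missing": required - offered,
        "coverage": len(matched) / len(required) if required else 0.0,
    }


result = match("Data Scientist: Python, SQL, machine learning and statistics required.",
               "Built machine learning models in Python and Spark.")
print(result["coverage"])  # 0.5
```

The `missing` set is what a recruiter would look at first, since it shows which parts of the business skill gap the candidate does not address.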