Using Natural Language Processing techniques for automated code refactoring
>Alan Barzilay_
Introduction
Goals
Background
DataSet
Model
Results
Software is a form of human communication; software corpora have similar statistical properties to natural language corpora; and these properties can be exploited to build better software engineering tools.
Introduction
I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature. Hence, my title: ‘Literate Programming.’
Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.
The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reïnforce each other.
We begin with the conjecture that most software is also natural, in the sense that it is created by humans at work, with all the attendant constraints and limitations [...]
Programming languages, in theory, are complex, flexible and powerful, but the programs that real people actually write are mostly simple and rather repetitive, and thus they have usefully predictable statistical properties that can be captured in statistical language models and leveraged for software engineering tasks.
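The claim that real code has "usefully predictable statistical properties" can be made concrete with the simplest statistical language model. The sketch below is an illustration added for this write-up, not the model used in this project. It is a bigram model over pre-tokenized code with add-one (Laplace) smoothing, scored by cross-entropy in bits per token, where lower means more predictable. `trainBigram`, `crossEntropy` and the token lists are hypothetical, and unseen tokens get no dedicated `<unk>` entry.

```javascript
// Illustrative bigram language model over code tokens (not this project's model).
// Tokens are assumed to be pre-split; unseen tokens get no <unk> entry.
const SEP = "\u0000"; // separator that cannot appear inside a token

function trainBigram(tokenSeqs) {
  const historyCounts = new Map(); // c(prev): times a token is followed by another
  const pairCounts = new Map(); // c(prev, cur)
  const vocab = new Set(["<s>", "</s>"]);
  for (const seq of tokenSeqs) {
    const toks = ["<s>", ...seq, "</s>"];
    for (const t of toks) vocab.add(t);
    for (let i = 0; i + 1 < toks.length; i++) {
      const key = toks[i] + SEP + toks[i + 1];
      historyCounts.set(toks[i], (historyCounts.get(toks[i]) ?? 0) + 1);
      pairCounts.set(key, (pairCounts.get(key) ?? 0) + 1);
    }
  }
  return { historyCounts, pairCounts, vocabSize: vocab.size };
}

// Average negative log2-probability per predicted token (bits per token).
function crossEntropy(model, seq) {
  const toks = ["<s>", ...seq, "</s>"];
  let logProb = 0;
  for (let i = 1; i < toks.length; i++) {
    const pair = model.pairCounts.get(toks[i - 1] + SEP + toks[i]) ?? 0;
    const hist = model.historyCounts.get(toks[i - 1]) ?? 0;
    // Add-one smoothing: P(cur | prev) = (c(prev, cur) + 1) / (c(prev) + V)
    logProb += Math.log2((pair + 1) / (hist + model.vocabSize));
  }
  return -logProb / (toks.length - 1);
}

// Usage: a repetitive loop header is far more predictable than a scrambled one.
const loopTokens = ["for", "(", "i", "=", "0", ";", "i", "<", "n", ";", "i", "++", ")"];
const model = trainBigram([loopTokens, loopTokens, loopTokens]);
console.log(crossEntropy(model, loopTokens) < crossEntropy(model, [")", "++", "n", "for", "<"])); // prints true
```

A bigram model is only the smallest instance of the idea. The quoted work uses n-gram models over real corpora, and neural models extend the same principle.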
Introduction
Malbolge:
(=BA#9"=<;:3y7x54-21q/p-,+*)"!h%B0/.
~P<
<:(8&
66#"!~}|{zyxwvu
gJ%
Brainfuck:
+[-->-[>>+>-----<<]<--<---]>-.>>>+.>>..+++[.>]<<<<.+++.------.<<-.>>>>+.
Goals
function printOwing(invoice) {
  printBanner(); // printBanner and calculateOutstanding are defined elsewhere
  const outstanding = calculateOutstanding();
  // print details
  console.log(`name: ${invoice.customer}`);
  console.log(`amount: ${outstanding}`);
}
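After extracting printDetails:
function printOwing(invoice) {
  printBanner();
  const outstanding = calculateOutstanding();
  printDetails(invoice, outstanding);
}
function printDetails(invoice, outstanding) {
  // invoice is passed explicitly: it is not in scope outside printOwing
  console.log(`name: ${invoice.customer}`);
  console.log(`amount: ${outstanding}`);
}
LSP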
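The "LSP" goal suggests delivering the refactoring through the Language Server Protocol, where an editor asks a server for code actions. The sketch below assumes that design. `toExtractCodeAction` and its parameters are hypothetical, but the object shapes follow the protocol: a `CodeAction` of kind `refactor.extract` carrying a `WorkspaceEdit` whose `changes` map a document URI to `TextEdit`s. LSP lines are zero-based and a `Range` end is exclusive.

```javascript
// Hypothetical helper: wrap a predicted extraction as an LSP CodeAction.
// startLine/endLine are assumed 1-based and inclusive; LSP lines are 0-based
// and Range.end is exclusive, so the range ends at column 0 of the next line.
function toExtractCodeAction(uri, startLine, endLine, newText, functionName) {
  return {
    title: `Extract function '${functionName}'`,
    kind: "refactor.extract", // CodeActionKind.RefactorExtract
    edit: {
      changes: {
        [uri]: [
          {
            range: {
              start: { line: startLine - 1, character: 0 },
              end: { line: endLine, character: 0 },
            },
            newText, // replaces the whole line span; should end with "\n"
          },
        ],
      },
    },
  };
}
```

This only swaps the selected lines for a call such as `  printDetails(invoice, outstanding);\n`. A complete action would also need a second `TextEdit` that inserts the new function's declaration.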
DataSet
{
"type":"Extract Method",
"description":"Extract Method private extractMijCommand(rulePos int, contents String) : List<String> extracted from private extractedRuleMij(contents String) : List<String> in class com.reason.bs.Ninja",
"leftSideLocations":[
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":106,
"endLine":120,
"startColumn":5,
"endColumn":6,
"codeElementType":"METHOD_DECLARATION",
"description":"source method declaration before extraction",
"codeElement":"private extractedRuleMij(contents String) : List<String>"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":109,
"endLine":109,
"startColumn":13,
"endColumn":70,
"codeElementType":"VARIABLE_DECLARATION_STATEMENT",
"description":"extracted code from source method declaration",
"codeElement":"None"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":111,
"endLine":111,
"startColumn":17,
"endColumn":72,
"codeElementType":"VARIABLE_DECLARATION_STATEMENT",
"description":"extracted code from source method declaration",
"codeElement":"None"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":112,
"endLine":112,
"startColumn":17,
"endColumn":91,
"codeElementType":"VARIABLE_DECLARATION_STATEMENT",
"description":"extracted code from source method declaration",
"codeElement":"None"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":113,
"endLine":113,
"startColumn":17,
"endColumn":54,
"codeElementType":"VARIABLE_DECLARATION_STATEMENT",
"description":"extracted code from source method declaration",
"codeElement":"None"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":114,
"endLine":114,
"startColumn":17,
"endColumn":251,
"codeElementType":"VARIABLE_DECLARATION_STATEMENT",
"description":"extracted code from source method declaration",
"codeElement":"None"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":115,
"endLine":115,
"startColumn":17,
"endColumn":42,
"codeElementType":"EXPRESSION_STATEMENT",
"description":"extracted code from source method declaration",
"codeElement":"None"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":116,
"endLine":116,
"startColumn":17,
"endColumn":39,
"codeElementType":"RETURN_STATEMENT",
"description":"extracted code from source method declaration",
"codeElement":"None"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":110,
"endLine":117,
"startColumn":33,
"endColumn":14,
"codeElementType":"BLOCK",
"description":"extracted code from source method declaration",
"codeElement":"None"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":110,
"endLine":117,
"startColumn":13,
"endColumn":14,
"codeElementType":"IF_STATEMENT",
"description":"extracted code from source method declaration",
"codeElement":"None"
}
],
"rightSideLocations":[
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":110,
"endLine":121,
"startColumn":5,
"endColumn":6,
"codeElementType":"METHOD_DECLARATION",
"description":"extracted method declaration",
"codeElement":"private extractMijCommand(rulePos int, contents String) : List<String>"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":111,
"endLine":111,
"startColumn":9,
"endColumn":63,
"codeElementType":"VARIABLE_DECLARATION_STATEMENT",
"description":"extracted code to extracted method declaration",
"codeElement":"None"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":113,
"endLine":113,
"startColumn":13,
"endColumn":68,
"codeElementType":"VARIABLE_DECLARATION_STATEMENT",
"description":"extracted code to extracted method declaration",
"codeElement":"None"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":114,
"endLine":114,
"startColumn":13,
"endColumn":87,
"codeElementType":"VARIABLE_DECLARATION_STATEMENT",
"description":"extracted code to extracted method declaration",
"codeElement":"None"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":115,
"endLine":115,
"startColumn":13,
"endColumn":50,
"codeElementType":"VARIABLE_DECLARATION_STATEMENT",
"description":"extracted code to extracted method declaration",
"codeElement":"None"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":116,
"endLine":116,
"startColumn":13,
"endColumn":247,
"codeElementType":"VARIABLE_DECLARATION_STATEMENT",
"description":"extracted code to extracted method declaration",
"codeElement":"None"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":117,
"endLine":117,
"startColumn":13,
"endColumn":38,
"codeElementType":"EXPRESSION_STATEMENT",
"description":"extracted code to extracted method declaration",
"codeElement":"None"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":118,
"endLine":118,
"startColumn":13,
"endColumn":35,
"codeElementType":"RETURN_STATEMENT",
"description":"extracted code to extracted method declaration",
"codeElement":"None"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":112,
"endLine":119,
"startColumn":29,
"endColumn":10,
"codeElementType":"BLOCK",
"description":"extracted code to extracted method declaration",
"codeElement":"None"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":112,
"endLine":119,
"startColumn":9,
"endColumn":10,
"codeElementType":"IF_STATEMENT",
"description":"extracted code to extracted method declaration",
"codeElement":"None"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":140,
"endLine":146,
"startColumn":5,
"endColumn":6,
"codeElementType":"METHOD_DECLARATION",
"description":"source method declaration after extraction",
"codeElement":"private extractRuleMijDev(contents String) : List<String>"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":143,
"endLine":143,
"startColumn":20,
"endColumn":59,
"codeElementType":"METHOD_INVOCATION",
"description":"extracted method invocation",
"codeElement":"extractMijCommand(ruleMijPos,contents)"
},
{
"filePath":"src/com/reason/bs/Ninja.java",
"startLine":120,
"endLine":120,
"startColumn":9,
"endColumn":28,
"codeElementType":"RETURN_STATEMENT",
"description":"added statement in extracted method declaration",
"codeElement":"None"
}
]
}
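A RefactoringMiner record like the one above can be turned into a supervised label by reading three things: the source method's span, the smallest line range covering every location described as "extracted code from source method declaration", and the name of the new method. The sketch below assumes exactly that reading of the fields. `toTrainingLabel` is hypothetical, and the name parsing depends on the `codeElement` string format shown in the record.

```javascript
// Hypothetical helper: turn one RefactoringMiner "Extract Method" record
// (shaped like the JSON above) into a line-span label for training.
function toTrainingLabel(refactoring) {
  if (refactoring.type !== "Extract Method") return null;
  const left = refactoring.leftSideLocations ?? [];
  const right = refactoring.rightSideLocations ?? [];
  const source = left.find(
    (loc) => loc.description === "source method declaration before extraction"
  );
  const extracted = left.filter(
    (loc) => loc.description === "extracted code from source method declaration"
  );
  const target = right.find(
    (loc) => loc.description === "extracted method declaration"
  );
  if (!source || extracted.length === 0 || !target) return null;
  // codeElement looks like "private extractMijCommand(rulePos int, ...) : List<String>";
  // the first identifier directly followed by "(" is taken as the new name.
  const nameMatch = target.codeElement.match(/(\w+)\(/);
  return {
    filePath: source.filePath,
    methodStart: source.startLine,
    methodEnd: source.endLine,
    // Smallest line range covering every extracted statement or block.
    extractStart: Math.min(...extracted.map((loc) => loc.startLine)),
    extractEnd: Math.max(...extracted.map((loc) => loc.endLine)),
    newName: nameMatch ? nameMatch[1] : null,
  };
}
```

Applied to the record above, this gives lines 109 to 117 of `src/com/reason/bs/Ninja.java`, inside a source method spanning lines 106 to 120, with the new name `extractMijCommand`.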
Single refactoring?
Continuous?
Function extraction?
RefactoringMiner
Git cloning
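The questions above (a single refactoring per commit, function extraction only) point to a filtering step over RefactoringMiner's output. The sketch below assumes the tool's JSON report wraps results as `{ "commits": [{ "sha1": ..., "refactorings": [...] }] }`. It shows one possible filter, keeping only commits whose sole refactoring is an Extract Method, and it is not necessarily the selection used for this dataset. `selectSingleExtractMethod` is a hypothetical name.

```javascript
// Hypothetical filter over a RefactoringMiner JSON report (structure assumed):
// keep commits whose only detected refactoring is an "Extract Method".
function selectSingleExtractMethod(report) {
  return (report.commits ?? []).filter(
    (commit) =>
      Array.isArray(commit.refactorings) &&
      commit.refactorings.length === 1 &&
      commit.refactorings[0].type === "Extract Method"
  );
}
```

Restricting each sample to one refactoring keeps the before/after pair attributable to a single edit. The cost is that commits mixing an extraction with other changes are discarded.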
Model
Start Line
End Line
Function
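If the model outputs a start line, an end line and a function name, applying the prediction is a mechanical text edit. The sketch below shows one such edit under strong assumptions: the caller supplies the parameter list because no data-flow analysis is done, the span holds whole statements, and the new function is appended at the end of the file. `applyExtraction` is a hypothetical helper, not part of the project.

```javascript
// Hypothetical helper: apply a predicted Extract Function to JavaScript source.
// startLine/endLine are 1-based and inclusive; params are supplied by the caller
// because this sketch performs no data-flow analysis.
function applyExtraction(source, startLine, endLine, name, params) {
  const lines = source.split("\n");
  if (startLine < 1 || endLine < startLine || endLine > lines.length) {
    throw new RangeError("invalid line span");
  }
  const body = lines.slice(startLine - 1, endLine);
  // The first extracted line's indentation is reused for the call site and
  // stripped from the extracted body, so relative nesting is preserved.
  const indent = body[0].match(/^\s*/)[0];
  const args = params.join(", ");
  const call = `${indent}${name}(${args});`;
  const newFunction = [
    "",
    `function ${name}(${args}) {`,
    ...body.map((l) => "  " + (l.startsWith(indent) ? l.slice(indent.length) : l.trimStart())),
    "}",
  ];
  return [
    ...lines.slice(0, startLine - 1),
    call,
    ...lines.slice(endLine),
    ...newFunction,
  ].join("\n");
}
```

For a `printOwing` without the comment line, extracting its two `console.log` lines (lines 4 and 5) with `["invoice", "outstanding"]` replaces them with a `printDetails(invoice, outstanding);` call and appends a matching `printDetails` function.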
Results
...And some additional unused slides
[Bahdanau et al. 2014]
Background
[Vinyals et al. 2015]
Background
[Vinyals et al. 2015]
Background
You shall know a word by the company it keeps.
Firth (1957)
Background