
Bradley T. answered 01/15/23
Tutor for Python and High School and Middle School Math
To begin, you can run the following simple regexp to get most of the words:
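A minimal sketch of such a starting point, assuming Python's built-in re module and a letters-only pattern. The sentence here is a hypothetical stand-in, not the one from the question:

```python
import re

# Hypothetical sentence, used only for illustration.
sentence = "Dr. Smith wasn't home."

# One or more ASCII letters in a row; every other character acts as a separator.
simple_tokens = re.findall(r"[A-Za-z]+", sentence)
print(simple_tokens)  # ['Dr', 'Smith', 'wasn', 't', 'home']
```

This gets most words, but it drops the period from the title and splits the contraction in two.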
For part A, they want titles to include their periods. Looking at the sentence, we can define titles as words that start with a capital letter and end with a period. We don't want to include the period at the end of the sentence unless the last word is a title. First search for exactly one capital letter, [A-Z], then for zero or more lowercase letters, [a-z]*, then exactly one period, \. (escaped, because a bare . matches any character). We can then use an OR operator (|) to include the rest of the words, which looks like this:
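Here is one way that title-or-word pattern could be written, again as a sketch on a hypothetical sentence rather than the one from the question. The title alternative goes first so it is tried before the plain-word alternative:

```python
import re

# Hypothetical sentence, used only for illustration.
sentence = "Dr. Smith wasn't home."

# Alternative 1: one capital, zero or more lowercase letters, then a literal period.
# Alternative 2: any run of letters (the rest of the words).
title_pattern = r"[A-Z][a-z]*\.|[A-Za-z]+"

title_tokens = re.findall(title_pattern, sentence)
print(title_tokens)  # ['Dr.', 'Smith', 'wasn', 't', 'home']
```

Note that the final period is left out here only because "home" is lowercase. A capitalized last word would match the title alternative and keep its period.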
For part B, we can see that wasn't is split into wasn and t, and the apostrophe is skipped. If we add the apostrophe to the set of characters to search for, along with the letters, it will be included in the token. That looks like this:
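A sketch of the apostrophe-aware version, assuming the same title-or-word structure as in part A and a hypothetical sentence:

```python
import re

# Hypothetical sentence, used only for illustration.
sentence = "Dr. Smith wasn't home."

# Same as part A, but the plain-word character class now also accepts
# the ASCII apostrophe (U+0027), so contractions stay in one token.
apostrophe_pattern = r"[A-Z][a-z]*\.|[A-Za-z']+"

apostrophe_tokens = re.findall(apostrophe_pattern, sentence)
print(apostrophe_tokens)  # ['Dr.', 'Smith', "wasn't", 'home']
```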
Note: I copied and pasted the sentence here, and its apostrophe was not the common apostrophe (U+0027) but a right single quotation mark (U+2019). I replaced it with the regular apostrophe you type on the keyboard because I assume that is more in the spirit of the question.
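If you would rather do that replacement in code than by hand, a small sketch with str.replace could look like this. The sentence and the pattern are illustrative assumptions:

```python
import re

# Hypothetical pasted text containing a right single quotation mark (U+2019).
pasted = "Dr. Smith wasn\u2019t home."

# Swap the curly mark for the plain ASCII apostrophe (U+0027).
normalized = pasted.replace("\u2019", "'")

pattern = r"[A-Z][a-z]*\.|[A-Za-z']+"
before = re.findall(pattern, pasted)
after = re.findall(pattern, normalized)
print(before)  # ['Dr.', 'Smith', 'wasn', 't', 'home']
print(after)   # ['Dr.', 'Smith', "wasn't", 'home']
```

Without the replacement, the pattern never sees a character it recognizes as an apostrophe, so the contraction is still split.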
Note 2: ['A-Za-z]+\.*(?!$) is a slightly simpler alternative without the OR (|): a negative lookahead lets the match include the period only when it is not at the end of the sentence. It would fail, though, if the sentence ended with a title, and it does not check whether the first letter of a title is capitalized, so it would include the period in "st.", for example. It works for that specific sentence, though.
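To make those trade-offs concrete, here is a sketch that runs the lookahead pattern on three hypothetical sentences (none of them is the sentence from the question): one where it works, one that ends with a title, and one with a lowercase abbreviation.

```python
import re

# Letters/apostrophes, then optional periods, kept only if the match
# does not end at the very end of the string.
lookahead_pattern = r"['A-Za-z]+\.*(?!$)"

works = re.findall(lookahead_pattern, "Dr. Smith wasn't home.")
ends_with_title = re.findall(lookahead_pattern, "She wasn't the Dr.")
lowercase_abbrev = re.findall(lookahead_pattern, "Turn at Main st. today.")

print(works)             # ['Dr.', 'Smith', "wasn't", 'home']
print(ends_with_title)   # ['She', "wasn't", 'the', 'Dr']
print(lowercase_abbrev)  # ['Turn', 'at', 'Main', 'st.', 'today']
```

In the second case the title loses its period because it sits at the end of the string, and in the third case "st." keeps its period even though it is not capitalized.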