Zee A.

asked • 03/01/21

Dynamic Programming: Finding an Optimal Research and Development Policy

A software manufacturer can be in one of two states. In state 1 its software sells well, and in state 2 the product sells poorly. While in state 1, the company can invest in developing an upgraded version of the software, in which case the one-stage reward is 4 units and the probability of degrading to state 2 is 0.2. If it does not invest in new development, the reward is 6 units, but the probability of transition to state 2 is 0.5. While in state 2, if the company invests in software development, the reward is -2 units, but the probability of transition to state 1 is 0.7. Without special efforts to improve, the reward is 1 unit and the probability of upgrading to state 1 is 0.
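
For reference, the data in the paragraph above can be collected into transition matrices and a reward table. The symbols P and r are notation introduced here for convenience, not part of the original statement; the probabilities not stated explicitly are filled in as complements so that each row sums to 1.

```latex
% Rows: current state (1, 2). Columns: next state (1, 2).
P^{\text{invest}} =
\begin{pmatrix} 0.8 & 0.2 \\ 0.7 & 0.3 \end{pmatrix},
\qquad
P^{\text{no invest}} =
\begin{pmatrix} 0.5 & 0.5 \\ 0 & 1 \end{pmatrix}

% One-stage rewards r(state, action)
r(1,\text{invest}) = 4, \quad r(1,\text{no invest}) = 6, \quad
r(2,\text{invest}) = -2, \quad r(2,\text{no invest}) = 1
```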

Formulate a dynamic programming problem to determine an optimal research and development policy. Solve the problem for a time horizon of 12 time intervals.
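
One possible formulation, offered as a sketch rather than a definitive answer: let V_n(s) be the maximum expected total reward with n intervals remaining when the company is in state s. Assuming no discounting and a terminal value V_0(s) = 0 (neither is specified in the problem), backward induction gives V_n(s) = max over actions a of [ r(s, a) + sum over s' of p(s' | s, a) V_{n-1}(s') ]. The Python below applies this recursion for 12 intervals. The names `MODEL` and `solve` are illustrative, and exact fractions are used because the two actions in state 1 become nearly tied at long horizons, where floating-point rounding could obscure the comparison.

```python
from fractions import Fraction as F

# MODEL[state][action] = (one-stage reward, probability of moving to state 1).
# The probability of moving to state 2 is the complement.
MODEL = {
    1: {"invest": (F(4), F(8, 10)), "no_invest": (F(6), F(5, 10))},
    2: {"invest": (F(-2), F(7, 10)), "no_invest": (F(1), F(0))},
}


def solve(horizon):
    """Backward induction over the number of stages remaining.

    values[n][s] is the best expected total reward with n stages left in
    state s; policy[n][s] is an action that attains it.
    """
    values = [{1: F(0), 2: F(0)}]  # assumption: terminal value is 0
    policy = [None]                # no decision when 0 stages remain
    for n in range(1, horizon + 1):
        prev = values[-1]
        v_n, a_n = {}, {}
        for s, actions in MODEL.items():
            best = None
            for a, (r, p1) in actions.items():
                # Reward now plus expected value of the next state.
                q = r + p1 * prev[1] + (1 - p1) * prev[2]
                if best is None or q > best[0]:
                    best = (q, a)
            v_n[s], a_n[s] = best
        values.append(v_n)
        policy.append(a_n)
    return values, policy


values, policy = solve(12)
for n in range(12, 0, -1):
    rounded = {s: round(float(v), 4) for s, v in values[n].items()}
    print(f"stages left = {n:2d}  values = {rounded}  policy = {policy[n]}")
```

Under these assumptions the recursion selects "no_invest" in both states when one interval remains, "invest" only in state 2 when two remain, and "invest" in both states when three or more remain. The advantage of investing in state 1 shrinks as the number of remaining intervals grows, so a different terminal value or a discount factor could change the result.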

